Data Systems Update #2: Dec 3 at AWS re:Invent

Keeping up with the flood of AWS re:Invent announcements is always a challenge. This week I’ll be posting updates to save you time and keep you in the know about data and AI systems.

Databases

Serverless Distributed PostgreSQL

Aurora DSQL is a new database that combines PostgreSQL compatibility with a cloud-native distributed foundation, offering practically unlimited scalability and active-active multi-region capabilities. That puts it in the same class as Google Spanner or CockroachDB. Read the press release.

Generative AI for Schema Conversion

When migrating workloads to the cloud, companies often choose to switch from proprietary databases to open source databases such as PostgreSQL. The Database Migration Service helps accelerate that process. Schema conversion is a natural fit for generative AI, which is well established in other translation applications. Read the blog.

File Systems and Object Storage

S3 Tables Store Apache Iceberg Data

This capability starts to blur the distinction between object storage and databases. S3 now has built-in support for structured data in the Apache Iceberg format. Storing Iceberg data in S3 is already popular, and this change improves performance and simplifies management by handling compactions and snapshots automatically. Read the blog.

Queryable Metadata in S3

Building upon S3 Tables, queryable metadata brings more database-like capabilities to S3. By querying for objects based on metadata, you can retrieve data stored in S3 without knowing the name of the object that contains it. S3 Metadata is now available in preview. Read the blog.

Analytics & AI

SageMaker Unified Studio

SageMaker is getting a major upgrade that brings together many of the workflows involved in data analytics or AI model building. SageMaker is already established as the tool of choice for data science and related tasks. Now in preview, the Unified Studio experience creates a central point of access for all AWS data sources and data processing tools. Read the blog.

SageMaker Lakehouse

SageMaker Lakehouse provides unified access to multiple data systems. It interfaces directly with Amazon Redshift data warehouses and with S3 objects and tables. Zero-ETL integrations allow you to analyze data from operational databases, such as Amazon RDS and Amazon Aurora, in near real time. These integrations eliminate the need for extraction, transformation, and loading (ETL), the cumbersome and often slow process traditionally used to move data between systems. Read the blog.

Zero-ETL to SageMaker Lakehouse from Leading SaaS Applications

Many important company data sets live in third-party SaaS applications, not in AWS databases. With the addition of zero-ETL integration, data from applications such as Salesforce, SAP, and Zendesk becomes a first-class citizen of the SageMaker ecosystem. Read the blog.

Stay tuned for more updates as AWS re:Invent continues!
