Amazon

AWS Glue

  • AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.
  • AWS Glue was an attempt to create a new category of ETL product that automates large portions of the process. I collaborated with engineering leadership to arrive at an object model that would provide a flexible foundation for innovation in both product features and user experience.
  • I built on this foundation to design and test the user experience.

Data Catalog

  • The Data Catalog is a persistent metadata store for all your data assets, regardless of where they are located.
  • Automatically computes statistics and registers partitions for efficient and cost-effective queries.
  • Maintains comprehensive schema history so you can understand how your data has changed over time.

Automatic Schema Discovery

  • Run crawlers on a schedule, on demand, or in response to an event to ensure that your metadata is up to date.
  • Crawlers connect to your source or target data store, progress through a prioritized list of classifiers to determine the schema for your data, and then create metadata in your AWS Glue Data Catalog.
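The prioritized-classifier idea can be illustrated with a minimal sketch in plain Python. This is not AWS Glue's implementation or API: the function names (`classify_json`, `classify_csv`, `infer_schema`) and the `CLASSIFIERS` list are hypothetical, and the type inference is deliberately simplistic. It only shows the control flow, in which classifiers are tried in priority order and the first one that recognizes the data determines the schema.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Optional, Tuple

Schema = Dict[str, str]
Classifier = Callable[[str], Optional[Schema]]


def classify_json(sample: str) -> Optional[Schema]:
    """Hypothetical classifier: treat the first line as one JSON object."""
    try:
        record = json.loads(sample.splitlines()[0])
    except (json.JSONDecodeError, IndexError):
        return None
    if not isinstance(record, dict):
        return None
    # Use Python type names as stand-ins for column types.
    return {key: type(value).__name__ for key, value in record.items()}


def classify_csv(sample: str) -> Optional[Schema]:
    """Hypothetical classifier: header row plus rows of equal width."""
    rows = list(csv.reader(io.StringIO(sample)))
    if len(rows) < 2 or len(rows[0]) < 2:
        return None
    if any(len(row) != len(rows[0]) for row in rows[1:]):
        return None
    # No type inference here; every column is reported as a string.
    return {name: "string" for name in rows[0]}


# Assumed priority order: earlier entries are tried first.
CLASSIFIERS: List[Tuple[str, Classifier]] = [
    ("json", classify_json),
    ("csv", classify_csv),
]


def infer_schema(
    sample: str, classifiers: List[Tuple[str, Classifier]]
) -> Tuple[str, Schema]:
    """Return the name and schema from the first classifier that matches."""
    for name, classifier in classifiers:
        schema = classifier(sample)
        if schema is not None:
            return name, schema
    return "unknown", {}
```

For example, `infer_schema("id,name\n1,a\n2,b\n", CLASSIFIERS)` falls through the JSON classifier and is recognized by the CSV one. In a real crawler, the resulting schema would then be written to the Data Catalog.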

Code Generation

  • Automatically generate code to extract, transform, and load your data.
  • Point AWS Glue to your data source and target, and Glue creates ETL scripts to transform, flatten, and enrich your data.
  • The code is generated in Python and written for the Apache Spark 2.1 environment.
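To make "flatten" concrete, here is a minimal sketch of that kind of transformation in plain Python. It is not the code AWS Glue generates, and it does not use Spark; the `flatten` helper and its dotted-path naming are assumptions chosen purely for illustration.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Collapse nested dictionaries into a single level of dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path prefix along.
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat
```

Calling `flatten({"id": 1, "address": {"city": "Seattle"}})` yields a single-level record with the keys `id` and `address.city`, which is the shape a relational target typically expects.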

Ⓒ Intuitive