All:

This may be off-topic for Spark, but I'm sure several of you have used
some form of this as part of your big data implementations, so I wanted
to reach out.

As part of building a data lake and processing the data (with Spark, for
example), we can end up with the files in several different forms (after
cleanup, enrichment, etc.).

In order to make this data available for exploration by analysts and data
scientists, how do we manage the metadata?
  - Creating a metadata repository
  - Making the schemas available to users, so they can use them to create
Hive tables, query them via Presto, etc. (rough sketch below)
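
To make the second point concrete, here is a rough sketch of what I mean
(table names and paths are made up; it assumes Spark already has a Hive
metastore configured):

    import org.apache.spark.sql.SparkSession

    // Register a curated dataset in the shared metastore so analysts can
    // query it from Hive, Presto, etc. without re-declaring the schema.
    val spark = SparkSession.builder()
      .appName("register-curated-table")
      .enableHiveSupport()   // needs a configured Hive metastore
      .getOrCreate()

    // Output of the Spark cleanup/enrichment pipeline (hypothetical path).
    val curated = spark.read.parquet("s3://my-datalake/curated/orders/")

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    // saveAsTable stores the table definition and schema in the metastore,
    // so downstream engines can discover and query it directly.
    curated.write
      .mode("overwrite")
      .format("parquet")
      .saveAsTable("analytics.orders_curated")

    spark.stop()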

Can you recommend some patterns (or tools) to help manage the metadata?
We are trying to reduce the dependency on the engineers and make the
analysts/scientists as self-sufficient as possible.

Azure and AWS Glue Data Catalog seem to address this. Any inputs on these
two?
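
On the AWS side, my understanding is that EMR can point Spark's Hive
support at the Glue Data Catalog, so the same saveAsTable call above
would register the table in Glue instead of a self-managed metastore.
Roughly (the factory-class property below is the one EMR documents;
please correct me if that has changed):

    // Same as above, but with Glue acting as the Hive metastore (EMR integration).
    val spark = SparkSession.builder()
      .appName("register-curated-table-glue")
      .config("hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
      .enableHiveSupport()
      .getOrCreate()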

Appreciate any pointers in advance.

Thanks,
Vasu.
