All: This may be off topic for Spark, but I'm sure several of you have used some form of this as part of your big data implementations, so I wanted to reach out.
As part of a data lake and data-processing pipeline (with Spark as an example), we end up with the data in several different forms as files (after cleanup, enrichment, etc.). To make this data available for exploration by analysts and data scientists, how should we manage the metadata?

- Creating a metadata repository
- Making the schemas available to users, so they can use them to create Hive tables, query them with Presto, etc.

Can you recommend some patterns (or tools) to help manage the metadata? We are trying to reduce the dependency on engineers and make the analysts/scientists as self-sufficient as possible. Azure Data Catalog and AWS Glue Data Catalog seem to address this. Any inputs on these two?

Appreciate it in advance.

Thanks,
Vasu
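
P.S. For concreteness, this is roughly the step where the metadata question comes up for us. A minimal sketch in PySpark, assuming the Spark session is configured against a shared Hive metastore (or Glue acting as the metastore); the database, table, and S3 path names below are placeholders:

from pyspark.sql import SparkSession

# Assumes spark-defaults already point at the shared metastore/catalog.
spark = (
    SparkSession.builder
    .appName("register-curated-dataset")
    .enableHiveSupport()  # persist table schemas in the shared catalog
    .getOrCreate()
)

# A dataset that an upstream Spark job has already cleaned/enriched.
curated = spark.read.parquet("s3://my-lake/curated/orders/")  # placeholder path

# Register it as an external table: the schema lands in the catalog, so
# analysts can query it from Hive, Presto, or Spark SQL without an engineer.
(curated.write
    .mode("overwrite")
    .option("path", "s3://my-lake/curated/orders/")
    .saveAsTable("analytics.orders_curated"))  # placeholder db.table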