Hi all,

as discussed briefly with Fabian, for our products at Okkam we need a central repository of the DataSources processed by Flink. Existing external catalogs, such as Hive or Confluent's SchemaRegistry, aim to provide the metadata needed to read/write the registered tables. On top of that, we would also need a way to access other general metadata (e.g. name, description, creator, creation date, lastUpdate date, processedRecords, certificationLevel of the provided data, provenance, language, etc.).

This integration has 2 main goals:

1. In a UI: to enable the user to choose (or even create) a datasource to process with some task (e.g. quality assessment) and then see its metadata (name, description, creator user, etc.).
2. During a Flink job: when 2 datasources get joined and we have multiple values for an attribute (e.g. name or lastname), we can access the datasource metadata to decide which value to retain (e.g. the one coming from the most authoritative/certified source for that attribute).

We also think that this could be of interest for projects like Apache Zeppelin or NiFi, enabling them to suggest to the user the sources to start from.

Do you think it makes sense to design such a module for Flink?

Best,
Flavio
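
P.S. To make goal 2 a bit more concrete, here is a rough sketch in plain Java (no Flink dependency) of the kind of decision we have in mind. Everything in it is hypothetical and only for illustration: the class names (`DataSourceMetadata`, `MetadataCatalog`, `SourcedValue`), the `pickValue` helper, and the assumption that certificationLevel is an integer where higher means more authoritative. It also simplifies by using one level per datasource rather than one per attribute. It is not an existing Flink API.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataAwareMerge {

    /** Hypothetical catalog entry holding a few of the general metadata fields. */
    static final class DataSourceMetadata {
        final String name;
        final String creator;
        final int certificationLevel; // assumption: higher means more authoritative

        DataSourceMetadata(String name, String creator, int certificationLevel) {
            this.name = name;
            this.creator = creator;
            this.certificationLevel = certificationLevel;
        }
    }

    /** Hypothetical in-memory stand-in for the central repository of datasources. */
    static final class MetadataCatalog {
        private final Map<String, DataSourceMetadata> entries = new HashMap<>();

        void register(DataSourceMetadata metadata) {
            entries.put(metadata.name, metadata);
        }

        DataSourceMetadata lookup(String name) {
            DataSourceMetadata metadata = entries.get(name);
            if (metadata == null) {
                throw new IllegalArgumentException("Unknown datasource: " + name);
            }
            return metadata;
        }
    }

    /** An attribute value together with the name of the datasource it came from. */
    static final class SourcedValue {
        final String value;
        final String sourceName;

        SourcedValue(String value, String sourceName) {
            this.value = value;
            this.sourceName = sourceName;
        }
    }

    /**
     * Keeps the value coming from the datasource with the higher certification
     * level. On a tie, the left value is kept.
     */
    static String pickValue(MetadataCatalog catalog, SourcedValue left, SourcedValue right) {
        int leftLevel = catalog.lookup(left.sourceName).certificationLevel;
        int rightLevel = catalog.lookup(right.sourceName).certificationLevel;
        return rightLevel > leftLevel ? right.value : left.value;
    }

    public static void main(String[] args) {
        MetadataCatalog catalog = new MetadataCatalog();
        catalog.register(new DataSourceMetadata("registry", "alice", 3));
        catalog.register(new DataSourceMetadata("webcrawl", "bob", 1));

        // Two joined records disagree on the "lastname" attribute.
        SourcedValue fromCrawl = new SourcedValue("Rosi", "webcrawl");
        SourcedValue fromRegistry = new SourcedValue("Rossi", "registry");

        // Prints "Rossi", since "registry" has the higher certification level.
        System.out.println(pickValue(catalog, fromCrawl, fromRegistry));
    }
}
```

In a real job the lookup would presumably happen once when the join function is initialized (or via a broadcast of the metadata) rather than per record, but that is a design detail we would like to discuss.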