Hi Konstantin Knauf,

> Can you explain how users are expected to authenticate with AWS Glue? I
> don't see any catalog options regarding authx. So I assume the credentials
> are taken from the environment?

We are planning to put GlueCatalog in flink-connector-aws[1].
flink-connector-aws already provides a base module with ready-built AwsConfigs[2], and these configs can be reused for the catalog as well.
I will update the Configuration section of FLIP-277[3] with the auth-related configs.
Users can pass these values as part of the config at catalog creation; if they are not provided, the catalog will try to fetch them from the environment.
This will allow users to create multiple catalog instances in the same session pointing to different accounts. (I haven't tested multi-account Glue catalog instances during the POC.)

[1] https://github.com/apache/flink-connector-aws
[2] https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws-base/src/main/java/org/apache/flink/connector/aws/config/AWSConfigConstants.java
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink

Bests,
Samrat

On Mon, Dec 12, 2022 at 5:32 PM Samrat Deb <decordea...@gmail.com> wrote:

> Hi Jark,
> Apologies for the late reply.
> Thank you for your valuable input.
>
>> Besides, I have a question about Glue Namespace. Could you share the
>> documentation of the Glue Namespaces? (Sorry, I didn't find it.)
>> According to the "Flink Glue Metaspace Mapping" section, if there is a
>> database "mydb" under namespace "ns1", does that mean the database name
>> in Flink is "ns1.mydb"?
>
> There is no concept of namespace in the Glue Data Catalog.
> There are 3 levels in the Glue Data Catalog:
> - catalog
> - database
> - table
>
> I have added the mapping to FLIP-277[1] and updated it.
> A Flink database name maps directly to a Glue database name.
> Please ignore the typo left over in the doc previously.
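
The lookup order described in the reply at the top of this message (explicit catalog options first, then the environment) could be sketched roughly as follows. This is only an illustration in Python, not the connector's code: the option keys are assumed to follow the AWSConfigConstants naming and may differ from what FLIP-277 ends up documenting, and `resolve_aws_settings` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed names: each setting maps to (catalog option key, environment variable).
# The option keys are illustrative and are not confirmed by the thread.
SETTINGS = {
    "access_key_id": ("aws.credentials.provider.basic.accesskeyid", "AWS_ACCESS_KEY_ID"),
    "secret_access_key": ("aws.credentials.provider.basic.secretkey", "AWS_SECRET_ACCESS_KEY"),
    "region": ("aws.region", "AWS_REGION"),
}


def resolve_aws_settings(
    catalog_options: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Prefer values passed at catalog creation, else fall back to the environment."""
    env = os.environ if env is None else env
    resolved: Dict[str, Optional[str]] = {}
    for name, (option_key, env_var) in SETTINGS.items():
        # A missing or empty catalog option falls through to the environment;
        # the result is None when neither source provides a value.
        resolved[name] = catalog_options.get(option_key) or env.get(env_var)
    return resolved
```

Because each catalog instance resolves its own options, two catalogs created in one session can carry different explicit credentials while a third relies on the environment alone.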
>
> Best,
> Samrat
>
>
> On Fri, Dec 9, 2022 at 8:38 PM Jark Wu <imj...@gmail.com> wrote:
>
>> Hi Samrat,
>>
>> Thanks a lot for driving the new catalog, and sorry for jumping into the
>> discussion late.
>>
>> As Flink SQL is becoming the first-class citizen of the Flink API, we are
>> planning to push Catalog to become the first-class citizen of the
>> connector instead of Source & Sink. For Flink SQL users, using Catalog is
>> as natural and user-friendly as working with databases, rather than
>> having to define DDL and schemas over and over again. This is also how
>> Trino/Presto does it.
>>
>> Regarding the repo for the Glue catalog, I think we can add it to
>> flink-connector-aws. We don't need separate repos for Catalogs because
>> Catalog is a kind of connector (others are sources & sinks).
>> For example, MySqlCatalog[1] and PostgresCatalog[2] are in
>> flink-connector-jdbc, and HiveCatalog is in flink-connector-hive. This
>> can reduce repository maintenance, and I think maybe some common AWS
>> utils can be shared there. cc @Danny Cranmer <dannycran...@apache.org>,
>> what do you think about this?
>>
>> Besides, I have a question about Glue Namespace. Could you share the
>> documentation of the Glue Namespaces? (Sorry, I didn't find it.)
>> According to the "Flink Glue Metaspace Mapping" section, if there is a
>> database "mydb" under namespace "ns1", does that mean the database name
>> in Flink is "ns1.mydb"?
>>
>> Best,
>> Jark
>>
>>
>> [1]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/MySqlCatalog.java
>> [2]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/PostgresCatalog.java
>>
>> On Fri, 9 Dec 2022 at 08:51, Dong Lin <lindon...@gmail.com> wrote:
>>
>> > Hi Samrat,
>> >
>> > Sorry for the late reply.
>> > Yeah I am referring to creating a similar external repo such as
>> > flink-catalog-glue. flink-connector-aws is already named with
>> > `connector` so it seems a bit weird to put a catalog there.
>> >
>> > Thanks!
>> > Dong
>> >
>> > On Wed, Dec 7, 2022 at 1:04 PM Samrat Deb <decordea...@gmail.com> wrote:
>> >
>> > > Hi Dong Lin,
>> > >
>> > > > Since this is the first proposal for adding a vendor-specific catalog
>> > > > library in Flink, I think maybe we should also externalize those
>> > > > catalog libraries similar to how we are externalizing connector
>> > > > libraries. It is likely that we might want to add catalogs for other
>> > > > vendors in the future. Externalizing those catalogs can make Flink
>> > > > development more scalable in the long term.
>> > >
>> > > Initially I misinterpreted externalising the catalogs. There already
>> > > exists an externalised connector for AWS [1].
>> > > Are you referring to creating a similar external repo for catalogs, or
>> > > would it be better to add it in flink-connector-aws[1]?
>> > >
>> > > [1] https://github.com/apache/flink-connector-aws
>> > >
>> > > Samrat
>> > >
>> > > On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <decordea...@gmail.com> wrote:
>> > >
>> > > > Hi Dong Lin,
>> > > >
>> > > > AWS Glue Data Catalog is vendor-specific, and in future we will get
>> > > > this type of implementation from different providers. We should
>> > > > definitely externalize these catalog libraries similar to Flink
>> > > > connectors. I am thinking of creating flink-catalog, similar to
>> > > > flink-connector, under the root (flink). The Glue catalog can be one
>> > > > of the modules under flink-catalog. Please suggest if there is a
>> > > > better structure we can create for catalogs.
>> > > >
>> > > >
>> > > >> It is mentioned in the FLIP that there will be two types of
>> > > >> SdkHttpClient supported based on the catalog option http-client.type.
>> > > >> Is http-client.type a public config for the GlueCatalog? If yes, can
>> > > >> we add this config to the "Configurations" section and explain how
>> > > >> users should choose the client type?
>> > > >
>> > > >
>> > > > Yes, http-client.type is a public config for the GlueCatalog. By
>> > > > default the client type will be `urlconnection` if the user doesn't
>> > > > specify any connection type.
>> > > > I have updated the FLIP-277[1] #configuration section with all the
>> > > > configs. Please review it again.
>> > > >
>> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>> > > >
>> > > > Samrat
>> > > >
>> > > > On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <decordea...@gmail.com> wrote:
>> > > >
>> > > >> Hi Yuxia,
>> > > >>
>> > > >> Thank you for reviewing the FLIP and putting forward your
>> > > >> observations and comments.
>> > > >>
>> > > >>> 1: I noticed there's a YAML part in the section of "Using the
>> > > >>> Catalog", what do you mean by that? Do you mean how to use glue
>> > > >>> catalog in sql client? If so, just for your information, it's not
>> > > >>> supported to use yaml environment file in sql client[2].
>> > > >>
>> > > >>
>> > > >> Thank you for attaching the Jira ticket [1]. I missed the changes.
>> > > >> There is a provision to register a catalog directly through factory
>> > > >> resources.
>> > > >> - GenericInMemoryCatalog is defined through
>> > > >>   `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>> > > >> - HiveCatalog is defined through the path
>> > > >>   `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>> > > >> Similarly, we can define it in the vendor-specific module for AWS Glue.
>> > > >>
>> > > >>> 2: Seems there's a typo in "Design#views" part, it contains
>> > > >>> "listTables" which I think shouldn't be contained.
>> > > >>
>> > > >>
>> > > >> Oh yes 😅! Fixed it now, thanks for pointing it out.
>> > > >>
>> > > >>
>> > > >>> Also, I'm curious about how to list views using Glue API. Is there
>> > > >>> an on-hand api to list views directly or we need to list the tables
>> > > >>> and then filter the views using the table-kind?
>> > > >>
>> > > >>
>> > > >> Yes, there is no on-hand API to list views directly; we need to list
>> > > >> all tables and then filter the views based on the attribute
>> > > >> tableKind, which is part of the table object in the API response.
>> > > >>
>> > > >>
>> > > >>> 3: In "Flink Glue DataType Mapping" part, CharType is mapped to
>> > > >>> String. It seems the char's size will be lost, is it possible to
>> > > >>> have a better mapping which won't lose the size of char type?
>> > > >>
>> > > >>
>> > > >> Thanks for pointing this out! I have updated the FLIP with the
>> > > >> correct type. Initially I mapped CharType and VarCharType to String,
>> > > >> but updated it to map directly to the same type.
>> > > >>
>> > > >>
>> > > >>
>> > > >>> 4: About the "Flink CatalogFunction mapping with Glue Function"
>> > > >>> part, how do we map the function language in Flink's
>> > > >>> CatalogFunction.
>> > > >>
>> > > >>
>> > > >> The Glue API (UserDefinedFunctionInput) doesn't support a specific
>> > > >> attribute for function language. Here is how the AWS Hive-compatible
>> > > >> metastore maps a Hive function to a Glue function[2]. We will add a
>> > > >> language prefix to the function name itself to indicate the
>> > > >> language. I see this has already been done for the Hive Catalog[3].
>> > > >> We are thinking of implementing it in the same way.
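
A minimal sketch of the language-prefix idea from the reply above, written in Python purely for illustration. The prefix strings, the default language, and both helper names are assumptions; the thread does not fix the exact format, and the real GlueCatalog may encode the language differently.

```python
from typing import Tuple

# Hypothetical prefixes; the thread only says a language prefix is added
# to the function name stored in Glue.
LANGUAGE_PREFIXES = {"JAVA": "java:", "SCALA": "scala:", "PYTHON": "python:"}
# Assumed fallback for stored names that carry no recognised prefix.
DEFAULT_LANGUAGE = "JAVA"


def encode_function_name(name: str, language: str) -> str:
    """Prepend the language marker to the function name stored in Glue."""
    return LANGUAGE_PREFIXES[language.upper()] + name


def decode_function_name(stored: str) -> Tuple[str, str]:
    """Split a stored name back into (language, original function name)."""
    for language, prefix in LANGUAGE_PREFIXES.items():
        if stored.startswith(prefix):
            return language, stored[len(prefix):]
    return DEFAULT_LANGUAGE, stored
```

Encoding and decoding are symmetric for the known languages, so the language survives a round trip through Glue even though the Glue function object has no dedicated field for it.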
>> > > >>
>> > > >> [1] https://issues.apache.org/jira/browse/FLINK-22540
>> > > >> [2] https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
>> > > >> [3] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
>> > > >>
>> > > >> Samrat
>> > > >>
>> > > >> On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <lindon...@gmail.com> wrote:
>> > > >>
>> > > >>> Hi Samrat,
>> > > >>>
>> > > >>> Thanks for the FLIP!
>> > > >>>
>> > > >>> Since this is the first proposal for adding a vendor-specific
>> > > >>> catalog library in Flink, I think maybe we should also externalize
>> > > >>> those catalog libraries similar to how we are externalizing
>> > > >>> connector libraries. It is likely that we might want to add
>> > > >>> catalogs for other vendors in the future. Externalizing those
>> > > >>> catalogs can make Flink development more scalable in the long term.
>> > > >>>
>> > > >>> It is mentioned in the FLIP that there will be two types of
>> > > >>> SdkHttpClient supported based on the catalog option
>> > > >>> http-client.type. Is http-client.type a public config for the
>> > > >>> GlueCatalog? If yes, can we add this config to the "Configurations"
>> > > >>> section and explain how users should choose the client type?
>> > > >>>
>> > > >>> Regards,
>> > > >>> Dong
>> > > >>>
>> > > >>>
>> > > >>> On Sat, Dec 3, 2022 at 12:31 PM Samrat Deb <decordea...@gmail.com> wrote:
>> > > >>>
>> > > >>> > Hi everyone,
>> > > >>> >
>> > > >>> > I would like to open a discussion[1] on providing GlueCatalog
>> > > >>> > support in Flink.
>> > > >>> > Currently, Flink offers 3 major types of catalog[2], of which
>> > > >>> > only HiveCatalog is a persistent catalog, backed by Hive
>> > > >>> > Metastore. We would like to introduce GlueCatalog in Flink,
>> > > >>> > offering users another option that is persistent in nature. AWS
>> > > >>> > Glue Data Catalog is a centralized data catalog in the AWS cloud
>> > > >>> > that provides integrations with many different connectors[3].
>> > > >>> > Flink GlueCatalog can use the features provided by Glue and
>> > > >>> > create strong integration with other services in the cloud.
>> > > >>> >
>> > > >>> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>> > > >>> > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/
>> > > >>> > [3] https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro
>> > > >>> > [4] https://issues.apache.org/jira/browse/FLINK-29549
>> > > >>> >
>> > > >>> > Bests
>> > > >>> > Samrat