Hi Jark,

Apologies for the late reply. Thank you for your valuable input.

> Besides, I have a question about Glue Namespace. Could you share the
> documentation of the Glue Namespaces? (Sorry, I didn't find it.)
> According to the "Flink Glue Metaspace Mapping" section,
> if there is a database "mydb" under namespace "ns1", is that mean the
> database name in Flink is "ns1.mydb"?
There is no concept of a namespace in the Glue Data Catalog. There are 3
levels in the Glue Data Catalog:
- catalog
- database
- table

I have added the mapping to FLIP-277[1] and updated it. The database name
in Flink maps directly to the database name in Glue. Please ignore the typo
previously left over in the doc.

Best,
Samrat

On Fri, Dec 9, 2022 at 8:38 PM Jark Wu <imj...@gmail.com> wrote:

> Hi Samrat,
>
> Thanks a lot for driving the new catalog, and sorry for jumping into the
> discussion late.
>
> As Flink SQL is becoming the first-class citizen of the Flink API, we are
> planning to push Catalog to become the first-class citizen of the
> connector instead of Source & Sink. For Flink SQL users, using Catalog is
> as natural and user-friendly as working with databases, rather than having
> to define DDL and schemas over and over again. This is also how
> Trino/Presto does.
>
> Regarding the repo for the Glue catalog, I think we can add it to
> flink-connector-aws. We don't need separate repos for Catalogs because
> Catalog is a kind of connector (others are sources & sinks).
> For example, MySqlCatalog[1] and PostgresCatalog[2] are in
> flink-connector-jdbc, and HiveCatalog is in flink-connector-hive. This can
> reduce repository maintenance, and I think maybe some common AWS utils can
> be shared there. cc @Danny Cranmer <dannycran...@apache.org> what do you
> think about this?
>
> Besides, I have a question about Glue Namespace. Could you share the
> documentation of the Glue Namespaces? (Sorry, I didn't find it.)
> According to the "Flink Glue Metaspace Mapping" section,
> if there is a database "mydb" under namespace "ns1", is that mean the
> database name in Flink is "ns1.mydb"?
>
> Best,
> Jark
>
> [1]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/MySqlCatalog.java
> [2]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/PostgresCatalog.java
>
> On Fri, 9 Dec 2022 at 08:51, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hi Samrat,
> >
> > Sorry for the late reply. Yeah I am referring to creating a similar
> > external repo such as flink-catalog-glue. flink-connector-aws is already
> > named with `connector` so it seems a bit weird to put a catalog there.
> >
> > Thanks!
> > Dong
> >
> > On Wed, Dec 7, 2022 at 1:04 PM Samrat Deb <decordea...@gmail.com> wrote:
> >
> > > Hi Dong Lin,
> > >
> > > > Since this is the first proposal for adding a vendor-specific catalog
> > > > library in Flink, I think maybe we should also externalize those
> > > > catalog libraries similar to how we are externalizing connector
> > > > libraries. It is likely that we might want to add catalogs for other
> > > > vectors in the future. Externalizing those catalogs can make Flink
> > > > development more scalable in the long term.
> > >
> > > Initially I misinterpreted externalising the catalogs. There already
> > > exists an externalised connector for AWS [1].
> > > Are you referring to creating a similar external repo for catalogs, or
> > > will it be better to add it in flink-connector-aws[1]?
> > >
> > > [1] https://github.com/apache/flink-connector-aws
> > >
> > > Samrat
> > >
> > > On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <decordea...@gmail.com> wrote:
> > >
> > > > Hi Dong Lin,
> > > >
> > > > AWS Glue Data Catalog is vendor specific, and in future we will get
> > > > such type of implementation from different providers. We should
> > > > definitely externalize these catalog libraries similar to Flink
> > > > connectors.
> > > > I am thinking of creating
> > > > flink-catalog similar to flink-connector under the root (flink). Glue
> > > > catalog can be one of the modules under flink-catalog. Please suggest
> > > > if there is a better structure we can create for catalogs.
> > > >
> > > > > It is mentioned in the FLIP that there will be two types of
> > > > > SdkHttpClient supported based on the catalog option
> > > > > http-client.type. Is http-client.type a public config for the
> > > > > GlueCatalog? If yes, can we add this config to the "Configurations"
> > > > > section and explain how users should choose the client type?
> > > >
> > > > Yes, http-client.type is a public config for the GlueCatalog. By
> > > > default the client type will be `urlconnection` if the user doesn't
> > > > specify any connection type.
> > > > I have updated the FLIP-277[1] #configuration section with all the
> > > > configs. Please review it again.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
> > > >
> > > > Samrat
> > > >
> > > > On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <decordea...@gmail.com> wrote:
> > > >
> > > > > Hi Yuxia,
> > > > >
> > > > > Thank you for reviewing the FLIP and putting forward your
> > > > > observations and comments.
> > > > >
> > > > > > 1: I noticed there's a YAML part in the section of "Using the
> > > > > > Catalog", what do you mean by that? Do you mean how to use glue
> > > > > > catalog in sql client? If so, just for your information, it's not
> > > > > > supported to use yaml environment file in sql client[2].
> > > > >
> > > > > Thank you for attaching the Jira ticket [1]. I missed the changes.
> > > > > There is a provision to register a catalog directly through factory
> > > > > resources.
> > > > > - GenericInMemoryCatalog is defined through
> > > > >   `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
> > > > > - HiveCatalog is defined through path
> > > > >   `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
> > > > > Similarly, we can define it in the vendor-specific module for AWS
> > > > > Glue.
> > > > >
> > > > > > 2: Seems there's a typo in "Design#views" part, it contains
> > > > > > "listTables" which I think shouldn't be contained.
> > > > >
> > > > > Oh yes 😅! Fixed it now, thanks for pointing it out.
> > > > >
> > > > > > Also, I'm curious about how to list views using Glue API. Is there
> > > > > > an on-hand api to list views directly or we need to list the
> > > > > > tables and then filter the views using the table-kind?
> > > > >
> > > > > Yes, there is no on-hand API to list views directly; we need to list
> > > > > all tables and then filter the views based on the attribute
> > > > > tableKind, which is part of the table object in the API response.
> > > > >
> > > > > > 3: In "Flink Glue DataType Mapping" part, CharType is mapped to
> > > > > > String. It seems the char's size will lose, is it possible to have
> > > > > > a better mapping which won't loss the size of char type?
> > > > >
> > > > > Thanks for pointing this out! I have updated the FLIP with the
> > > > > correct type. Initially I mapped the char type and varchar type to
> > > > > string, but updated it to map directly to the same type.
> > > > >
> > > > > > 4: About the "Flink CatalogFunction mapping with Glue Function"
> > > > > > part, how do we map the function language in Flink's
> > > > > > CatalogFunction.
> > > > >
> > > > > Glue API (UserDefinedFunctionInput) doesn't support a specific
> > > > > attribute for function language.
> > > > > Here is how the AWS Hive-compatible metastore maps a Hive function
> > > > > to a Glue function[2]. We will add a language prefix to the function
> > > > > name itself, indicating the language. I see this has already been
> > > > > done for the Hive Catalog [3]. We are thinking of implementing it in
> > > > > the same way.
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-22540
> > > > > [2] https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
> > > > > [3] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
> > > > >
> > > > > Samrat
> > > > >
> > > > > On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <lindon...@gmail.com> wrote:
> > > > >
> > > > > > Hi Samrat,
> > > > > >
> > > > > > Thanks for the FLIP!
> > > > > >
> > > > > > Since this is the first proposal for adding a vendor-specific
> > > > > > catalog library in Flink, I think maybe we should also externalize
> > > > > > those catalog libraries similar to how we are externalizing
> > > > > > connector libraries. It is likely that we might want to add
> > > > > > catalogs for other vectors in the future. Externalizing those
> > > > > > catalogs can make Flink development more scalable in the long
> > > > > > term.
> > > > > >
> > > > > > It is mentioned in the FLIP that there will be two types of
> > > > > > SdkHttpClient supported based on the catalog option
> > > > > > http-client.type. Is http-client.type a public config for the
> > > > > > GlueCatalog? If yes, can we add this config to the
> > > > > > "Configurations" section and explain how users should choose the
> > > > > > client type?
> > > > > >
> > > > > > Regards,
> > > > > > Dong
> > > > > >
> > > > > > On Sat, Dec 3, 2022 at 12:31 PM Samrat Deb <decordea...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > I would like to open a discussion[1] on providing GlueCatalog
> > > > > > > support in Flink.
> > > > > > > Currently, Flink offers 3 major types of catalog[2], out of
> > > > > > > which only HiveCatalog is a persistent catalog backed by Hive
> > > > > > > Metastore. We would like to introduce GlueCatalog in Flink,
> > > > > > > offering users another option that is persistent in nature.
> > > > > > > AWS Glue Data Catalog is a centralized data catalog in the AWS
> > > > > > > cloud that provides integrations with many different
> > > > > > > connectors[3]. Flink GlueCatalog can use the features provided
> > > > > > > by Glue and create strong integration with other services in
> > > > > > > the cloud.
> > > > > > >
> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
> > > > > > > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/
> > > > > > > [3] https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro
> > > > > > > [4] https://issues.apache.org/jira/browse/FLINK-29549
> > > > > > >
> > > > > > > Bests
> > > > > > > Samrat
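
For reference, the view-listing approach discussed in the thread (list all tables, then keep only those whose kind marks them as a view) could be sketched roughly as below. This is only an illustration and not the FLIP's implementation: the table records are plain dicts, and the `tableKind` key, its `"VIEW"`/`"TABLE"` values, the sample table names, and the `list_views` helper are all assumptions made up for this example rather than the actual shape of the Glue API response.

```python
from typing import Dict, Iterable, List

# Hypothetical, simplified table records. The real Glue response is richer
# and shaped differently; only the idea of a per-table kind attribute is
# taken from the discussion.
SAMPLE_TABLES: List[Dict[str, str]] = [
    {"name": "orders", "tableKind": "TABLE"},
    {"name": "orders_daily", "tableKind": "VIEW"},
    {"name": "customers", "tableKind": "TABLE"},
    {"name": "active_customers", "tableKind": "VIEW"},
]


def list_views(tables: Iterable[Dict[str, str]]) -> List[str]:
    """Return the names of records whose (assumed) tableKind is "VIEW".

    Records without a tableKind entry are treated as non-views.
    """
    return [t["name"] for t in tables if t.get("tableKind") == "VIEW"]


if __name__ == "__main__":
    print(list_views(SAMPLE_TABLES))
```

Filtering on the client side means every table in the database is fetched even when only views are wanted, which is the trade-off implied by there being no direct list-views call.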