Hi Dong Lin,

> Since this is the first proposal for adding a vendor-specific catalog
> library in Flink, I think maybe we should also externalize those catalog
> libraries similar to how we are externalizing connector libraries. It is
> likely that we might want to add catalogs for other vendors in the future.
> Externalizing those catalogs can make Flink development more scalable in
> the long term.

Initially I misinterpreted what externalizing the catalogs meant. There is already an externalized connector repository for AWS [1]. Are you suggesting that we create a similar external repository for catalogs, or would it be better to add the catalog to flink-connector-aws [1]?

[1] https://github.com/apache/flink-connector-aws

Samrat

On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <decordea...@gmail.com> wrote:

> Hi Dong Lin,
>
> AWS Glue Data Catalog is vendor-specific, and in the future we will get
> similar implementations from different providers. We should
> definitely externalize these catalog libraries, similar to the Flink connectors.
> I am thinking of creating
> flink-catalog, similar to flink-connector, under the root (flink). The Glue
> catalog can be one of the modules under flink-catalog. Please suggest if
> there is a better structure we can create for catalogs.
>
>> It is mentioned in the FLIP that there will be two types of SdkHttpClient
>> supported based on the catalog option http-client.type. Is
>> http-client.type
>> a public config for the GlueCatalog? If yes, can we add this config to the
>> "Configurations" section and explain how users should choose the client
>> type?
>
> Yes, http-client.type is a public config for the GlueCatalog. By default the
> client type is `urlconnection` if the user doesn't specify any connection
> type.
> I have updated the FLIP-277 [1] #configuration section with all the configs.
> Please review it again.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>
> Samrat
>
> On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <decordea...@gmail.com> wrote:
>
>> Hi Yuxia,
>>
>> Thank you for reviewing the FLIP and putting forward your observations
>> and comments.
>>
>>> 1: I noticed there's a YAML part in the section of "Using the Catalog",
>>> what do you mean by that? Do you mean how to use glue catalog in sql
>>> client? If so, just for your information, it's not supported to use yaml
>>> environment file in sql client[2].
>>
>> Thank you for attaching the Jira ticket [1]. I had missed those changes.
>> There is a provision to register a catalog directly through factory
>> resources.
>> - GenericInMemoryCatalog is defined through
>> `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>> - HiveCatalog is defined through
>> `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>> Similarly, we can define it in the vendor-specific module for AWS Glue.
>>
>>> 2: Seems there's a typo in "Design#views" part, it contains "listTables"
>>> which I think shouldn't be contained.
>>
>> Oh yes 😅! Fixed it now, thanks for pointing it out.
>>
>>> Also, I'm curious about how to list views using Glue API. Is there an
>>> on-hand api to list views directly or we need to list the tables and then
>>> filter the views using the table-kind?
>>
>> Right, there is no on-hand API to list views directly. We need to list all
>> tables and then filter the views based on the attribute tableKind, which is
>> part of the table object in the API response.
>>
>>> 3: In "Flink Glue DataType Mapping" part, CharType is mapped to String.
>>> It seems the char's size will lose, is it possible to have a better mapping
>>> which won't loss the size of char type?
>>
>> Thanks for pointing this out! I have updated the FLIP with the correct
>> type. Initially I mapped the char and varchar types to string, but I have
>> updated them to map directly to the same type.
>>
>>> 4: About the "Flink CatalogFunction mapping with Glue Function" part,
>>> how do we map the function language in Flink's CatalogFunction.
>>
>> The Glue API (UserDefinedFunctionInput) doesn't support a specific attribute
>> for function language. Here is how the AWS Hive-compatible metastore maps
>> a Hive function to a Glue function [2]. We will add a language prefix to
>> the function name itself, indicating the language.
>> I see this has already been done for the Hive Catalog [3]. We are thinking
>> of implementing it in the same way.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-22540
>> [2]
>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
>> [3]
>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
>>
>> Samrat
>>
>> On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <lindon...@gmail.com> wrote:
>>
>>> Hi Samrat,
>>>
>>> Thanks for the FLIP!
>>>
>>> Since this is the first proposal for adding a vendor-specific catalog
>>> library in Flink, I think maybe we should also externalize those catalog
>>> libraries similar to how we are externalizing connector libraries. It is
>>> likely that we might want to add catalogs for other vendors in the
>>> future.
>>> Externalizing those catalogs can make Flink development more scalable in
>>> the long term.
>>>
>>> It is mentioned in the FLIP that there will be two types of SdkHttpClient
>>> supported based on the catalog option http-client.type. Is
>>> http-client.type
>>> a public config for the GlueCatalog? If yes, can we add this config to
>>> the
>>> "Configurations" section and explain how users should choose the client
>>> type?
>>>
>>> Regards,
>>> Dong
>>>
>>>
>>> On Sat, Dec 3, 2022 at 12:31 PM Samrat Deb <decordea...@gmail.com>
>>> wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > I would like to open a discussion[1] on providing GlueCatalog support
>>> > in Flink.
>>> > Currently, Flink offers 3 major types of catalog[2], of which only
>>> > HiveCatalog is a persistent catalog backed by Hive Metastore. We would
>>> > like to introduce GlueCatalog in Flink, offering users another option
>>> > that is persistent in nature.
>>> > AWS Glue Data Catalog is a centralized data
>>> > catalog in the AWS cloud that provides integrations with many different
>>> > connectors[3]. Flink GlueCatalog can use the features provided by Glue
>>> > and create strong integration with other services in the cloud.
>>> >
>>> > [1]
>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>>> >
>>> > [2]
>>> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/
>>> >
>>> > [3]
>>> > https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro
>>> >
>>> > [4] https://issues.apache.org/jira/browse/FLINK-29549
>>> >
>>> > Bests
>>> > Samrat
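
For reference, the view-listing approach discussed in this thread (list all tables, then keep only the entries whose table kind marks them as views) could look roughly like the sketch below. This is only an illustration and not the FLIP's implementation: `TableInfo`, `listViews`, and the kind values "TABLE" and "VIEW" are hypothetical stand-ins, and no Glue SDK call is made.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ViewFilterSketch {

    /** Hypothetical stand-in for one table entry in a catalog listing response. */
    static final class TableInfo {
        final String name;
        final String tableKind; // assumed values: "TABLE" or "VIEW"

        TableInfo(String name, String tableKind) {
            this.name = name;
            this.tableKind = tableKind;
        }
    }

    /** Returns the names of all entries marked as views, preserving input order. */
    static List<String> listViews(List<TableInfo> allTables) {
        List<String> views = new ArrayList<>();
        for (TableInfo table : allTables) {
            // Constant on the left keeps the check null-safe for entries without a kind.
            if ("VIEW".equalsIgnoreCase(table.tableKind)) {
                views.add(table.name);
            }
        }
        return views;
    }

    public static void main(String[] args) {
        List<TableInfo> tables = Arrays.asList(
                new TableInfo("orders", "TABLE"),
                new TableInfo("orders_daily", "VIEW"),
                new TableInfo("customers", "TABLE"));
        System.out.println(listViews(tables));
    }
}
```

In a real catalog the list would come from a paginated listing call, so the filter would need to run over every page before returning.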