Hi Dong Lin,

> Since this is the first proposal for adding a vendor-specific catalog
> library in Flink, I think maybe we should also externalize those catalog
> libraries similar to how we are externalizing connector libraries. It is
> likely that we might want to add catalogs for other vendors in the future.
> Externalizing those catalogs can make Flink development more scalable in
> the long term.

Initially I misinterpreted what externalizing the catalogs meant. There is
already an externalized connector repository for AWS [1].
Are you suggesting a similar external repo for catalogs, or would it be
better to add the Glue catalog to flink-connector-aws [1]?

[1] https://github.com/apache/flink-connector-aws

Samrat

On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <decordea...@gmail.com> wrote:

> Hi Dong Lin,
>
> The AWS Glue Data Catalog is vendor-specific, and in the future we will
> likely get similar implementations from other providers. We should
> definitely externalize these catalog libraries, similar to the Flink
> connectors. I am thinking of creating
> flink-catalog, similar to flink-connector, under the root (flink). The Glue
> catalog can be one of the modules under flink-catalog. Please suggest if
> there is a better structure we can create for catalogs.
>
>
>> It is mentioned in the FLIP that there will be two types of SdkHttpClient
>> supported based on the catalog option http-client.type. Is
>> http-client.type
>> a public config for the GlueCatalog? If yes, can we add this config to the
>> "Configurations" section and explain how users should choose the client
>> type?
>
>
> Yes, http-client.type is a public config for the GlueCatalog. If the user
> does not specify a client type, it defaults to `urlconnection`.
> I have updated the Configurations section of FLIP-277 [1] with all the
> configs. Please review it again.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>
> Samrat
>
> On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <decordea...@gmail.com> wrote:
>
>> Hi Yuxia,
>>
>> Thank you for reviewing the FLIP and sharing your observations
>> and comments.
>>
>>> 1: I noticed there's a YAML part in the section of "Using the Catalog",
>>> what do you mean by that? Do you mean how to use glue catalog in sql
>>> client? If so, just for your information, it's not supported to use yaml
>>> environment file in sql client[2].
>>
>>
>> Thank you for attaching the Jira ticket [1]. I had missed that change.
>> Catalogs can be registered directly through factory resource files:
>> - GenericInMemoryCatalog is registered in
>> `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>> - HiveCatalog is registered in
>> `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>> We can register the Glue catalog the same way in the vendor-specific
>> module for AWS Glue.
>>
>>> 2: Seems there's a typo in "Design#views" part, it contains "listTables"
>>> which I think shouldn't be contained.
>>
>>
>> Oh yes 😅! Fixed it now, thanks for pointing it out.
>>
>>
>>> Also, I'm curious about how to list views using Glue API. Is there an
>>> on-hand api to list views directly or we need to list the tables and then
>>> filter the views using the table-kind?
>>
>>
>> Yes, there is no ready-made API to list views directly. We need to list
>> all tables and then filter the views based on the tableKind attribute,
>> which is part of the table object in the API response.
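
A minimal sketch of that list-then-filter step, written in Python for brevity. Only the `tableKind` attribute name comes from the reply above; the dictionary layout, the `name` key, and the `VIEW` value are illustrative assumptions and not the actual Glue response schema.

```python
from typing import Dict, List


def list_views(tables: List[Dict[str, str]]) -> List[str]:
    """Return the names of all entries whose tableKind marks them as a view."""
    # "tableKind" is the attribute named in the discussion; the "name" key
    # and the "VIEW" value are hypothetical placeholders for this sketch.
    return [t["name"] for t in tables if t.get("tableKind") == "VIEW"]


# Stand-in for the result of a "list all tables" call.
all_tables = [
    {"name": "orders", "tableKind": "TABLE"},
    {"name": "daily_orders", "tableKind": "VIEW"},
    {"name": "legacy"},  # no tableKind recorded: not treated as a view
]
views = list_views(all_tables)
```

Since the filter runs on the client side, listing views costs as much as listing every table in the database.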
>>
>>
>>> 3: In "Flink Glue DataType Mapping" part, CharType is mapped to String.
>>> It seems the char's size will be lost, is it possible to have a better
>>> mapping which won't lose the size of char type?
>>
>>
>> Thanks for pointing this out! I have updated the FLIP with the correct
>> type. Initially I mapped CharType and VarCharType to string, but I have
>> changed this so that each maps directly to the same type.
>>
>>
>>
>>> 4: About the "Flink CatalogFunction mapping with Glue Function" part,
>>> how do we map the function language in Flink's CatalogFunction.
>>
>>
>> The Glue API (UserDefinedFunctionInput) has no dedicated attribute
>> for the function language. Here is how the AWS Hive-compatible metastore
>> client maps Hive functions to Glue functions [2]. We will add a language
>> prefix to the function name itself to indicate the language. I see this
>> has already been done for the HiveCatalog [3], and we plan to implement
>> it in the same way.
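
A minimal sketch of such a prefix scheme, written in Python for brevity. The `:` separator, the lower-case prefix, and the helper names are hypothetical and are not taken from the FLIP or from HiveCatalog; the three language values are the ones in Flink's `FunctionLanguage` enum.

```python
from typing import Tuple

SEPARATOR = ":"  # hypothetical separator, chosen only for this sketch
LANGUAGES = ("JAVA", "SCALA", "PYTHON")  # values of Flink's FunctionLanguage


def encode(language: str, identifier: str) -> str:
    """Prefix the stored identifier with the function language."""
    lang = language.upper()
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported function language: {language}")
    return f"{lang.lower()}{SEPARATOR}{identifier}"


def decode(stored: str, default: str = "JAVA") -> Tuple[str, str]:
    """Split a stored value back into (language, identifier)."""
    prefix, sep, rest = stored.partition(SEPARATOR)
    if sep and prefix.upper() in LANGUAGES:
        return prefix.upper(), rest
    # No recognised prefix: assume the default language, keep the value as-is.
    return default, stored
```

Falling back to a default language when no prefix is found keeps entries readable that were written by other tools without any prefix.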
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-22540
>> [2]
>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
>> [3]
>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
>>
>> Samrat
>>
>> On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <lindon...@gmail.com> wrote:
>>
>>> Hi Samrat,
>>>
>>> Thanks for the FLIP!
>>>
>>> Since this is the first proposal for adding a vendor-specific catalog
>>> library in Flink, I think maybe we should also externalize those catalog
>>> libraries similar to how we are externalizing connector libraries. It is
>>> likely that we might want to add catalogs for other vendors in the
>>> future.
>>> Externalizing those catalogs can make Flink development more scalable in
>>> the long term.
>>>
>>> It is mentioned in the FLIP that there will be two types of SdkHttpClient
>>> supported based on the catalog option http-client.type. Is
>>> http-client.type
>>> a public config for the GlueCatalog? If yes, can we add this config to
>>> the
>>> "Configurations" section and explain how users should choose the client
>>> type?
>>>
>>> Regards,
>>> Dong
>>>
>>>
>>> On Sat, Dec 3, 2022 at 12:31 PM Samrat Deb <decordea...@gmail.com>
>>> wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > I would like to open a discussion [1] on providing GlueCatalog support
>>> > in Flink.
>>> > Currently, Flink offers three major types of catalog [2], of which only
>>> > HiveCatalog is a persistent catalog, backed by the Hive Metastore. We
>>> > would like to introduce GlueCatalog in Flink, offering users another
>>> > persistent option. The AWS Glue Data Catalog is a centralized data
>>> > catalog in the AWS cloud that provides integrations with many different
>>> > connectors [3]. A Flink GlueCatalog can use the features provided by
>>> > Glue and integrate closely with other services in the cloud.
>>> >
>>> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>>> > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/
>>> > [3] https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro
>>> > [4] https://issues.apache.org/jira/browse/FLINK-29549
>>> >
>>> > Bests
>>> > Samrat
>>> >
>>>
>>
