Hi all,

Thank you for all your valuable suggestions and questions regarding the
proposals.

If the community has more queries or questions, I will keep this discussion
thread open for a couple more days and then proceed with the next steps.

Bests
Samrat

On Wed, Dec 14, 2022 at 9:41 PM Samrat Deb <decordea...@gmail.com> wrote:

>
>
> Thank you Danny for more insights on the flink-connector-aws-base[1].
>
>> It looks like LocalStack supports Glue [2]; we already use LocalStack for
>> integration tests, so we can follow suit here.
>
>
> Since GlueCatalog will be a part of flink-connector-aws-base, we will, as
> suggested, reuse code and resources as much as possible and add whatever
> extra is required in an extensible manner.
>
> Bests,
> Samrat
>
>
> [1]
> https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
> [2] https://docs.localstack.cloud/user-guide/aws/glue/
>
>
>
>
> On Tue, Dec 13, 2022 at 9:32 PM Danny Cranmer <dannycran...@apache.org>
> wrote:
>
>> Hello Samrat,
>>
>> Sorry for the late response.
>>
>> +1 for a native Glue Data Catalog integration. We have
>> internally developed a Glue Data Catalog catalog implementation that shims
>> hive. We have been meaning to contribute, but this solution can replace our
>> internal one.
>>
>> +1 for putting this in the flink-connector-aws. With regards to
>> configuration, we have a flink-connector-aws-base [1] module where all the
>> common configurations should go. Please reuse anything common, such as
>> authentication providers. Additionally, please consider putting any new
>> configurations you need into aws-base if they might be reusable for
>> other AWS integrations.
>>
>> > We will create e2e integration test cases covering the whole
>> implementation in a mock environment.
>>
>> It looks like LocalStack supports Glue [2]; we already use LocalStack for
>> integration tests, so we can follow suit here.
>>
>> Thanks,
>> Danny
>>
>> [1]
>> https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
>> [2] https://docs.localstack.cloud/user-guide/aws/glue/
>>
>> On Mon, Dec 12, 2022 at 12:18 PM Samrat Deb <decordea...@gmail.com>
>> wrote:
>>
>>> Hi Konstantin Knauf,
>>>
>>>> Can you explain how users are expected to authenticate with AWS Glue? I
>>>> don't see any catalog options regarding authx. So I assume the credentials
>>>> are taken from the environment?
>>>
>>>
>>> We are planning to put GlueCatalog in flink-connector-aws[1].
>>> flink-connector-aws already provides a base module with prebuilt AWS
>>> configs[2]. These configs can also be reused for the catalog.
>>> I will update FLIP-277[3] with the auth-related configs in the
>>> Configuration section.
>>>
>>> Users can pass these values as part of the config at catalog creation; if
>>> they are not provided, the catalog will try to fetch them from the
>>> environment. This will allow users to create multiple catalog instances in
>>> the same session pointing to different accounts. (I haven't tested
>>> multi-account Glue catalog instances during the POC.)
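
The lookup order described above (explicit catalog option first, environment second) can be sketched roughly as follows. This is only an illustration, not the FLIP's implementation: `resolve_option` and the `ENV_FALLBACKS` table are hypothetical names, and the option keys are assumed to follow the `aws.*` naming used in AWSConfigConstants[2].

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from catalog option keys to environment variables.
# The keys are assumed to follow AWSConfigConstants naming; the final option
# names are whatever FLIP-277 ends up specifying.
ENV_FALLBACKS = {
    "aws.region": "AWS_REGION",
    "aws.credentials.provider.basic.accesskeyid": "AWS_ACCESS_KEY_ID",
    "aws.credentials.provider.basic.secretkey": "AWS_SECRET_ACCESS_KEY",
}


def resolve_option(
    options: Mapping[str, str],
    key: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the explicit catalog option if set, else the environment value."""
    env = os.environ if env is None else env
    if key in options:
        return options[key]
    env_var = ENV_FALLBACKS.get(key)
    return env.get(env_var) if env_var is not None else None


# Two catalog instances in one session, each carrying its own explicit options.
catalog_a_options = {"aws.region": "us-east-1"}
catalog_b_options = {"aws.region": "eu-west-1"}
```

Because each catalog instance resolves against its own option map, two instances can point at different regions or accounts while still sharing the environment as a fallback.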
>>>
>>> [1] https://github.com/apache/flink-connector-aws
>>> <https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws-base/src/main/java/org/apache/flink/connector/aws/config/AWSConfigConstants.java>
>>> [2]
>>> https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws-base/src/main/java/org/apache/flink/connector/aws/config/AWSConfigConstants.java
>>> [3]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>>>
>>> Bests,
>>> Samrat
>>>
>>> On Mon, Dec 12, 2022 at 5:32 PM Samrat Deb <decordea...@gmail.com>
>>> wrote:
>>>
>>>> Hi Jark,
>>>> Apologies for the late reply.
>>>> Thank you for your valuable input.
>>>>
>>>> Besides, I have a question about Glue Namespace. Could you share the
>>>>> documentation of the Glue
>>>>>  Namespaces? (Sorry, I didn't find it.) According to the "Flink Glue
>>>>> Metaspace Mapping" section,
>>>>> if there is a database "mydb" under namespace "ns1", does that mean the
>>>>> database name in Flink is "ns1.mydb"?
>>>>
>>>> There is no concept of a namespace in the Glue Data Catalog.
>>>> There are 3 levels in the Glue Data Catalog:
>>>> - catalog
>>>> - database
>>>> - table
>>>>
>>>> I have updated FLIP-277[1] with the mapping: the Flink database name
>>>> maps directly to the Glue database name.
>>>> Please ignore the typo that was previously left over in the doc.
>>>>
>>>> Best,
>>>> Samrat
>>>>
>>>>
>>>> On Fri, Dec 9, 2022 at 8:38 PM Jark Wu <imj...@gmail.com> wrote:
>>>>
>>>>> Hi Samrat,
>>>>>
>>>>> Thanks a lot for driving the new catalog, and sorry for jumping into the
>>>>> discussion late.
>>>>>
>>>>> As Flink SQL is becoming the first-class citizen of the Flink API, we are
>>>>> planning to push Catalog to become the first-class citizen of the
>>>>> connector instead of Source & Sink. For Flink SQL users, using Catalog is
>>>>> as natural and user-friendly as working with databases, rather than
>>>>> having to define DDL and schemas over and over again. This is also how
>>>>> Trino/Presto does it.
>>>>>
>>>>> Regarding the repo for the Glue catalog, I think we can add it to
>>>>> flink-connector-aws. We don't need separate repos for Catalogs because
>>>>> Catalog is a kind of connector (others are sources & sinks).
>>>>> For example, MySqlCatalog[1] and PostgresCatalog[2] are in
>>>>> flink-connector-jdbc, and HiveCatalog is in flink-connector-hive. This can
>>>>> reduce repository maintenance, and I think maybe some common AWS utils can
>>>>> be shared there.  cc @Danny Cranmer <dannycran...@apache.org>
>>>>> what do you think about this?
>>>>>
>>>>> Besides, I have a question about Glue Namespace. Could you share the
>>>>> documentation of the Glue
>>>>>  Namespaces? (Sorry, I didn't find it.) According to the "Flink Glue
>>>>> Metaspace Mapping" section,
>>>>> if there is a database "mydb" under namespace "ns1", does that mean the
>>>>> database name in Flink is "ns1.mydb"?
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>>
>>>>> [1]:
>>>>>
>>>>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/MySqlCatalog.java
>>>>> [2]:
>>>>>
>>>>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/PostgresCatalog.java
>>>>>
>>>>> On Fri, 9 Dec 2022 at 08:51, Dong Lin <lindon...@gmail.com> wrote:
>>>>>
>>>>> > Hi Samrat,
>>>>> >
>>>>> > Sorry for the late reply. Yeah I am referring to creating a similar
>>>>> > external repo such as flink-catalog-glue. flink-connector-aws is
>>>>> already
>>>>> > named with `connector` so it seems a bit weird to put a catalog
>>>>> there.
>>>>> >
>>>>> > Thanks!
>>>>> > Dong
>>>>> >
>>>>> > On Wed, Dec 7, 2022 at 1:04 PM Samrat Deb <decordea...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > > Hi Dong Lin,
>>>>> > >
>>>>> > > Since this is the first proposal for adding a vendor-specific
>>>>> catalog
>>>>> > > > library in Flink, I think maybe we should also externalize those
>>>>> > catalog
>>>>> > > > libraries similar to how we are externalizing connector
>>>>> libraries. It
>>>>> > is
>>>>> > > > likely that we might want to add catalogs for other vendors in
>>>>> the
>>>>> > > future.
>>>>> > > > Externalizing those catalogs can make Flink development more
>>>>> scalable
>>>>> > in
>>>>> > > > the long term.
>>>>> > >
>>>>> > > Initially I misinterpreted what externalising the catalogs meant. There
>>>>> > > already exists an externalised connector for AWS [1].
>>>>> > > Are you referring to creating a similar external repo for catalogs, or
>>>>> > > would it be better to add it to flink-connector-aws[1]?
>>>>> > >
>>>>> > > [1] https://github.com/apache/flink-connector-aws
>>>>> > >
>>>>> > > Samrat
>>>>> > >
>>>>> > > On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <decordea...@gmail.com>
>>>>> wrote:
>>>>> > >
>>>>> > > > Hi Dong Lin,
>>>>> > > >
>>>>> > > > The AWS Glue Data Catalog is vendor-specific, and in the future we
>>>>> > > > will get similar implementations from different providers. We should
>>>>> > > > definitely externalize these catalog libraries, similar to the Flink
>>>>> > > > connectors. I am thinking of creating flink-catalog, similar to
>>>>> > > > flink-connector, under the root (flink). The Glue catalog can be one
>>>>> > > > of the modules under flink-catalog. Please suggest if there is a
>>>>> > > > better structure we can create for catalogs.
>>>>> > > >
>>>>> > > >
>>>>> > > > It is mentioned in the FLIP that there will be two types of
>>>>> > SdkHttpClient
>>>>> > > >> supported based on the catalog option http-client.type. Is
>>>>> > > >> http-client.type
>>>>> > > >> a public config for the GlueCatalog? If yes, can we add this
>>>>> config to
>>>>> > > the
>>>>> > > >> "Configurations" section and explain how users should choose the
>>>>> > client
>>>>> > > >> type?
>>>>> > > >
>>>>> > > >
>>>>> > > > Yes, http-client.type is a public config for the GlueCatalog. If the
>>>>> > > > user doesn't specify a client type, it defaults to `urlconnection`.
>>>>> > > > I have updated the Configurations section of FLIP-277[1] with all
>>>>> > > > the configs. Please review it again.
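
The defaulting behaviour of http-client.type described above can be sketched as below. This is a hedged illustration only: the thread names `urlconnection` as the default, while the second value `apache` is an assumption based on the AWS SDK for Java v2 shipping a URL-connection client and an Apache client. The helper name `http_client_type` is hypothetical.

```python
from typing import Mapping

# "urlconnection" is the default named in the thread; "apache" is an ASSUMED
# second value and may not match the option values defined in FLIP-277.
SUPPORTED_HTTP_CLIENT_TYPES = ("urlconnection", "apache")
DEFAULT_HTTP_CLIENT_TYPE = "urlconnection"


def http_client_type(options: Mapping[str, str]) -> str:
    """Read http-client.type from catalog options, applying the default."""
    raw = options.get("http-client.type", DEFAULT_HTTP_CLIENT_TYPE)
    value = raw.strip().lower()
    if value not in SUPPORTED_HTTP_CLIENT_TYPES:
        raise ValueError(
            f"Unsupported http-client.type {raw!r}; "
            f"expected one of {SUPPORTED_HTTP_CLIENT_TYPES}"
        )
    return value
```

Rejecting unknown values up front keeps a misspelled option from silently falling back to the default client.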
>>>>> > > >
>>>>> > > > [1]
>>>>> > > >
>>>>> > >
>>>>> >
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>>>>> > > >
>>>>> > > > Samrat
>>>>> > > >
>>>>> > > > On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <decordea...@gmail.com
>>>>> >
>>>>> > wrote:
>>>>> > > >
>>>>> > > >> Hi Yuxia,
>>>>> > > >>
>>>>> > > >> Thank you for reviewing the flip and putting forward your
>>>>> observations
>>>>> > > >> and comments.
>>>>> > > >>
>>>>> > > >> 1: I noticed there's a YAML part in the section of "Using the
>>>>> > Catalog",
>>>>> > > >>> what do you mean by that? Do you mean how to use glue catalog
>>>>> in sql
>>>>> > > >>> client? If so, just for your information, it's not supported
>>>>> to use
>>>>> > > yaml
>>>>> > > >>> environment file in sql client[2].
>>>>> > > >>
>>>>> > > >>
>>>>> > > >> Thank you for attaching the Jira ticket [1]. I missed those changes.
>>>>> > > >> There is a provision to register a catalog directly through factory
>>>>> > > >> resources:
>>>>> > > >> - GenericInMemoryCatalog is defined through
>>>>> > > >> `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>>>>> > > >> - HiveCatalog is defined through
>>>>> > > >> `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
>>>>> > > >> We can define the Glue catalog the same way in the vendor-specific
>>>>> > > >> module for AWS Glue.
>>>>> > > >>
>>>>> > > >> 2: Seems there's a typo in "Design#views" part, it contains
>>>>> > "listTables"
>>>>> > > >>> which I think shouldn't be contained.
>>>>> > > >>
>>>>> > > >>
>>>>> > > >> Oh yes 😅! Fixed it now, thanks for pointing it out.
>>>>> > > >>
>>>>> > > >>
>>>>> > > >> Also, I'm curious about how to list views using Glue API. Is
>>>>> there an
>>>>> > > >>> on-hand api to list views directly or we need to list the
>>>>> tables and
>>>>> > > then
>>>>> > > >>> filter the views using the table-kind?
>>>>> > > >>
>>>>> > > >>
>>>>> > > >> Correct, there is no ready-made API to list views directly. We need
>>>>> > > >> to list all tables and then filter the views based on the tableKind
>>>>> > > >> attribute, which is part of the table object in the API response.
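
The list-then-filter approach described above can be sketched as follows. It is a minimal illustration under stated assumptions: tables are plain dicts, the kind is stored under a `tableKind` key with the value `VIEW`, and `fetch_page` is a hypothetical stand-in for a paginated "list tables" call that returns a page of tables plus an optional continuation token. None of these names come from the Glue SDK.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Table = Dict[str, str]
Page = Tuple[List[Table], Optional[str]]


def iter_tables(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Table]:
    """Yield every table, following continuation tokens until exhausted."""
    token: Optional[str] = None
    while True:
        tables, token = fetch_page(token)
        yield from tables
        if token is None:
            return


def list_views(
    fetch_page: Callable[[Optional[str]], Page], view_kind: str = "VIEW"
) -> List[str]:
    """List all tables, then keep only those whose tableKind marks a view."""
    return [
        table["name"]
        for table in iter_tables(fetch_page)
        if table.get("tableKind") == view_kind
    ]
```

The cost of this design is that listing views is as expensive as listing all tables, since the filtering happens on the client side.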
>>>>> > > >>
>>>>> > > >>
>>>>> > > >> 3: In "Flink Glue DataType Mapping" part, CharType is mapped to
>>>>> > String.
>>>>> > > >>> It seems the char's size will be lost. Is it possible to have a
>>>>> > > >>> better mapping which won't lose the size of the char type?
>>>>> > > >>
>>>>> > > >>
>>>>> > > >> Thanks for pointing this out! I have updated the FLIP with the
>>>>> > > >> correct types. Initially I mapped CharType and VarCharType to string,
>>>>> > > >> but I have now updated them to map directly to the same types.
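
A length-preserving mapping of the kind described above could look roughly like this. It is only a sketch: the helper `to_glue_type` is hypothetical, it handles just the three string-like types discussed here, and it assumes Glue stores Hive-style lowercase type strings such as `char(10)`.

```python
import re

# Matches CHAR(n) / VARCHAR(n) and captures the declared length.
_SIZED_TYPE = re.compile(r"^(CHAR|VARCHAR)\((\d+)\)$", re.IGNORECASE)


def to_glue_type(flink_type: str) -> str:
    """Map a Flink string-like type to a type string, keeping the length."""
    text = flink_type.strip()
    match = _SIZED_TYPE.match(text)
    if match:
        # CHAR(10) -> char(10), VARCHAR(255) -> varchar(255): length survives.
        return f"{match.group(1).lower()}({match.group(2)})"
    if text.upper() == "STRING":
        return "string"
    raise ValueError(f"Type not covered by this sketch: {flink_type!r}")
```

Mapping CHAR(10) to plain `string`, as in the earlier draft, would discard the length, so reading the table back could not reconstruct the original Flink type.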
>>>>> > > >>
>>>>> > > >>
>>>>> > > >>
>>>>> > > >>> 4: About the "Flink CatalogFunction mapping with Glue
>>>>> Function" part,
>>>>> > > >>> how do we map the function language in Flink's CatalogFunction.
>>>>> > > >>
>>>>> > > >>
>>>>> > > >> The Glue API (UserDefinedFunctionInput) doesn't have a specific
>>>>> > > >> attribute for the function language. Here is how the AWS
>>>>> > > >> Hive-compatible metastore client maps a Hive function to a Glue
>>>>> > > >> function [2]. We will add a language prefix to the function name
>>>>> > > >> itself to indicate the language. I see this has already been done for
>>>>> > > >> the Hive Catalog [3]. We are thinking of implementing it in the same
>>>>> > > >> way.
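
The prefix idea described above can be sketched as a round trip between a (language, name) pair and a single stored string. The `<language>:<name>` format, the helper names, and the JAVA default are all hypothetical; the actual prefix scheme is whatever the FLIP and HiveCatalog [3] define. The three languages are those of Flink's FunctionLanguage enum.

```python
from typing import Tuple

LANGUAGES = ("JAVA", "SCALA", "PYTHON")
SEPARATOR = ":"  # hypothetical separator, chosen only for this sketch


def encode_function_name(language: str, name: str) -> str:
    """Store the language as a prefix of the function name."""
    lang = language.upper()
    if lang not in LANGUAGES:
        raise ValueError(f"Unknown function language: {language!r}")
    return f"{lang.lower()}{SEPARATOR}{name}"


def decode_function_name(stored: str, default_language: str = "JAVA") -> Tuple[str, str]:
    """Recover (language, name); names without a known prefix use the default."""
    prefix, sep, rest = stored.partition(SEPARATOR)
    if sep and prefix.upper() in LANGUAGES:
        return prefix.upper(), rest
    return default_language, stored
```

Falling back to a default language on decode keeps functions that were registered by other tools, without any prefix, readable.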
>>>>> > > >>
>>>>> > > >> [1] https://issues.apache.org/jira/browse/FLINK-22540
>>>>> > > >> [2]
>>>>> > > >>
>>>>> > >
>>>>> >
>>>>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
>>>>> > > >> [3]
>>>>> > > >>
>>>>> > >
>>>>> >
>>>>> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
>>>>> > > >>
>>>>> > > >> Samrat
>>>>> > > >>
>>>>> > > >> On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <lindon...@gmail.com>
>>>>> wrote:
>>>>> > > >>
>>>>> > > >>> Hi Samrat,
>>>>> > > >>>
>>>>> > > >>> Thanks for the FLIP!
>>>>> > > >>>
>>>>> > > >>> Since this is the first proposal for adding a vendor-specific
>>>>> catalog
>>>>> > > >>> library in Flink, I think maybe we should also externalize
>>>>> those
>>>>> > > catalog
>>>>> > > >>> libraries similar to how we are externalizing connector
>>>>> libraries. It
>>>>> > > is
>>>>> > > >>> likely that we might want to add catalogs for other vendors in
>>>>> the
>>>>> > > >>> future.
>>>>> > > >>> Externalizing those catalogs can make Flink development more
>>>>> scalable
>>>>> > > in
>>>>> > > >>> the long term.
>>>>> > > >>>
>>>>> > > >>> It is mentioned in the FLIP that there will be two types of
>>>>> > > SdkHttpClient
>>>>> > > >>> supported based on the catalog option http-client.type. Is
>>>>> > > >>> http-client.type
>>>>> > > >>> a public config for the GlueCatalog? If yes, can we add this
>>>>> config
>>>>> > to
>>>>> > > >>> the
>>>>> > > >>> "Configurations" section and explain how users should choose
>>>>> the
>>>>> > client
>>>>> > > >>> type?
>>>>> > > >>>
>>>>> > > >>> Regards,
>>>>> > > >>> Dong
>>>>> > > >>>
>>>>> > > >>>
>>>>> > > >>> On Sat, Dec 3, 2022 at 12:31 PM Samrat Deb <
>>>>> decordea...@gmail.com>
>>>>> > > >>> wrote:
>>>>> > > >>>
>>>>> > > >>> > Hi everyone,
>>>>> > > >>> >
>>>>> > > >>> > I would like to open a discussion[1] on providing GlueCatalog
>>>>> > support
>>>>> > > >>> > in Flink.
>>>>> > > >>> > Currently, Flink offers 3 major types of catalog[2], of which only
>>>>> > > >>> > HiveCatalog is a persistent catalog, backed by the Hive Metastore.
>>>>> > > >>> > We would like to introduce GlueCatalog in Flink to offer users
>>>>> > > >>> > another persistent option. The AWS Glue Data Catalog is a
>>>>> > > >>> > centralized data catalog in the AWS cloud that provides
>>>>> > > >>> > integrations with many different connectors[3]. Flink GlueCatalog
>>>>> > > >>> > can use the features provided by Glue and integrate closely with
>>>>> > > >>> > other services in the cloud.
>>>>> > > >>> >
>>>>> > > >>> > [1]
>>>>> > > >>> >
>>>>> > > >>> >
>>>>> > > >>>
>>>>> > >
>>>>> >
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>>>>> > > >>> >
>>>>> > > >>> > [2]
>>>>> > > >>> >
>>>>> > > >>> >
>>>>> > > >>>
>>>>> > >
>>>>> >
>>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/
>>>>> > > >>> >
>>>>> > > >>> > [3]
>>>>> > > >>> >
>>>>> > > >>> >
>>>>> > > >>>
>>>>> > >
>>>>> >
>>>>> https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro
>>>>> > > >>> >
>>>>> > > >>> > [4] https://issues.apache.org/jira/browse/FLINK-29549
>>>>> > > >>> >
>>>>> > > >>> > Bests
>>>>> > > >>> > Samrat
>>>>> > > >>> >
>>>>> > > >>>
>>>>> > > >>
>>>>> > >
>>>>> >
>>>>>
>>>>
