Hi Samrat,

+1 to the effort and +1 to adding it to flink-connector-aws.

Can you explain how users are expected to authenticate with AWS Glue? I
don't see any catalog options regarding authx. So I assume the credentials
are taken from the environment?

Best,

Konstantin



On Fri, 9 Dec 2022 at 16:08, Jark Wu <imj...@gmail.com> wrote:

> Hi Samrat,
>
> Thanks a lot for driving the new catalog, and sorry for jumping into the
> discussion late.
>
> As Flink SQL is becoming the first-class citizen of the Flink API, we are
> planning to push Catalog
> to become the first-class citizen of the connector instead of Source &
> Sink. For Flink SQL users,
> using Catalog is as natural and user-friendly as working with databases,
> rather than having to define
> DDL and schemas over and over again. This is also how Trino/Presto does it.
>
> Regarding the repo for the Glue catalog, I think we can add it to
> flink-connector-aws. We don't need
> separate repos for Catalogs because Catalog is a kind of connector (others
> are sources & sinks).
> For example, MySqlCatalog[1] and PostgresCatalog[2] are in
> flink-connector-jdbc, and HiveCatalog is
> in flink-connector-hive. This can reduce repository maintenance, and I
> think maybe some common
> AWS utils can be shared there.  cc @Danny Cranmer <dannycran...@apache.org>
> what do you think about this?
>
> Besides, I have a question about Glue Namespace. Could you share the
> documentation of the Glue
>  Namespaces? (Sorry, I didn't find it.) According to the "Flink Glue
> Metaspace Mapping" section,
> if there is a database "mydb" under namespace "ns1", does that mean the
> database name in Flink is "ns1.mydb"?
>
> Best,
> Jark
>
>
> [1]:
>
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/MySqlCatalog.java
> [2]:
>
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/catalog/PostgresCatalog.java
>
> On Fri, 9 Dec 2022 at 08:51, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hi Samrat,
> >
> > Sorry for the late reply. Yeah I am referring to creating a similar
> > external repo such as flink-catalog-glue. flink-connector-aws is already
> > named with `connector` so it seems a bit weird to put a catalog there.
> >
> > Thanks!
> > Dong
> >
> > On Wed, Dec 7, 2022 at 1:04 PM Samrat Deb <decordea...@gmail.com> wrote:
> >
> > > Hi Dong Lin,
> > >
> > > Since this is the first proposal for adding a vendor-specific catalog
> > > > library in Flink, I think maybe we should also externalize those
> > catalog
> > > > libraries similar to how we are externalizing connector libraries. It
> > is
> > > > likely that we might want to add catalogs for other vendors in the
> > > future.
> > > > Externalizing those catalogs can make Flink development more scalable
> > in
> > > > the long term.
> > >
> > > Initially I misinterpreted externalising the catalogs. There already
> > > exists an externalised connector for AWS [1].
> > > Are you referring to creating a similar external repo for catalogs or
> > will
> > > it be better to add it in flink-connector-aws[1] ?
> > >
> > > [1] https://github.com/apache/flink-connector-aws
> > >
> > > Samrat
> > >
> > > On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <decordea...@gmail.com>
> wrote:
> > >
> > > > Hi Dong Lin,
> > > >
> > > > AWS Glue Data Catalog is vendor specific, and in future we will get
> > > > such implementations from different providers. We should
> > > > definitely externalize these catalog libraries similar to flink
> > > > connectors.
> > > > I am thinking of creating
> > > > flink-catalog similar to flink-connector under the root (flink). The glue
> > > > catalog can be one of the modules under flink-catalog. Please suggest
> > > > if there is a better structure we can create for catalogs.
> > > >
> > > >
> > > > It is mentioned in the FLIP that there will be two types of
> > SdkHttpClient
> > > >> supported based on the catalog option http-client.type. Is
> > > >> http-client.type
> > > >> a public config for the GlueCatalog? If yes, can we add this config
> to
> > > the
> > > >> "Configurations" section and explain how users should choose the
> > client
> > > >> type?
> > > >
> > > >
> > > > Yes, http-client.type is a public config for the GlueCatalog. By default
> > > > the client type will be `urlconnection` if the user doesn't specify a
> > > > connection type.
> > > > I have updated the FLIP-277[1] #configuration section with all the
> > > > configs. Please review it again.
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
> > > >
> > > > Samrat
> > > >
> > > > On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <decordea...@gmail.com>
> > wrote:
> > > >
> > > >> Hi Yuxia,
> > > >>
> > > >> Thank you for reviewing the flip and putting forward your
> observations
> > > >> and comments.
> > > >>
> > > >> 1: I noticed there's a YAML part in the section of "Using the
> > Catalog",
> > > >>> what do you mean by that? Do you mean how to use glue catalog in
> sql
> > > >>> client? If so, just for your information, it's not supported to use
> > > yaml
> > > >>> environment file in sql client[2].
> > > >>
> > > >>
> > > >> Thank you for attaching the Jira ticket [1]. I missed the changes.
> > > >> There is a provision to register a catalog directly through factory
> > > >> resources.
> > > >> - GenericInMemoryCatalog is defined through
> > > >> `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
> > > >> - HiveCatalog is defined through
> > > >> `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
> > > >> Similarly, we can define it in the vendor-specific module for AWS Glue.
> > > >>
> > > >> 2: Seems there's a typo in "Design#views" part, it contains
> > "listTables"
> > > >>> which I think shouldn't be contained.
> > > >>
> > > >>
> > > >> Oh yes 😅! Fixed it now, thanks for pointing it out.
> > > >>
> > > >>
> > > >> Also, I'm curious about how to list views using Glue API. Is there
> an
> > > >>> on-hand api to list views directly or we need to list the tables
> and
> > > then
> > > >>> filter the views using the table-kind?
> > > >>
> > > >>
> > > >> Yes, there is no on-hand API to list views directly; we need to list all
> > > >> tables and then filter the views based on the attribute tableKind, which is
> > > >> part of the table object in the API response.
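The list-then-filter approach described above can be sketched as follows. This is only an illustration in plain Python, not the FLIP's implementation and not a Glue SDK call: `TableEntry`, `list_views`, and the `"VIEW"` / `"TABLE"` values are hypothetical stand-ins for the table objects returned by the list-tables response and their tableKind attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    """Hypothetical stand-in for one table object in a list-tables response."""
    name: str
    table_kind: str  # assumed values for this sketch: "TABLE" or "VIEW"


def list_views(tables):
    """Return the names of all entries whose table_kind marks them as a view."""
    return [t.name for t in tables if t.table_kind == "VIEW"]


# Example: everything is fetched as "tables", then views are filtered out locally.
tables = [
    TableEntry("orders", "TABLE"),
    TableEntry("daily_orders", "VIEW"),
    TableEntry("customers", "TABLE"),
]
views = list_views(tables)
```

One consequence of this design is that listing views costs as much as listing all tables, since the filtering happens on the client side.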
> > > >>
> > > >>
> > > >> 3: In "Flink Glue DataType Mapping" part, CharType is mapped to
> > String.
> > > >>> It seems the char's size will be lost. Is it possible to have a better
> > > >>> mapping which won't lose the size of the char type?
> > > >>
> > > >>
> > > >> Thanks for pointing this out! I have updated the FLIP with the correct
> > > >> type. Initially I mapped the char and varchar types to string, but updated
> > > >> it to map directly to the same types.
> > > >>
> > > >>
> > > >>
> > > >>> 4: About the "Flink CatalogFunction mapping with Glue Function"
> part,
> > > >>> how do we map the function language in Flink's CatalogFunction.
> > > >>
> > > >>
> > > >> The Glue API (UserDefinedFunctionInput) doesn't support a specific
> > > >> attribute for the function language. Here is how the AWS Hive-compatible
> > > >> metastore maps a Hive function to a Glue function[2]. We will add a
> > > >> language prefix to the function name itself to indicate the language. I
> > > >> see this has already been done for the Hive Catalog [3]. We are thinking
> > > >> of implementing it in the same way.
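A minimal sketch of the prefix idea, assuming a `LANGUAGE:name` layout. The separator, the helper names, and the fallback to JAVA for unprefixed names are all assumptions made for illustration; the thread does not specify the actual format, and this is not the HiveCatalog code.

```python
# Languages modelled here mirror Flink's FunctionLanguage values.
LANGUAGES = ("JAVA", "SCALA", "PYTHON")
SEPARATOR = ":"  # hypothetical separator, chosen only for this sketch


def encode_function_name(language, name):
    """Store the language as a prefix of the function name."""
    language = language.upper()
    if language not in LANGUAGES:
        raise ValueError(f"unsupported function language: {language}")
    return f"{language}{SEPARATOR}{name}"


def decode_function_name(stored, default_language="JAVA"):
    """Split a stored name back into (language, name).

    Names without a recognised prefix are treated as default_language,
    which is an assumption of this sketch.
    """
    prefix, sep, rest = stored.partition(SEPARATOR)
    if sep and prefix in LANGUAGES:
        return prefix, rest
    return default_language, stored


stored = encode_function_name("python", "my_udf")
language, name = decode_function_name(stored)
```

A round trip through both helpers returns the original language and name, so the language survives even though the catalog has no dedicated field for it.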
> > > >>
> > > >> [1] https://issues.apache.org/jira/browse/FLINK-22540
> > > >> [2]
> > > >>
> > >
> >
> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
> > > >> [3]
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
> > > >>
> > > >> Samrat
> > > >>
> > > >> On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <lindon...@gmail.com>
> wrote:
> > > >>
> > > >>> Hi Samrat,
> > > >>>
> > > >>> Thanks for the FLIP!
> > > >>>
> > > >>> Since this is the first proposal for adding a vendor-specific
> catalog
> > > >>> library in Flink, I think maybe we should also externalize those
> > > catalog
> > > >>> libraries similar to how we are externalizing connector libraries.
> It
> > > is
> > > >>> likely that we might want to add catalogs for other vendors in the
> > > >>> future.
> > > >>> Externalizing those catalogs can make Flink development more
> scalable
> > > in
> > > >>> the long term.
> > > >>>
> > > >>> It is mentioned in the FLIP that there will be two types of
> > > SdkHttpClient
> > > >>> supported based on the catalog option http-client.type. Is
> > > >>> http-client.type
> > > >>> a public config for the GlueCatalog? If yes, can we add this config
> > to
> > > >>> the
> > > >>> "Configurations" section and explain how users should choose the
> > client
> > > >>> type?
> > > >>>
> > > >>> Regards,
> > > >>> Dong
> > > >>>
> > > >>>
> > > >>> On Sat, Dec 3, 2022 at 12:31 PM Samrat Deb <decordea...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>> > Hi everyone,
> > > >>> >
> > > >>> > I would like to open a discussion[1] on providing GlueCatalog support
> > > >>> > in Flink.
> > > >>> > Currently, Flink offers 3 major types of catalog[2], of which only
> > > >>> > HiveCatalog is a persistent catalog backed by Hive Metastore. We would
> > > >>> > like to introduce GlueCatalog in Flink, offering users another
> > > >>> > persistent option. AWS Glue Data Catalog is a centralized data
> > > >>> > catalog in the AWS cloud that provides integrations with many different
> > > >>> > connectors[3]. Flink GlueCatalog can use the features provided by Glue
> > > >>> > and create strong integration with other services in the cloud.
> > > >>> >
> > > >>> > [1]
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
> > > >>> >
> > > >>> > [2]
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/
> > > >>> >
> > > >>> > [3]
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro
> > > >>> >
> > > >>> > [4] https://issues.apache.org/jira/browse/FLINK-29549
> > > >>> >
> > > >>> > Bests
> > > >>> > Samrat
> > > >>> >
> > > >>>
> > > >>
> > >
> >
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk
