Hi Samrat,

Sorry for the late reply. Yes, I am referring to creating a similar
external repo, such as flink-catalog-glue. Since flink-connector-aws already
has `connector` in its name, it seems a bit odd to put a catalog there.

Thanks!
Dong

On Wed, Dec 7, 2022 at 1:04 PM Samrat Deb <decordea...@gmail.com> wrote:

> Hi Dong Lin,
>
> > Since this is the first proposal for adding a vendor-specific catalog
> > library in Flink, I think maybe we should also externalize those catalog
> > libraries similar to how we are externalizing connector libraries. It is
> > likely that we might want to add catalogs for other vendors in the
> > future. Externalizing those catalogs can make Flink development more
> > scalable in the long term.
>
> Initially I misinterpreted what you meant by externalizing the catalogs.
> There already exists an externalized connector repository for AWS [1].
> Are you referring to creating a similar external repo for catalogs, or
> would it be better to add it to flink-connector-aws [1]?
>
> [1] https://github.com/apache/flink-connector-aws
>
> Samrat
>
> On Tue, Dec 6, 2022 at 6:52 PM Samrat Deb <decordea...@gmail.com> wrote:
>
> > Hi Dong Lin,
> >
> > AWS Glue Data Catalog is vendor-specific, and in the future we will get
> > similar implementations from other providers. We should definitely
> > externalize these catalog libraries, similar to the Flink connectors.
> > I am thinking of creating flink-catalog, similar to flink-connector, under
> > the root (flink). The Glue catalog can be one of the modules under
> > flink-catalog. Please suggest if there is a better structure we can create
> > for catalogs.
> >
> >
> >> It is mentioned in the FLIP that there will be two types of SdkHttpClient
> >> supported based on the catalog option http-client.type. Is http-client.type
> >> a public config for the GlueCatalog? If yes, can we add this config to the
> >> "Configurations" section and explain how users should choose the client
> >> type?
> >
> >
> > Yes, http-client.type is a public config for the GlueCatalog. If the user
> > doesn't specify a client type, it defaults to `urlconnection`.
> > I have updated the "Configurations" section of FLIP-277 [1] with all the
> > configs. Please review it again.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
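A minimal sketch of how the `http-client.type` option could be resolved, with `urlconnection` as the default when the user sets nothing. It assumes the second supported value is `apache` (the AWS SDK v2 ships both URL-connection-based and Apache-based HTTP clients); the class and enum names are hypothetical, and a real catalog would build the corresponding SdkHttpClient instead of returning an enum.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Sketch: resolve the GlueCatalog "http-client.type" option to a client kind.
 * The value set {urlconnection, apache} and all names here are assumptions.
 */
public class HttpClientTypeSketch {

    // Hypothetical enum of the supported SdkHttpClient kinds.
    enum HttpClientType { URLCONNECTION, APACHE }

    static final String HTTP_CLIENT_TYPE_KEY = "http-client.type";

    // Returns the configured client type, defaulting to URLCONNECTION when unset or blank.
    static HttpClientType resolve(Map<String, String> catalogOptions) {
        String raw = catalogOptions.get(HTTP_CLIENT_TYPE_KEY);
        if (raw == null || raw.isBlank()) {
            return HttpClientType.URLCONNECTION;
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "urlconnection":
                return HttpClientType.URLCONNECTION;
            case "apache":
                return HttpClientType.APACHE;
            default:
                // Fail fast on typos instead of silently falling back.
                throw new IllegalArgumentException(
                        "Unsupported " + HTTP_CLIENT_TYPE_KEY + ": " + raw);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of()));                               // URLCONNECTION
        System.out.println(resolve(Map.of(HTTP_CLIENT_TYPE_KEY, "apache"))); // APACHE
    }
}
```

Rejecting unknown values, rather than falling back to the default, is a design choice in this sketch so that a misspelled option surfaces immediately.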
> >
> > Samrat
> >
> > On Tue, Dec 6, 2022 at 5:50 PM Samrat Deb <decordea...@gmail.com> wrote:
> >
> >> Hi Yuxia,
> >>
> >> Thank you for reviewing the FLIP and putting forward your observations
> >> and comments.
> >>
> >>> 1: I noticed there's a YAML part in the "Using the Catalog" section.
> >>> What do you mean by that? Do you mean how to use the Glue catalog in the
> >>> SQL Client? If so, just for your information, using a YAML environment
> >>> file in the SQL Client is not supported [2].
> >>
> >>
> >> Thank you for attaching the Jira ticket [1]; I had missed those changes.
> >> A catalog can be registered directly through factory service resources:
> >> - GenericInMemoryCatalog is registered through
> >> `flink/flink-table/flink-table-api-java/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
> >> - HiveCatalog is registered through
> >> `flink-connectors/flink-connector-hive/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory`
> >> Similarly, we can define it in the vendor-specific module for AWS Glue.
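The service-file registration relies on Java's ServiceLoader: Flink discovers every class listed in `META-INF/services/org.apache.flink.table.factories.Factory` and, for a catalog, picks the factory whose identifier matches the catalog's `type` option. The sketch below imitates only that lookup step with a hard-coded list so it runs on its own; the interface, the class names and the `glue` identifier are hypothetical and are not Flink's actual API.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Simplified stand-in for SPI-based catalog factory discovery. Flink reads the
 * services file via ServiceLoader; here the "discovered" factories are a
 * hard-coded list so that the sketch is self-contained.
 */
public class CatalogFactoryLookupSketch {

    // Hypothetical minimal factory contract: an identifier plus a create method.
    interface CatalogFactoryLike {
        String factoryIdentifier();

        String createCatalog(String name, Map<String, String> options);
    }

    // Hypothetical Glue factory; "glue" as its identifier is an assumption.
    static final class GlueCatalogFactorySketch implements CatalogFactoryLike {
        @Override
        public String factoryIdentifier() { return "glue"; }

        @Override
        public String createCatalog(String name, Map<String, String> options) {
            return "GlueCatalog(" + name + ")";
        }
    }

    // Hypothetical in-memory factory, standing in for GenericInMemoryCatalog's factory.
    static final class InMemoryCatalogFactorySketch implements CatalogFactoryLike {
        @Override
        public String factoryIdentifier() { return "generic_in_memory"; }

        @Override
        public String createCatalog(String name, Map<String, String> options) {
            return "GenericInMemoryCatalog(" + name + ")";
        }
    }

    // What ServiceLoader would yield if both classes were listed in the services file.
    static final List<CatalogFactoryLike> DISCOVERED =
            List.of(new InMemoryCatalogFactorySketch(), new GlueCatalogFactorySketch());

    // Picks the factory whose identifier matches the catalog's 'type' option.
    static Optional<CatalogFactoryLike> find(String type) {
        return DISCOVERED.stream()
                .filter(f -> f.factoryIdentifier().equals(type))
                .findFirst();
    }

    public static void main(String[] args) {
        String catalog = find("glue")
                .orElseThrow(() -> new IllegalStateException("no factory for 'glue'"))
                .createCatalog("my_glue", Map.of());
        System.out.println(catalog); // GlueCatalog(my_glue)
    }
}
```

In an actual Glue module, registering the factory would amount to adding its fully qualified class name as one line in that services file.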
> >>
> >>> 2: It seems there's a typo in the "Design#views" part: it contains
> >>> "listTables", which I think shouldn't be there.
> >>
> >>
> >> Oh yes 😅! Fixed it now, thanks for pointing it out.
> >>
> >>
> >>> Also, I'm curious about how to list views using the Glue API. Is there
> >>> a ready-made API to list views directly, or do we need to list the
> >>> tables and then filter the views using the table kind?
> >>
> >>
> >> Yes, there is no API to list views directly. We need to list all tables
> >> and then filter the views based on the tableKind attribute, which is part
> >> of the table object in the API response.
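A sketch of the list-then-filter approach described above. `GlueTableStub` is a hypothetical stand-in for the table objects in the Glue API response, and treating a `tableKind` value of `"VIEW"` as the view marker is an assumption made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch: Glue has no "list views" call, so list every table and keep the views.
 * The stub type and the "VIEW" marker value are assumptions.
 */
public class ListViewsSketch {

    // Hypothetical stand-in for one table entry in a Glue list-tables response.
    static final class GlueTableStub {
        final String name;
        final String tableKind; // assumed to hold values such as "TABLE" or "VIEW"

        GlueTableStub(String name, String tableKind) {
            this.name = name;
            this.tableKind = tableKind;
        }
    }

    // Keeps only the entries whose kind marks them as views (case-insensitive).
    static List<String> listViews(List<GlueTableStub> allTables) {
        return allTables.stream()
                .filter(t -> "VIEW".equalsIgnoreCase(t.tableKind))
                .map(t -> t.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<GlueTableStub> tables = List.of(
                new GlueTableStub("orders", "TABLE"),
                new GlueTableStub("daily_orders_v", "VIEW"),
                new GlueTableStub("customers", "TABLE"));
        System.out.println(listViews(tables)); // [daily_orders_v]
    }
}
```

Because Glue's GetTables call is paginated (it returns a NextToken), a real implementation would need to collect every page before filtering.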
> >>
> >>
> >>> 3: In the "Flink Glue DataType Mapping" part, CharType is mapped to
> >>> String. It seems the char's size will be lost; is it possible to have a
> >>> better mapping that doesn't lose the size of the char type?
> >>
> >>
> >> Thanks for pointing this out! I have updated the FLIP with the correct
> >> type. Initially I mapped CharType and VarCharType to string, but I have
> >> updated them to map directly to the corresponding char and varchar types.
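A sketch of a length-preserving mapping for Flink's character types. The Glue-side type strings (`char(n)`, `varchar(n)`, `string`) follow Hive-style naming and are an assumption here, and the enum is a simplified, hypothetical stand-in for Flink's CharType and VarCharType. Since Flink's STRING is VARCHAR(2147483647), that one case is kept as an unbounded `string`.

```java
/**
 * Sketch: map Flink character types to Glue column type strings without
 * dropping the declared length. Glue-side names are assumptions.
 */
public class CharTypeMappingSketch {

    // Hypothetical, simplified view of Flink's CharType / VarCharType.
    enum FlinkCharKind { CHAR, VARCHAR }

    static String toGlueType(FlinkCharKind kind, int length) {
        if (length < 1) {
            throw new IllegalArgumentException("length must be >= 1, got " + length);
        }
        switch (kind) {
            case CHAR:
                return "char(" + length + ")"; // keeps the CHAR(n) length
            case VARCHAR:
                // Flink's STRING is VARCHAR(Integer.MAX_VALUE): keep it unbounded.
                return length == Integer.MAX_VALUE ? "string" : "varchar(" + length + ")";
            default:
                throw new IllegalArgumentException("Unsupported kind: " + kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(toGlueType(FlinkCharKind.CHAR, 10));     // char(10)
        System.out.println(toGlueType(FlinkCharKind.VARCHAR, 255)); // varchar(255)
    }
}
```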
> >>
> >>
> >>
> >>> 4: About the "Flink CatalogFunction mapping with Glue Function" part,
> >>> how do we map the function language in Flink's CatalogFunction?
> >>
> >>
> >> The Glue API (UserDefinedFunctionInput) doesn't have a dedicated attribute
> >> for the function language. Here is how the AWS Glue Data Catalog client
> >> for Apache Hive Metastore maps Hive functions to Glue functions [2]. We
> >> will add a prefix to the function name itself indicating the language.
> >> This has already been done for the HiveCatalog [3], and we are thinking
> >> of implementing it in the same way.
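A sketch of encoding the function language as a name prefix and decoding it on read. The `flink:<language>:` prefix format is loosely modeled on what HiveCatalog does and is an assumption, as is treating unprefixed names as Java; the language values mirror Flink's FunctionLanguage enum (JAVA, SCALA, PYTHON).

```java
import java.util.Locale;

/**
 * Sketch: Glue's UserDefinedFunctionInput has no language field, so encode the
 * language as a prefix on the stored name and strip it on read. The prefix
 * format is an assumption.
 */
public class FunctionLanguagePrefixSketch {

    // Mirrors the values of Flink's FunctionLanguage enum.
    enum Language { JAVA, SCALA, PYTHON }

    static final String PREFIX_ROOT = "flink:";

    // e.g. (PYTHON, "my_udf.Upper") -> "flink:python:my_udf.Upper"
    static String encode(Language language, String name) {
        return PREFIX_ROOT + language.name().toLowerCase(Locale.ROOT) + ":" + name;
    }

    // Returns the language encoded in the prefix; unprefixed names are assumed to be JAVA.
    static Language decodeLanguage(String stored) {
        for (Language lang : Language.values()) {
            if (stored.startsWith(encode(lang, ""))) {
                return lang;
            }
        }
        return Language.JAVA;
    }

    // Strips a recognized language prefix, returning the original name.
    static String decodeName(String stored) {
        for (Language lang : Language.values()) {
            String prefix = encode(lang, "");
            if (stored.startsWith(prefix)) {
                return stored.substring(prefix.length());
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        String stored = encode(Language.PYTHON, "my_udf.Upper");
        System.out.println(stored);                 // flink:python:my_udf.Upper
        System.out.println(decodeLanguage(stored)); // PYTHON
        System.out.println(decodeName(stored));     // my_udf.Upper
    }
}
```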
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-22540
> >> [2] https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/master/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/converters/GlueInputConverter.java#L83
> >> [3] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java#L1415
> >>
> >> Samrat
> >>
> >> On Mon, Dec 5, 2022 at 4:33 PM Dong Lin <lindon...@gmail.com> wrote:
> >>
> >>> Hi Samrat,
> >>>
> >>> Thanks for the FLIP!
> >>>
> >>> Since this is the first proposal for adding a vendor-specific catalog
> >>> library in Flink, I think maybe we should also externalize those catalog
> >>> libraries similar to how we are externalizing connector libraries. It is
> >>> likely that we might want to add catalogs for other vendors in the
> >>> future. Externalizing those catalogs can make Flink development more
> >>> scalable in the long term.
> >>>
> >>> It is mentioned in the FLIP that there will be two types of SdkHttpClient
> >>> supported based on the catalog option http-client.type. Is
> >>> http-client.type a public config for the GlueCatalog? If yes, can we add
> >>> this config to the "Configurations" section and explain how users should
> >>> choose the client type?
> >>>
> >>> Regards,
> >>> Dong
> >>>
> >>>
> >>> On Sat, Dec 3, 2022 at 12:31 PM Samrat Deb <decordea...@gmail.com> wrote:
> >>>
> >>> > Hi everyone,
> >>> >
> >>> > I would like to open a discussion [1] on providing GlueCatalog support
> >>> > in Flink.
> >>> > Currently, Flink offers three major types of catalog [2], of which only
> >>> > HiveCatalog is a persistent catalog, backed by the Hive Metastore. We
> >>> > would like to introduce GlueCatalog in Flink, offering users another
> >>> > option that is persistent in nature. The AWS Glue Data Catalog is a
> >>> > centralized data catalog in the AWS cloud that provides integrations
> >>> > with many different connectors [3]. A Flink GlueCatalog can use the
> >>> > features provided by Glue to create strong integration with other
> >>> > services in the cloud.
> >>> >
> >>> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
> >>> >
> >>> > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/
> >>> >
> >>> > [3] https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro
> >>> >
> >>> > [4] https://issues.apache.org/jira/browse/FLINK-29549
> >>> >
> >>> > Bests
> >>> > Samrat
> >>> >
> >>>
> >>
>
