Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

Bowen Li Thu, 09 Jan 2020 12:52:33 -0800

Hi Jark and Jingsong,

Thanks for your review. Please see my reply in line.


> why introducing a `PostgresJDBCCatalog`, not a generic `JDBCCatalog`
(catalog.type = 'postgres' vs 'jdbc') ?

Thanks for the reminding and I looked at JDBCDialect. A generic,
user-facing JDBCCatalog with catalog.type = jdbc and find specific db
implementations (pg v.s. mysql v.s. ...) is more aligned with how jdbc
sink/source is handled, indeed. However, the catalogs would also need to
execute the query and parse query results in a db-dependent way. E.g. jdbc
catalog needs to establish connections to different databases within a db
instance on demand. So just having JDBCDialect won't be enough.

I think we can do the following:
  - provide a user-facing JDBCCatalog, composing a db-specific impl like
PostgresJDBCCatalog and MySQLJDBCCatalog. Users still specify "jdbc" as
type in both Table API and SQL CLI, internally it will create a db-specific
impl depending on jdbc base url.
  - some statements can reside in JDBCDialect. Query execution and result
parsing logic would be located in db-specific impls.

- We can provide a Builder for Catalog, In my opinion, defaultDatabase,
username, pwd can be included in JDBC DB url.

I don't see much value in providing a builder for jdbc catalogs, as they
only have 4 or 5 required params, no optional ones. I prefer users just
provide a base url without default db, usrname, pwd so we don't need to
parse url all around, as I mentioned jdbc catalog may need to establish
connections to different databases in a db instance,

- About timestamp and time, write down the specific Flink precision of
Postgres?

I've documented that. It's 0-6

- I think there is a part missing in your document, that is how to use this
catalog. If you can write a complete example, I think it will be much
clearer.

I added some examples in both table api and SQL Cli. It will be no
different from existing catalogs.

- So a thing is what TableFactory will this catalog use? For example,
JDBCTableSourceSinkFactory has different parameters for source or sink? How
do you think about it?

This catalog will directly call JDBCTableSourceSinkFactory without going
thru service discovery because we are sure it's a jdbc table. I added it to
the doc.

For the different params besides schema, as we discussed offline,
unfortunately we can't do anything right now until Flink DDL/DML are able
to distinguish 3 types of params - external data's metada, source/sink
runtime params, and Flink semantics params. The latter two can't be
provided by catalogs. The problem is actually general to all catalogs, not
just JDBCCatalog. I'm pushing for such an effort to solve it. At this
moment we can only use some default params for some cases, and the other
cases cannot take advantage of the JDBC catalog and users still have to
write DDL manually.

Thanks,
Bowen

On Wed, Jan 8, 2020 at 7:46 PM Jingsong Li <jingsongl...@gmail.com> wrote:

> Thanks Bowen for driving this,
>
> +1 for this, The DDL schema definition is a headache for users, and catalog
> is a solution to this problem.
>
> I have some questions and suggestions:
>
> - We can provide a Builder for Catalog, In my opinion, defaultDatabase,
> username, pwd can be included in JDBC DB url.
>
> - About timestamp and time, write down the specific Flink precision of
> Postgres?
>
> - I think there is a part missing in your document, that is how to use this
> catalog. If you can write a complete example, I think it will be much
> clearer.
>
> - So a thing is what TableFactory will this catalog use? For example,
> JDBCTableSourceSinkFactory has different parameters for source or sink? How
> do you think about it?
>
> Best,
> Jingsong Lee
>
> On Thu, Jan 9, 2020 at 11:33 AM Jark Wu <imj...@gmail.com> wrote:
>
> > Thanks Bowen for driving this.
> >
> > +1 to this feature.
> >
> > My concern is that why introducing a `PostgresJDBCCatalog`, not a generic
> > `JDBCCatalog` (catalog.type = 'postgres' vs 'jdbc') ?
> > From my understanding, JDBC catalog is similar to JDBC source/sink. For
> > JDBC source/sink, we have a generic
> > implementation for JDBC and delegate operations to JDBCDialect. Different
> > driver may have different implementation of
> > JDBCDialect, e.g `quoteIdentifier()`.
> >
> > For JDBC catalog, I guess maybe we can do it in the same way, i.e. a
> > generic JDBCCatalog implementation and delegate
> > operations to JDBCDialect, and we will have `listDataBase()`,
> > `listTables()` interfaces in JDBCDialect. The benefit is that:
> > 0) reuse the existing `JDBCDialect`, I guess JDBCCatalog also need to
> quote
> > identifiers.
> > 1) we can easily to support a new database catalog (e.g. mysql) by
> > implementing new dialects (e.g. MySQLDialect).
> > 2) this can keep the same behavior as JDBC source/sink, i.e.
> > connector.type=jdbc, catalog.type=jdbc
> >
> > Best,
> > Jark
> >
> >
> > On Thu, 9 Jan 2020 at 08:33, Bowen Li <bowenl...@gmail.com> wrote:
> >
> > > Hi dev,
> > >
> > > I'd like to kick off a discussion on adding JDBC catalogs, specifically
> > > Postgres catalog in Flink [1].
> > >
> > > Currently users have to manually create schemas in Flink source/sink
> > > mirroring tables in their relational databases in use cases like JDBC
> > > read/write and consuming CDC. Many users have complaint about the
> > > unnecessary, redundant, manual work. Any mismatch can lead to a failing
> > > Flink job at runtime instead of compile time. All these have been quite
> > > unpleasant, resulting in a broken user experience.
> > >
> > > We want to provide a JDBC catalog interface and a Postgres
> implementation
> > > for Flink as a start to connect to all kinds of relational databases,
> > > enabling Flink SQL to 1) retrieve table schema automatically without
> > > requiring user writes duped DDL 2) check at compile time for schema
> > errors.
> > > It will greatly streamline user experiences when using Flink to deal
> with
> > > popular relational databases like Postgres, MySQL, MariaDB, AWS Aurora,
> > > etc.
> > >
> > > Note that the problem and solution are actually very general to Flink
> > when
> > > connecting to all kinds of external systems. We just focus on solving
> > that
> > > for relational databases in this FLIP.
> > >
> > > Thanks,
> > > Bowen
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-92%3A+JDBC+catalog+and+Postgres+catalog
> > >
> >
>
>
> --
> Best, Jingsong Lee
>

Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

Reply via email to