Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

Zhenghua Gao Thu, 09 Jan 2020 21:49:16 -0800

Hi Bowen, Thanks for driving this.
I think it would be very convenience to use tables in external DBs with
JDBC Catalog.


I have one concern about "Flink-Postgres Data Type Mapping" part:

In Postgress, the TIME/TIMESTAMP WITH TIME ZONE has the java.time.Instant
semantic,
and should be mapped to Flink's TIME/TIMESTAMP WITH LOCAL TIME ZONE

*Best Regards,*
*Zhenghua Gao*


On Fri, Jan 10, 2020 at 11:09 AM Jingsong Li <[email protected]> wrote:

> Hi Bowen, thanks for reply and updating.
>
> > I don't see much value in providing a builder for jdbc catalogs, as they
> only have 4 or 5 required params, no optional ones. I prefer users just
> provide a base url without default db, usrname, pwd so we don't need to
> parse url all around, as I mentioned jdbc catalog may need to establish
> connections to different databases in a db instance,
>
> I suggest that the parameters can be completely consistent with the
> JDBCTableSource / JDBCTableSink.
> If you take a look to JDBC api: "DriverManager.getConnection".
> That allow "default db, username, pwd" things optional. They can included
> in URL. Of course JDBC api also allows establishing connections to
> different databases in a db instance.
> So I think we don't need provide a "base_url", we can just provide a real
> "url".
> To be consistent with JDBC api.
>
> Best,
> Jingsong Lee
>
> On Fri, Jan 10, 2020 at 10:34 AM Jark Wu <[email protected]> wrote:
>
> > Thanks Bowen for the reply,
> >
> > A user-facing JDBCCatalog and 'catalog.type' = 'jdbc'  sounds good to me.
> >
> > I have some other minor comments when I went through the updated
> > documentation:
> >
> > 1) 'base_url' configuration: We are following the configuration format
> > guideline [1] which suggest to use dash (-) instead of underline (_).
> >      And I'm a little confused the meaning of "base_url" at the first
> > glance, another idea is split it into several configurations: 'driver',
> > 'hostname', 'port'.
> >
> > 2) 'default-database' is optional, then which database will be used or
> what
> > is the behavior when the default database is not selected.
> >
> > 3) a builder for jdbc catalogs: I agree with Jingsong to provide a
> builder.
> > Because there is optional configuration here (the default database),
> >    and providind Builder as the API will be easier for evolution, I'm not
> > sure we won't add/modify parameters in the future.
> >
> > [1]:
> >
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >
> > On Fri, 10 Jan 2020 at 04:52, Bowen Li <[email protected]> wrote:
> >
> > > Hi Jark and Jingsong,
> > >
> > > Thanks for your review. Please see my reply in line.
> > >
> > > > why introducing a `PostgresJDBCCatalog`, not a generic `JDBCCatalog`
> > > (catalog.type = 'postgres' vs 'jdbc') ?
> > >
> > > Thanks for the reminding and I looked at JDBCDialect. A generic,
> > > user-facing JDBCCatalog with catalog.type = jdbc and find specific db
> > > implementations (pg v.s. mysql v.s. ...) is more aligned with how jdbc
> > > sink/source is handled, indeed. However, the catalogs would also need
> to
> > > execute the query and parse query results in a db-dependent way. E.g.
> > jdbc
> > > catalog needs to establish connections to different databases within a
> db
> > > instance on demand. So just having JDBCDialect won't be enough.
> > >
> > > I think we can do the following:
> > >   - provide a user-facing JDBCCatalog, composing a db-specific impl
> like
> > > PostgresJDBCCatalog and MySQLJDBCCatalog. Users still specify "jdbc" as
> > > type in both Table API and SQL CLI, internally it will create a
> > db-specific
> > > impl depending on jdbc base url.
> > >   - some statements can reside in JDBCDialect. Query execution and
> result
> > > parsing logic would be located in db-specific impls.
> > >
> > > - We can provide a Builder for Catalog, In my opinion, defaultDatabase,
> > > username, pwd can be included in JDBC DB url.
> > >
> > > I don't see much value in providing a builder for jdbc catalogs, as
> they
> > > only have 4 or 5 required params, no optional ones. I prefer users just
> > > provide a base url without default db, usrname, pwd so we don't need to
> > > parse url all around, as I mentioned jdbc catalog may need to establish
> > > connections to different databases in a db instance,
> > >
> > > - About timestamp and time, write down the specific Flink precision of
> > > Postgres?
> > >
> > > I've documented that. It's 0-6
> > >
> > > - I think there is a part missing in your document, that is how to use
> > this
> > > catalog. If you can write a complete example, I think it will be much
> > > clearer.
> > >
> > > I added some examples in both table api and SQL Cli. It will be no
> > > different from existing catalogs.
> > >
> > > - So a thing is what TableFactory will this catalog use? For example,
> > > JDBCTableSourceSinkFactory has different parameters for source or sink?
> > How
> > > do you think about it?
> > >
> > > This catalog will directly call JDBCTableSourceSinkFactory without
> going
> > > thru service discovery because we are sure it's a jdbc table. I added
> it
> > to
> > > the doc.
> > >
> > > For the different params besides schema, as we discussed offline,
> > > unfortunately we can't do anything right now until Flink DDL/DML are
> able
> > > to distinguish 3 types of params - external data's metada, source/sink
> > > runtime params, and Flink semantics params. The latter two can't be
> > > provided by catalogs. The problem is actually general to all catalogs,
> > not
> > > just JDBCCatalog. I'm pushing for such an effort to solve it. At this
> > > moment we can only use some default params for some cases, and the
> other
> > > cases cannot take advantage of the JDBC catalog and users still have to
> > > write DDL manually.
> > >
> > > Thanks,
> > > Bowen
> > >
> > > On Wed, Jan 8, 2020 at 7:46 PM Jingsong Li <[email protected]>
> > wrote:
> > >
> > > > Thanks Bowen for driving this,
> > > >
> > > > +1 for this, The DDL schema definition is a headache for users, and
> > > catalog
> > > > is a solution to this problem.
> > > >
> > > > I have some questions and suggestions:
> > > >
> > > > - We can provide a Builder for Catalog, In my opinion,
> defaultDatabase,
> > > > username, pwd can be included in JDBC DB url.
> > > >
> > > > - About timestamp and time, write down the specific Flink precision
> of
> > > > Postgres?
> > > >
> > > > - I think there is a part missing in your document, that is how to
> use
> > > this
> > > > catalog. If you can write a complete example, I think it will be much
> > > > clearer.
> > > >
> > > > - So a thing is what TableFactory will this catalog use? For example,
> > > > JDBCTableSourceSinkFactory has different parameters for source or
> sink?
> > > How
> > > > do you think about it?
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Thu, Jan 9, 2020 at 11:33 AM Jark Wu <[email protected]> wrote:
> > > >
> > > > > Thanks Bowen for driving this.
> > > > >
> > > > > +1 to this feature.
> > > > >
> > > > > My concern is that why introducing a `PostgresJDBCCatalog`, not a
> > > generic
> > > > > `JDBCCatalog` (catalog.type = 'postgres' vs 'jdbc') ?
> > > > > From my understanding, JDBC catalog is similar to JDBC source/sink.
> > For
> > > > > JDBC source/sink, we have a generic
> > > > > implementation for JDBC and delegate operations to JDBCDialect.
> > > Different
> > > > > driver may have different implementation of
> > > > > JDBCDialect, e.g `quoteIdentifier()`.
> > > > >
> > > > > For JDBC catalog, I guess maybe we can do it in the same way, i.e.
> a
> > > > > generic JDBCCatalog implementation and delegate
> > > > > operations to JDBCDialect, and we will have `listDataBase()`,
> > > > > `listTables()` interfaces in JDBCDialect. The benefit is that:
> > > > > 0) reuse the existing `JDBCDialect`, I guess JDBCCatalog also need
> to
> > > > quote
> > > > > identifiers.
> > > > > 1) we can easily to support a new database catalog (e.g. mysql) by
> > > > > implementing new dialects (e.g. MySQLDialect).
> > > > > 2) this can keep the same behavior as JDBC source/sink, i.e.
> > > > > connector.type=jdbc, catalog.type=jdbc
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > >
> > > > > On Thu, 9 Jan 2020 at 08:33, Bowen Li <[email protected]> wrote:
> > > > >
> > > > > > Hi dev,
> > > > > >
> > > > > > I'd like to kick off a discussion on adding JDBC catalogs,
> > > specifically
> > > > > > Postgres catalog in Flink [1].
> > > > > >
> > > > > > Currently users have to manually create schemas in Flink
> > source/sink
> > > > > > mirroring tables in their relational databases in use cases like
> > JDBC
> > > > > > read/write and consuming CDC. Many users have complaint about the
> > > > > > unnecessary, redundant, manual work. Any mismatch can lead to a
> > > failing
> > > > > > Flink job at runtime instead of compile time. All these have been
> > > quite
> > > > > > unpleasant, resulting in a broken user experience.
> > > > > >
> > > > > > We want to provide a JDBC catalog interface and a Postgres
> > > > implementation
> > > > > > for Flink as a start to connect to all kinds of relational
> > databases,
> > > > > > enabling Flink SQL to 1) retrieve table schema automatically
> > without
> > > > > > requiring user writes duped DDL 2) check at compile time for
> schema
> > > > > errors.
> > > > > > It will greatly streamline user experiences when using Flink to
> > deal
> > > > with
> > > > > > popular relational databases like Postgres, MySQL, MariaDB, AWS
> > > Aurora,
> > > > > > etc.
> > > > > >
> > > > > > Note that the problem and solution are actually very general to
> > Flink
> > > > > when
> > > > > > connecting to all kinds of external systems. We just focus on
> > solving
> > > > > that
> > > > > > for relational databases in this FLIP.
> > > > > >
> > > > > > Thanks,
> > > > > > Bowen
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-92%3A+JDBC+catalog+and+Postgres+catalog
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best, Jingsong Lee
> > > >
> > >
> >
>
>
> --
> Best, Jingsong Lee
>

Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

Reply via email to