Hi everyone,

Quick question for my understanding: how is this different from

CREATE TABLE IF NOT EXISTS my_table (
...
) WITH (
...
);
INSERT INTO my_table SELECT ...;

?

Is it only about a) not having to specify the schema and b) a more
condensed syntax?

Cheers,

Konstantin

On Fri, May 28, 2021 at 11:30 AM Jark Wu <imj...@gmail.com> wrote:

> Thanks Danny for starting the discussion of extending CTAS syntax.
>
> I think this is a very useful feature for data integration and ETL jobs (a
> big use case of Flink).
> Many users complain a lot that manually defining schemas for sources and
> sinks is hard.
> CTAS helps users to write ETL jobs without defining any schemas of sources
> and sinks.
> CTAS automatically creates physical tables in external systems and
> automatically maps external tables to Flink tables with the help of
> catalogs (e.g. PgCatalog, HiveCatalog).
>
> On the other hand, the schema of a SELECT query is fixed at compile
> time.
> CTAS TABLE extends the syntax to allow a dynamic schema at runtime:
> semantically, it continuously copies the up-to-date structure and data (if
> run in streaming mode).
> So I think CTAS TABLE is a major step forward for data integration. It
> defines a syntax that lets the underlying streaming pipeline automatically
> propagate schema changes (e.g. ADD COLUMN) from source tables to sink
> tables without stopping jobs or updating SQL statements.
>
> Therefore, I'm +1 for the feature.
>
> Best,
> Jark
>
> On Fri, 28 May 2021 at 16:22, JING ZHANG <beyond1...@gmail.com> wrote:
>
> > Hi Danny,
> >
> > Thanks for starting this discussion.
> >
> >
> >
> > Big +1 for this feature. Both CTAS and CREATE TABLE LIKE are very useful
> > features. IMO, it would be clearer to separate them into two parts in the
> > `syntax` chapter. 😀
> >
> >
> >
> > First, I have two related questions:
> >
> >
> > 1. Would `create table` in CTAS trigger the creation of a physical table
> > in the external storage system?
> >
> > For example, a normal `create table` currently only defines a connection
> > to an existing external Kafka topic; it does not trigger the creation of a
> > physical Kafka topic in the Kafka cluster. Does this behavior still hold
> > for CTAS and CREATE TABLE LIKE?
> >
> >
> > 2. Would the data sync of CTAS run continuously if the select works on an
> > unbounded source?
> >
> > In Flink the sub-select query may work on an unbounded source, which is
> > different from other systems (Postgres, Spark, Hive, MySQL). Does the data
> > sync run continuously, or does it just sync the snapshot taken when the
> > job is submitted?
> >
> >
> >
> > Besides, I have some minor questions about points mentioned in your email.
> >
> >
> >
> > > how to write data into existing table with history data declare [IF NOT
> > EXISTS] keywords and we ignore the table creation but the pipeline still
> > starts up
> >
> >
> >
> > Maybe we should compare the old schema and the new schema. What would
> > happen if the schema of the existing table differs from the new schema?
> >
> >
> >
> > > How to match sub-database and sub-table ? Use regex style source table
> > name
> >
> >
> >
> >    1. What would happen if the schemas of the matched tables differ from
> > each other?
> >
> >    2. In what order is the data of the matched tables synced? One table
> > after another, or all at the same time?
> >
> >
> >
> > >  AS select_statement: copy source table data into target
> >
> >
> >
> > Users could explicitly specify the data type for each column in the CTAS.
> > What would happen when running the following example? The demo is from the
> > MySQL documentation,
> > https://dev.mysql.com/doc/refman/5.6/en/create-table-select.html
> > , and the result is a bit unexpected. I wonder what the behavior would be
> > in Flink.
> >
> >
> > [image: image.png]
> >
> > Best,
> > JING ZHANG
> >
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk
