Hi Konstantin, >From my understanding, this syntax has 2 major benefits:
1. Just like you said, it saves the effort to specify the schema, especially when involving hundreds of fields. 2. When using CREATE TABLE xx AS TABLE yy, it gives us the possibility to enable schema evolution, and it seems pretty natural to do so. Best, Kurt On Fri, May 28, 2021 at 5:44 PM Konstantin Knauf <kna...@apache.org> wrote: > Hi everyone, > > quick question for my understanding: how is this different to > > CREATE TABLE IF NOT EXISTS my_table ( > ... > ) WITH ( > ... > ); > INSERT INTO my_table SELECT ...; > > ? > > Is it only about a) not having to specify the schema and b) a more > condensed syntax? > > Cheers, > > Konstantin > > On Fri, May 28, 2021 at 11:30 AM Jark Wu <imj...@gmail.com> wrote: > > > Thanks Danny for starting the discussion of extending CTAS syntax. > > > > I think this is a very useful feature for data integration and ETL jobs > (a > > big use case of Flink). > > Many users complain a lot that manually defining schemas for sources and > > sinks is hard. > > CTAS helps users to write ETL jobs without defining any schemas of > sources > > and sinks. > > CTAS automatically creates physical tables in external systems, and > > automatically > > maps external tables to Flink tables with the help of catalogs (e.g. > > PgCatalog, HiveCatalog). > > > > On the other hand, the schema of the SELECT query is fixed after compile > > time. > > CTAS TABLE extends the syntax which allows dynamic schema during runtime, > > semantically it streaming copies the up-to-date structure and data (if > run > > in streaming mode). > > So I think CTAS TABLE is a major step forward for data integration, it > > defines a syntax > > which allows the underlying streaming pipeline automatically migrate > schema > > evolution > > (e.g. ADD COLUMN) from source tables to sink tables without stopping jobs > > or updating SQLs. > > > > Therefore, I'm +1 for the feature. > > > > Best, > > Jark > > > > On Fri, 28 May 2021 at 16:22, JING ZHANG <beyond1...@gmail.com> wrote: > > > > > Hi Danny, > > > > > > Thanks for starting this discussion. > > > > > > > > > > > > Big +1 for this feature. Both CTAS AND CREATE TABLE LIKE are very > useful > > > features. IMO, it is clear to separate them into two parts in the > > `syntax` > > > character. 😀 > > > > > > > > > > > > First, I have two related problems: > > > > > > > > > 1. Would `create table` in CTAS trigger to create a physical table in > > > external storage system? > > > > > > For example, now normal `create table` would only define a connecting > > with > > > an existed external Kafka topic instead of trigger to create a physical > > > kafka topic in kafka cluster. Does this behavior still work for CTAS > AND > > > CREATE TABLE LIKE? > > > > > > > > > 2. Would the data sync of CTAS run continuously if select works on a > > > unbounded source? > > > > > > Since sub select query may works on unbounded source in Flink, which is > > > different with other system (postgres, spark, hive, mysql). Does data > > sync > > > continuously run or just sync the snapshot at the job submit? > > > > > > > > > > > > Besides, I have some minor problems which is mentioned in your email. > > > > > > > > > > > > > how to write data into existing table with history data declare [IF > NOT > > > EXISTS] keywords and we ignore the table creation but the pipeline > still > > > starts up > > > > > > > > > > > > Maybe we should check old schema and new schema. What would happen if > > > schema of existed table is different with new schema? > > > > > > > > > > > > > How to match sub-database and sub-table ? Use regex style source > table > > > name > > > > > > > > > > > > 1. What would happen if schema of matched tables different with each > > > other? > > > > > > 2. What orders to sync data of all matched table? Sync data from all > > > matched tables one by one or at the same time? > > > > > > > > > > > > > AS select_statement: copy source table data into target > > > > > > > > > > > > User could explicitly specify the data type for each column in the > CTAS, > > > what happened when run the following example. The demo is from MySQL > > > document, > > https://dev.mysql.com/doc/refman/5.6/en/create-table-select.html > > > , the result is a bit unexpected, I wonder > > > > > > What the behavior would be in Flink. > > > > > > > > > [image: image.png] > > > > > > Best, > > > JING ZHANG > > > > > > > > -- > > Konstantin Knauf > > https://twitter.com/snntrable > > https://github.com/knaufk >