Hi everyone, quick question for my understanding: how is this different to
CREATE TABLE IF NOT EXISTS my_table ( ... ) WITH ( ... ); INSERT INTO my_table SELECT ...; ? Is it only about a) not having to specify the schema and b) a more condensed syntax? Cheers, Konstantin On Fri, May 28, 2021 at 11:30 AM Jark Wu <imj...@gmail.com> wrote: > Thanks Danny for starting the discussion of extending CTAS syntax. > > I think this is a very useful feature for data integration and ETL jobs (a > big use case of Flink). > Many users complain a lot that manually defining schemas for sources and > sinks is hard. > CTAS helps users to write ETL jobs without defining any schemas of sources > and sinks. > CTAS automatically creates physical tables in external systems, and > automatically > maps external tables to Flink tables with the help of catalogs (e.g. > PgCatalog, HiveCatalog). > > On the other hand, the schema of the SELECT query is fixed after compile > time. > CTAS TABLE extends the syntax which allows dynamic schema during runtime, > semantically it streaming copies the up-to-date structure and data (if run > in streaming mode). > So I think CTAS TABLE is a major step forward for data integration, it > defines a syntax > which allows the underlying streaming pipeline automatically migrate schema > evolution > (e.g. ADD COLUMN) from source tables to sink tables without stopping jobs > or updating SQLs. > > Therefore, I'm +1 for the feature. > > Best, > Jark > > On Fri, 28 May 2021 at 16:22, JING ZHANG <beyond1...@gmail.com> wrote: > > > Hi Danny, > > > > Thanks for starting this discussion. > > > > > > > > Big +1 for this feature. Both CTAS AND CREATE TABLE LIKE are very useful > > features. IMO, it is clear to separate them into two parts in the > `syntax` > > character. 😀 > > > > > > > > First, I have two related problems: > > > > > > 1. Would `create table` in CTAS trigger to create a physical table in > > external storage system? > > > > For example, now normal `create table` would only define a connecting > with > > an existed external Kafka topic instead of trigger to create a physical > > kafka topic in kafka cluster. Does this behavior still work for CTAS AND > > CREATE TABLE LIKE? > > > > > > 2. Would the data sync of CTAS run continuously if select works on a > > unbounded source? > > > > Since sub select query may works on unbounded source in Flink, which is > > different with other system (postgres, spark, hive, mysql). Does data > sync > > continuously run or just sync the snapshot at the job submit? > > > > > > > > Besides, I have some minor problems which is mentioned in your email. > > > > > > > > > how to write data into existing table with history data declare [IF NOT > > EXISTS] keywords and we ignore the table creation but the pipeline still > > starts up > > > > > > > > Maybe we should check old schema and new schema. What would happen if > > schema of existed table is different with new schema? > > > > > > > > > How to match sub-database and sub-table ? Use regex style source table > > name > > > > > > > > 1. What would happen if schema of matched tables different with each > > other? > > > > 2. What orders to sync data of all matched table? Sync data from all > > matched tables one by one or at the same time? > > > > > > > > > AS select_statement: copy source table data into target > > > > > > > > User could explicitly specify the data type for each column in the CTAS, > > what happened when run the following example. The demo is from MySQL > > document, > https://dev.mysql.com/doc/refman/5.6/en/create-table-select.html > > , the result is a bit unexpected, I wonder > > > > What the behavior would be in Flink. > > > > > > [image: image.png] > > > > Best, > > JING ZHANG > > > -- Konstantin Knauf https://twitter.com/snntrable https://github.com/knaufk