Thanks Sergio and Timo for your answers. Sounds good to me. Looking forward for this feature.
Regards, Jeyhun On Thu, Jun 13, 2024 at 4:48 PM Sergio Pena <ser...@confluent.io.invalid> wrote: > Sure Yuxia, I just added the support for RTAS statements too. > > - Sergio > > On Wed, Jun 12, 2024 at 8:22 PM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote: > > > Hi, Sergio. > > Thanks for driving the FLIP. Given we also has REPLACE TABLE AS > > Statement[1] and it's almost same with CREATE TABLE AS Statement, > > would you mind also supporting schema definition for REPLACE TABLE AS > > Statement in this FLIP? It'll be a great to align REPLACE TABLE AS > Statement > > to CREATE TABLE AS Statement > > > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-303%3A+Support+REPLACE+TABLE+AS+SELECT+statement > > > > Best regards, > > Yuxia > > > > ----- 原始邮件 ----- > > 发件人: "Timo Walther" <twal...@apache.org> > > 收件人: "dev" <dev@flink.apache.org> > > 发送时间: 星期三, 2024年 6 月 12日 下午 10:19:14 > > 主题: Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS > Statement > > > > > I just noticed the CREATE TABLE LIKE statement allows the definition > > > of new columns in the CREATE part. The difference > > > with this CTAS proposal is that TABLE LIKE appends the new columns at > > > the end of the schema instead of adding them > > > at the beginning like this proposal and Mysql do. > > > > This should be fine. The LIKE rather "extends from" the right table. > > Whereas the SELECT in CTAS rather extends the left schema definition. > > Given that "the extended part" is always appended, we could argue that > > the current CTAS behavior in the FLIP is acceptable. > > > > > If you want to rename a column in the create part, then that column > > > position must be in the same position as the query column. > > > I didn't like the Postgres approach because it does not let us add > > > columns that do not exist in the query schema. > > > > Flink offers similar functionality in INSERT INTO. INSERT INTO also > > allows syntax like: `INSERT INTO (b, c) SELECT * FROM t`. So given that > > our CTAS can be seen as a CREATE TABLE + INSERT INTO. I would adopt > > Jeyhun comment in the FLIP. If users don't want to add additional schema > > parts, the same column reordering should be available as if one would > > write a INSERT INTO. > > > > Regards, > > Timo > > > > > > > > > > On 12.06.24 04:30, Yanquan Lv wrote: > > > Hi Sergio, thanks for driving it, +1 for this. > > > > > > I have some comments: > > > 1. If we have a source table with primary keys and partition keys > > defined, > > > what is the default behavior if PARTITIONED and DISTRIBUTED not > specified > > > in the CTAS statement, It should not be inherited by default? > > > 2. I suggest providing a complete syntax that includes table_properties > > > like FLIP-218. > > > > > > > > > Sergio Pena <ser...@confluent.io.invalid> 于2024年6月12日周三 03:54写道: > > > > > >> I just noticed the CREATE TABLE LIKE statement allows the definition > of > > new > > >> columns in the CREATE part. The difference > > >> with this CTAS proposal is that TABLE LIKE appends the new columns at > > the > > >> end of the schema instead of adding them > > >> at the beginning like this proposal and Mysql do. > > >> > > >>> create table t1(id int, name string); > > >>>> create table s1(a int, b string) like t1; > > >>>> describe s1; > > >> > > >> +-------------+-----------+----------+--------+ > > >>> | Column Name | Data Type | Nullable | Extras | > > >>> +-------------+-----------+----------+--------+ > > >>> | id | INT | NULL | | > > >>> | name | STRING | NULL | | > > >>> | a | INT | NULL | | > > >>> | b | STRING | NULL | | > > >>> +-------------+-----------+----------+--------+ > > >> > > >> > > >> > > >> The CREATE TABLE LIKE also does not let the definition of existing > > columns > > >> in the CREATE part. The statement fails > > >> that the column already exists. > > >> > > >>> create table t1(id int, name string); > > >> > > >>> create table s1(id double) like t1; > > >>> A column named 'id' already exists in the base table. > > >>> > > >> > > >> What do you guys think of making it similar to the CREATE TABLE LIKE? > > Seems > > >> the best approach in order to > > >> be compatible with it. > > >> > > >> - Sergio > > >> > > >> On Tue, Jun 11, 2024 at 2:10 PM Sergio Pena <ser...@confluent.io> > > wrote: > > >> > > >>> Thanks Timo for answering Jeyhun questions. > > >>> > > >>> To add info more about your questions Jeyhun. This proposal is not > > >>> handling NULL/NOT_NULL types. I noticed that > > >>> the current CTAS impl. (as Timo said) adds this constraint as part of > > the > > >>> resulting schema. And when defining > > >>> a primary key in the CREATE part, if the resulting schema does not > > have a > > >>> NOT NULL in the column then the CTAS > > >>> will fail. This is similar to the CREATE TABLE LIKE which expects the > > >> LIKE > > >>> table to have a NOT NULL column if > > >>> the user defines a primary key in the CREATE part. > > >>> > > >>>> In some cases, redefining the column types might be redundant, > > >> especially > > >>>> when users dont change the column type. A user just wants to change > > the > > >>>> column name from the SELECT clause. Should we also support this > > >> scenario, > > >>>> similar to postgres? > > >>> > > >>> I looked into Postgres too. Postgres matches the columns based on the > > >>> order defined in the create and select part. > > >>> If you want to rename a column in the create part, then that column > > >>> position must be in the same position as the query column. > > >>> I didn't like the Postgres approach because it does not let us add > > >> columns > > >>> that do not exist in the query schema. > > >>> > > >>> i.e. query has schema (a int, b string), now the `a` column is > renamed > > to > > >>> `id` because both are in the same position 0 > > >>> `create table s1(id int) as select a, b from t1`; > > >>> results in: [id int, b string] > > >>> > > >>> I think, if users want to rename then they can use a different alias > in > > >>> the select part. They could also do explicit casting > > >>> for changing the data types, which now makes it redundant (as you > said) > > >> to > > >>> allow redefining the query columns again. But > > >>> perhaps there are cases where explicit casting does not work and just > > >>> defining the column would? i.e. making a nullable > > >>> type to not null? I couldn't make `cast(c1 as int not null)` to work > > for > > >>> instance, but it may work in the create part? > > >>> > > >>>> Could you also mention the casting rules in the FLIP for this case? > > >>> > > >>> I mentioned they're the same as insert/select when doing implicit > > >> casting. > > >>> I will search for more info about the insert/select > > >>> and add the casting rules in the flip.. > > >>> > > >>> - Sergio > > >>> > > >>> > > >>> On Tue, Jun 11, 2024 at 12:59 AM Timo Walther <twal...@apache.org> > > >> wrote: > > >>> > > >>>> Hi Sergio, > > >>>> > > >>>> thanks for proposing this FLIP for finalizing the CTAS statement. > > >>>> Adopting the logic from MySQL for deriving and potentially > overwriting > > >>>> parts of the schema should be easy to understand for everyone. So +1 > > for > > >>>> the FLIP in general. > > >>>> > > >>>> > How do you handle CTAS statements with SELECT clauses that have > > >>>> > (implicit or explicit) NULLABLE or NOT NULLABLE columns? > > >>>> > > >>>> @Jeyhun: I don't think there is anything special about this. The > > current > > >>>> CTAS implementation should already cover that. It takes the > > nullability > > >>>> of the column's data type as a constraint into derived schema. Keep > in > > >>>> mind that nullability is part of the data type in Flink, not only a > > >>>> constraint on the schema. This decision was made due to Calcite > > >> internals. > > >>>> > > >>>> > redefining the column types might be redundant, especially > > >>>> > when users dont change the column type > > >>>> > > >>>> This is indeed a good point. On one hand, I think we should avoid > > >>>> further complicating the syntax. But looking at other vendors [1] > this > > >>>> seems indeed a valid use case. If it doesn't cause too many special > > >>>> cases in the parser (and it's look-ahead), I'm fine with supporting > a > > >>>> list of column names as well. However, the most important use case > > will > > >>>> be specifying a watermark, metadata columns, or other schema parts > > that > > >>>> are not just columns names. > > >>>> > > >>>> Regards, > > >>>> Timo > > >>>> > > >>>> > > >>>> [1] > > >>>> > > >>>> > > >> > > > https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse?view=azure-sqldw-latest > > >>>> > > >>>> > > >>>> On 10.06.24 21:37, Jeyhun Karimov wrote: > > >>>>> Hi Sergio, > > >>>>> > > >>>>> Thanks for driving this FLIP. +1 for it. > > >>>>> I have a few questions: > > >>>>> > > >>>>> - How do you handle CTAS statements with SELECT clauses that have > > >>>> (implicit > > >>>>> or explicit) NULLABLE or NOT NULLABLE columns? Could you also > mention > > >>>> the > > >>>>> casting rules in the FLIP for this case? > > >>>>> - In some cases, redefining the column types might be redundant, > > >>>> especially > > >>>>> when users dont change the column type. For example, a user just > > wants > > >>>> to > > >>>>> change the column name from the SELECT clause. > > >>>>> Should we also support this scenario, similar to the postgres [1] ? > > >>>>> > > >>>>> Regards, > > >>>>> Jeyhun > > >>>>> > > >>>>> > > >>>>> [1] https://www.postgresql.org/docs/8.1/sql-createtableas.html > > >>>>> > > >>>>> On Mon, Jun 10, 2024 at 6:28 PM Sergio Pena > > >> <ser...@confluent.io.invalid > > >>>>> > > >>>>> wrote: > > >>>>> > > >>>>>> Hi David, > > >>>>>> > > >>>>>> The CTAS feature is already part of Flink (proposed in FLIP-218 > > [1]). > > >>>> The > > >>>>>> new FLIP-463 is just to extend the CTAS syntax to allow for adding > > >> new > > >>>>>> columns to > > >>>>>> the created table that are not part of the generated schema. I > think > > >>>>>> FLIP-218 [1] was discussed in the mail list somewhere, but I > > couldn't > > >>>> find > > >>>>>> the discussion thread. > > >>>>>> I was hoping it could contain the answers for your questions as > > >> that's > > >>>>>> where CTAS was implemented. There's a user doc [2] for it that may > > >> help > > >>>>>> too. > > >>>>>> > > >>>>>> But, in a nutshell, CTAS is similar to running CREATE and then > > >>>>>> INSERT/SELECT statements. The CTAS will execute a background job > > that > > >>>>>> executes > > >>>>>> the AS SELECT query and then inserts the data into the newly > created > > >>>> table. > > >>>>>> So the table will be kept up to date as the source data changes as > > >> you > > >>>>>> said. > > >>>>>> > > >>>>>> Generated columns are from the AS SELECT schema, right? Yes, > that's > > >>>> already > > >>>>>> allowed. FLIP-463 will also allow modifying the schema of the sink > > >>>> table > > >>>>>> in case you need to add more columns. > > >>>>>> > > >>>>>> I notice amazon [3] talks of the difference between view and CTAS; > > >> that > > >>>>>>> CTAS persists the content. Is this the approach we are taking? If > > so > > >>>>>> where > > >>>>>>> are we persisting? > > >>>>>> > > >>>>>> This is already supported by FLIP-218 [1]. But yes, the CTAS > > persists > > >>>> the > > >>>>>> content in the new table you specify in the CTAS statement. > > >>>>>> > > >>>>>> > > >>>>>>> is the table read-only, or can we insert/ delete / update into > it. > > >> If > > >>>> it > > >>>>>>> is read only how will the inserts , deletes updates fail. > > >>>>>> > > >>>>>> Are there any restrictions on the select ? Can the select be a > join, > > >>>>>>> aggregate, windowed? > > >>>>>> > > >>>>>> I notice Azure [2] supports ISNULL. Is there a thought to change > the > > >>>>>>> nullability for the CTAS? > > >>>>>> > > >>>>>> Amazon [4] ignores ordered by clauses ? is that the same for this > > >>>> proposal? > > >>>>>> > > >>>>>> I suppose all of this is already supported by FLIP-218. I don't > see > > >> any > > >>>>>> restrictions on the query. You can learn more about it in the > > >> FLIP-218 > > >>>> [1] > > >>>>>> > > >>>>>> - Sergio > > >>>>>> > > >>>>>> [1] > > >>>>>> > > >>>> > > >> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199541185 > > >>>>>> [2] > > >>>>>> > > >>>>>> > > >>>> > > >> > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#as-select_statement > > >>>>>> > > >>>>>> On Mon, Jun 10, 2024 at 3:10 AM David Radley < > > >> david_rad...@uk.ibm.com> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> Hi Sergio, > > >>>>>>> Sounds good . I am relatively new to this area and had some basic > > >>>>>>> questions: > > >>>>>>> > > >>>>>>> I notice in [1] it talks of materialized views. And CREATE view > can > > >>>>>>> already take the AS keyword. It would be useful to me to > understand > > >>>> when > > >>>>>> we > > >>>>>>> would use each of these. > > >>>>>>> > > >>>>>>> - I assume the table will be kept up to date as the source data > > >>>> changes > > >>>>>>> like a view. > > >>>>>>> - Are there any restrictions on the select ? Can the select be a > > >> join, > > >>>>>>> aggregate, windowed? > > >>>>>>> - I assume generated columns are allowed? > > >>>>>>> - is the table read-only, or can we insert/ delete / update into > > it. > > >>>> If > > >>>>>> it > > >>>>>>> is read only how will the inserts , deletes updates fail. > > >>>>>>> - I notice Azure [2] supports ISNULL. Is there a thought to > change > > >> the > > >>>>>>> nullability for the CTAS? > > >>>>>>> - I notice amazon [3] talks of the difference between view and > > CTAS; > > >>>> that > > >>>>>>> CTAS persists the content. Is this the approach we are taking? If > > so > > >>>>>> where > > >>>>>>> are we persisting? > > >>>>>>> - Amazon [4] ignores ordered by clauses ? is that the same for > this > > >>>>>>> proposal? > > >>>>>>> > > >>>>>>> [1] > > >>>>>>> > > >>>>>> > > >>>> > > >> > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables/ > > >>>>>>> [2] > > >>>>>>> > > >>>>>> > > >>>> > > >> > > > https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-develop-ctas#selectinto-vs-ctas > > >>>>>>> [3] > > >>>>>>> > > >>>>>> > > >>>> > > >> > > > https://docs.aws.amazon.com/athena/latest/ug/ctas-considerations-limitations.html#ctas-considerations-limitations-queries-vs-views > > >>>>>>> [4] > > >>>>>>> > > >>>>>> > > >>>> > > >> > > > https://docs.aws.amazon.com/athena/latest/ug/ctas-considerations-limitations.html#ctas-considerations-limitations-order-by-ignored > > >>>>>>> > > >>>>>>> > > >>>>>>> Kind regards, David. > > >>>>>>> > > >>>>>>> From: Sergio Pena <ser...@confluent.io.INVALID> > > >>>>>>> Date: Friday, 7 June 2024 at 16:13 > > >>>>>>> To: dev@flink.apache.org <dev@flink.apache.org> > > >>>>>>> Subject: [EXTERNAL] [DISCUSS] FLIP-463: Schema Definition in > CREATE > > >>>> TABLE > > >>>>>>> AS Statement > > >>>>>>> HI All, > > >>>>>>> > > >>>>>>> I'd like to start a discussion on FLIP-463: Schema Definition in > > >>>> CREATE > > >>>>>>> TABLE AS Statement [1] > > >>>>>>> > > >>>>>>> The proposal extends the CTAS statement to allow users to define > > >> their > > >>>>>> own > > >>>>>>> schema by adding columns, primary and partition keys, and table > > >>>>>>> distribution to the CREATE statement. > > >>>>>>> > > >>>>>>> Any thoughts are welcome. > > >>>>>>> > > >>>>>>> Thanks, > > >>>>>>> - Sergio Pena > > >>>>>>> > > >>>>>>> [1] > > >>>>>>> > > >>>>>>> > > >>>>>> > > >>>> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement > > >>>>>>> > > >>>>>>> Unless otherwise stated above: > > >>>>>>> > > >>>>>>> IBM United Kingdom Limited > > >>>>>>> Registered in England and Wales with number 741598 > > >>>>>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. > PO6 > > >>>> 3AU > > >>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>>> > > >> > > > > > >