Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement

Timo Walther Mon, 10 Jun 2024 22:59:26 -0700

Hi Sergio,

thanks for proposing this FLIP for finalizing the CTAS statement.Adopting the logic from MySQL for deriving and potentially overwritingparts of the schema should be easy to understand for everyone. So +1 forthe FLIP in general.


> How do you handle CTAS statements with SELECT clauses that have
> (implicit or explicit) NULLABLE or NOT NULLABLE columns?

@Jeyhun: I don't think there is anything special about this. The currentCTAS implementation should already cover that. It takes the nullabilityof the column's data type as a constraint into derived schema. Keep inmind that nullability is part of the data type in Flink, not only aconstraint on the schema. This decision was made due to Calcite internals.


> redefining the column types might be redundant, especially
> when users dont change the column type

This is indeed a good point. On one hand, I think we should avoidfurther complicating the syntax. But looking at other vendors [1] thisseems indeed a valid use case. If it doesn't cause too many specialcases in the parser (and it's look-ahead), I'm fine with supporting alist of column names as well. However, the most important use case willbe specifying a watermark, metadata columns, or other schema parts thatare not just columns names.


Regards,
Timo

[1]https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-as-select-azure-sql-data-warehouse?view=azure-sqldw-latest



On 10.06.24 21:37, Jeyhun Karimov wrote:

Hi Sergio,

Thanks for driving this FLIP. +1 for it.
I have a few questions:

- How do you handle CTAS statements with SELECT clauses that have (implicit
or explicit) NULLABLE or NOT NULLABLE columns? Could you also mention the
casting rules in the FLIP for this case?
- In some cases, redefining the column types might be redundant, especially
when users dont change the column type. For example, a user just wants to
change the column name from the SELECT clause.
Should we also support this scenario, similar to the postgres [1] ?

Regards,
Jeyhun


[1] https://www.postgresql.org/docs/8.1/sql-createtableas.html

On Mon, Jun 10, 2024 at 6:28 PM Sergio Pena <[email protected]>
wrote:

Hi David,

The CTAS feature is already part of Flink (proposed in FLIP-218 [1]). The
new FLIP-463 is just to extend the CTAS syntax to allow for adding new
columns to
the created table that are not part of the generated schema. I think
FLIP-218 [1] was discussed in the mail list somewhere, but I couldn't find
the discussion thread.
I was hoping it could contain the answers for your questions as that's
where CTAS was implemented. There's a user doc [2] for it that may help
too.

But, in a nutshell, CTAS is similar to running CREATE and then
INSERT/SELECT statements. The CTAS will execute a background job that
executes
the AS SELECT query and then inserts the data into the newly created table.
So the table will be kept up to date as the source data changes as you
said.

Generated columns are from the AS SELECT schema, right? Yes, that's already
allowed. FLIP-463 will also allow modifying the schema of the sink table
in case you need to add more columns.

I notice amazon [3] talks of the difference between view and CTAS; that

CTAS persists the content. Is this the approach we are taking? If so

where

are we persisting?


This is already supported by FLIP-218 [1]. But yes, the CTAS persists the
content in the new table you specify in the CTAS statement.

is the table read-only, or can we insert/ delete / update into it. If it
is read only how will the inserts , deletes updates fail.


Are there any restrictions on the select ? Can the select be a join,

aggregate, windowed?


I notice Azure [2] supports ISNULL. Is there a thought to change the

nullability for the CTAS?


Amazon [4] ignores ordered by clauses ? is that the same for this proposal?

I suppose all of this is already supported by FLIP-218. I don't see any
restrictions on the query. You can learn more about it in the FLIP-218 [1]

- Sergio

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199541185
[2]

https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#as-select_statement

On Mon, Jun 10, 2024 at 3:10 AM David Radley <[email protected]>
wrote:

Hi Sergio,
Sounds good . I am relatively new to this area and had some basic
questions:

I notice in [1] it talks of materialized views. And CREATE view can
already take the AS keyword. It would be useful to me to understand when

we

would use each of these.

- I assume the table will be kept up to date as the source data changes
like a view.
- Are there any restrictions on the select ? Can the select be a join,
aggregate, windowed?
- I assume generated columns are allowed?
- is the table read-only, or can we insert/ delete / update into it. If

it

is read only how will the inserts , deletes updates fail.
- I notice Azure [2] supports ISNULL. Is there a thought to change the
nullability for the CTAS?
- I notice amazon [3] talks of the difference between view and CTAS; that
CTAS persists the content. Is this the approach we are taking? If so

where

are we persisting?
- Amazon [4] ignores ordered by clauses ? is that the same for this
proposal?

[1]

https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables/

[2]

https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-develop-ctas#selectinto-vs-ctas

[3]

https://docs.aws.amazon.com/athena/latest/ug/ctas-considerations-limitations.html#ctas-considerations-limitations-queries-vs-views

[4]

https://docs.aws.amazon.com/athena/latest/ug/ctas-considerations-limitations.html#ctas-considerations-limitations-order-by-ignored



Kind regards, David.

From: Sergio Pena <[email protected]>
Date: Friday, 7 June 2024 at 16:13
To: [email protected] <[email protected]>
Subject: [EXTERNAL] [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE
AS Statement
HI All,

I'd like to start a discussion on FLIP-463: Schema Definition in CREATE
TABLE AS Statement [1]

The proposal extends the CTAS statement to allow users to define their

own

schema by adding columns, primary and partition keys, and table
distribution to the CREATE statement.

Any thoughts are welcome.

Thanks,
- Sergio Pena

[1]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement


Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement

Reply via email to