Hi Mang,

Thanks for driving this! Finally there is a discussion about CTAS. It is one of the most important features.

I agree with Jark that it would be good to take care of both atomicity and isolation, which leads to Spark DataSource V2. Would you help me understand the connection between the reasons below and your decision to pick Spark DataSource V1?

*Reasons:*

- *Streaming mode requires the table to be created first, so that downstream jobs can consume in real time.* (applies to both Spark DataSource V1 and V2)
- *In most cases, streaming jobs do not need to be cleaned up even if the job fails.* (applies to both Spark DataSource V1 and V2)
- *Flink has a rich connector ecosystem, and the capabilities provided by external storage systems differ, so Flink needs to behave consistently.* (how does this lead to Spark DataSource V1?)
- *Batch jobs try to ensure final atomicity.* (this means choosing Spark DataSource V2, right?)

I think there are some differences between Spark DataSource V1 and V2, e.g. when the sink table becomes visible, and whether the result is written to a temporary directory or directly to the sink table. It would be great if you could update the reasons that led to your decision. Thanks!

Best regards,
Jing

On Tue, May 31, 2022 at 8:34 AM Yun Gao <yungao...@aliyun.com.invalid> wrote:

> Hi,
>
> Regarding the drop operation, after some offline discussion with Dalong and
> Zhu, we think that listening on the client side might be problematic, since
> the client would exit after submitting the jobs in detached mode. The
> operation might therefore need to be on the JobMaster side.
>
> For the listener interface, currently JobListener only resides on the
> client side and contains methods that are unsuitable for this scenario,
> such as onJobSubmitted, and the internal JobStatusListener is designed to
> be used inside the JM and is not serializable. We therefore tend to add a
> new interface, JobStatusHook, which could be attached to the JobGraph and
> executed in the JobMaster.
> The interface will also be marked as Internal.
>
> Best,
> Yun
>
>
> ------------------------------------------------------------------
> From:Mang Zhang <zhangma...@163.com>
> Send Time:2022 May 25 (Wed.) 10:24
> To:dev <dev@flink.apache.org>
> Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>
> Hi, Martijn
> Thanks for your reply!
> I looked at the SQL standard, and CTAS is part of it.
> Feature T172 is "AS subquery clause in table definition".
>
> --
>
> Best regards,
> Mang Zhang
>
> At 2022-05-04 21:49:00, "Martijn Visser" <martijnvis...@apache.org> wrote:
> >Hi everyone,
> >
> >Can we identify if this proposed syntax is part of the SQL standard?
> >
> >Best regards,
> >
> >Martijn Visser
> >https://twitter.com/MartijnVisser82
> >https://github.com/MartijnVisser
> >
> >
> >On Fri, 29 Apr 2022 at 11:19, yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> >
> >> Thanks for driving this work, it will be a useful feature.
> >> About FLIP-218, I have some questions.
> >>
> >> 1: Does our CTAS syntax support specifying the target table's schema,
> >> including column names and data types? I think it may be a useful feature
> >> in case we want to change the data types in the target table instead of
> >> always copying the source table's schema. It would be more flexible with
> >> this feature.
> >> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
> >>
> >> 2: It seems this will require the sink to implement a public interface to
> >> drop the table, so what will the interface look like?
> >>
> >> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
> >>
> >> Best regards,
> >> Yuxia
> >>
> >> ----- Original Message -----
> >> From: "Mang Zhang" <zhangma...@163.com>
> >> To: "dev" <dev@flink.apache.org>
> >> Sent: Thursday, April 28, 2022, 4:57:24 PM
> >> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >>
> >> Hi, everyone
> >>
> >> I would like to open a discussion on supporting the SELECT clause in
> >> CREATE TABLE (CTAS).
> >> With the development of business and the enhancement of Flink SQL
> >> capabilities, queries are becoming more and more complex.
> >> Currently the user needs to use a CREATE TABLE statement to create the
> >> target table first, and then execute the INSERT statement.
> >> However, the target table may have many columns, which brings the user a
> >> lot of work outside the business logic.
> >> At the same time, the user has to ensure that the schema of the created
> >> target table is consistent with the schema of the query result.
> >> Using a CTAS syntax like the one in Hive/Spark can greatly help the user.
> >>
> >> You can find more details in FLIP-218[1]. Looking forward to your feedback.
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
> >>
> >> --
> >>
> >> Best regards,
> >> Mang Zhang
> >>
> >
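
For readers following the JobStatusHook idea from Yun's message in this thread, here is a minimal sketch of how a serializable hook running on the JobMaster side might drop the CTAS target table when the job fails or is canceled. Only the name JobStatusHook comes from the thread. The method names, their signatures, the CtasDropTableHook class and the InMemoryCatalog stand-in are all assumptions made for illustration and should not be read as Flink's actual API.

```java
import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; every signature here is an assumption, not Flink's API. */
final class CtasHookSketch {

    /** Hypothetical hook; Serializable so it could be shipped along with the JobGraph. */
    interface JobStatusHook extends Serializable {
        void onFinished(String jobId);
        void onFailed(String jobId, Throwable cause);
        void onCanceled(String jobId);
    }

    /** Hypothetical stand-in for an external catalog or storage system. */
    static final class InMemoryCatalog implements Serializable {
        private final Set<String> tables = new HashSet<>();

        void createTable(String name) { tables.add(name); }
        void dropTable(String name) { tables.remove(name); }
        boolean tableExists(String name) { return tables.contains(name); }
    }

    /** Drops the CTAS target table if the job does not finish successfully. */
    static final class CtasDropTableHook implements JobStatusHook {
        private final InMemoryCatalog catalog;
        private final String targetTable;

        CtasDropTableHook(InMemoryCatalog catalog, String targetTable) {
            this.catalog = catalog;
            this.targetTable = targetTable;
        }

        @Override
        public void onFinished(String jobId) {
            // Success: keep the table and its data.
        }

        @Override
        public void onFailed(String jobId, Throwable cause) {
            catalog.dropTable(targetTable);
        }

        @Override
        public void onCanceled(String jobId) {
            catalog.dropTable(targetTable);
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        // The table is created before the job is submitted.
        catalog.createTable("ctas_target");
        JobStatusHook hook = new CtasDropTableHook(catalog, "ctas_target");

        // Simulate the JobMaster reporting a failure: the hook cleans up.
        hook.onFailed("job-1", new RuntimeException("simulated failure"));
        System.out.println("table exists after failure: " + catalog.tableExists("ctas_target"));
    }
}
```

Because the hook runs where the job status is known, rather than in the client, the cleanup in this sketch would still happen after a detached client has exited. Whether streaming jobs should skip the cleanup, as one of the quoted reasons suggests, is left out of the sketch.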