Hi @fhue...@gmail.com @Timo Walther <twal...@apache.org> @Dawid Wysakowicz <dwysakow...@apache.org>

What do you think about requiring that a watermark must be defined on a top-level column?
If we do that, we can add an expression column to represent the watermark, just like the compute column. An example covering all cases:

create table MyTable (
  f0 BIGINT NOT NULL,
  f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
  f2 VARCHAR<256>,
  f3 AS f0 + 1,
  f4 TIMESTAMP(3) NOT NULL,
  PRIMARY KEY (f0),
  UNIQUE (f3, f2),
  WATERMARK f4 AS f4 - INTERVAL '3' SECOND
) with (...)

+------+---------------------------------+-------+--------+----------------+--------------------------+
| name | type                            | null  | key    | compute column | watermark                |
+------+---------------------------------+-------+--------+----------------+--------------------------+
| f0   | BIGINT                          | false | PRI    | (NULL)         | (NULL)                   |
+------+---------------------------------+-------+--------+----------------+--------------------------+
| f1   | ROW<q1 STRING, q2 TIMESTAMP(3)> | true  | (NULL) | (NULL)         | (NULL)                   |
+------+---------------------------------+-------+--------+----------------+--------------------------+
| f2   | VARCHAR<256>                    | true  | UNQ    | (NULL)         | (NULL)                   |
+------+---------------------------------+-------+--------+----------------+--------------------------+
| f3   | BIGINT                          | false | UNQ    | f0 + 1         | (NULL)                   |
+------+---------------------------------+-------+--------+----------------+--------------------------+
| f4   | TIMESTAMP(3)                    | false | (NULL) | (NULL)         | f4 - INTERVAL '3' SECOND |
+------+---------------------------------+-------+--------+----------------+--------------------------+

WDYT?
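[Editor's note] As a hedged sketch of the workaround Jark suggests further down in the thread (lifting a nested field with a computed column so the watermark stays on a top-level column), assuming the FLIP-66 `WATERMARK FOR` syntax; the column name `ts` is hypothetical:

```sql
create table MyTable (
  f0 BIGINT NOT NULL,
  f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
  -- computed column lifts the nested field f1.q2 to the top level
  ts AS f1.q2,
  -- the watermark is then defined on a top-level column only
  WATERMARK FOR ts AS ts - INTERVAL '3' SECOND
) with (...)
```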
Best,
Godfrey

On Thu, 30 Apr 2020 at 23:57, godfrey he <godfre...@gmail.com> wrote:

> Hi Fabian,
>
> the broken example is:
>
> create table MyTable (
>   f0 BIGINT NOT NULL,
>   f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
>   f2 VARCHAR<256>,
>   f3 AS f0 + 1,
>   PRIMARY KEY (f0),
>   UNIQUE (f3, f2),
>   WATERMARK f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> ) with (...)
>
> | name | type                                | key    | compute column | watermark                                |
> | f0   | BIGINT NOT NULL                     | PRI    | (NULL)         |                                          |
> | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | UNQ    | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
> | f2   | VARCHAR<256>                        | (NULL) | NULL           |                                          |
> | f3   | BIGINT NOT NULL                     | UNQ    | f0 + 1         |                                          |
>
> or we add a column to represent nullability:
>
> | name | type                                | null  | key    | compute column | watermark                                |
> | f0   | BIGINT                              | false | PRI    | (NULL)         |                                          |
> | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | true  | UNQ    | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
> | f2   | VARCHAR<256>                        | true  | (NULL) | NULL           |                                          |
> | f3   | BIGINT                              | false | UNQ    | f0 + 1         |                                          |
>
> Hi Jark,
> If we limit watermarks to be defined on top-level columns only,
> this becomes much simpler.
>
> Best,
> Godfrey
>
> On Thu, 30 Apr 2020 at 23:38, Jark Wu <imj...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm in favor of Fabian's proposal.
>> First, a watermark is not a column but metadata, just like a primary key,
>> so it shouldn't stand with the columns.
>> Second, AFAIK, a primary key can only be defined on top-level columns.
>> Third, I think watermarks can follow primary keys and only be allowed on
>> top-level columns.
>>
>> I have to admit that in FLIP-66 a watermark can be defined on nested
>> fields. However, during implementation, I found that it's too complicated
>> to do that. We would have to refactor the time-based physical nodes,
>> use code generation to access the event-time, and refactor
>> FlinkTypeFactory to support a complex nested rowtime.
>> There is not much value in this feature, but it introduces a lot of
>> complexity into the code base.
>> So I think we can force watermarks to be defined on top-level columns. If
>> users want to define one on a nested column,
>> they can use a computed column to make it a top-level column.
>>
>> Best,
>> Jark
>>
>> On Thu, 30 Apr 2020 at 17:55, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>> > Hi Godfrey,
>> >
>> > The formatting of your example seems to be broken.
>> > Could you send it again please?
>> >
>> > Regarding your points:
>> > > because the watermark expression can be on a sub-column, just like
>> > > `f1.q2` in the example I gave above.
>> >
>> > I would put the watermark information in the row of the top-level field
>> > and indicate to which nested field the watermark refers.
>> > Don't we have to solve the same issue for primary keys that are defined
>> > on a nested field?
>> >
>> > > A boolean flag can't represent such info, and I do not know whether we
>> > > will support complex watermark expressions involving multiple columns
>> > > in the future, such as: "WATERMARK FOR ts AS ts + f1 + INTERVAL '1' SECOND"
>> >
>> > You are right, a simple binary flag is definitely not sufficient to
>> > display the watermark information.
>> > I would put the expression string into the field, i.e., "ts + f1 +
>> > interval '1' second".
>> >
>> > For me, the most important reason not to show the watermark as a row in
>> > the table is that it is not a field that can be queried, but meta
>> > information on an existing field.
>> > For the user it is important to know that a certain field has a
>> > watermark. Otherwise, certain queries cannot be specified correctly.
>> > Also, there might be support for multiple watermarks defined on
>> > different fields at some point. Would those be printed in multiple rows?
>> >
>> > Best,
>> > Fabian
>> >
>> > On Thu, 30 Apr 2020 at 11:25, godfrey he <godfre...@gmail.com> wrote:
>> >
>> > > Hi Fabian, Aljoscha,
>> > >
>> > > Thanks for the feedback.
>> > >
>> > > Agreed that we can deal with the primary key as you mentioned.
>> > > Right now, the type column contains the nullability attribute, e.g.
>> > > BIGINT NOT NULL.
>> > > (I'm also OK with using two columns to represent the type, just like
>> > > MySQL.)
>> > >
>> > > Why do I treat `watermark` as a special row?
>> > > Because the watermark expression can be on a sub-column, just like
>> > > `f1.q2` in the example I gave above.
>> > > A boolean flag can't represent such info, and I do not know whether we
>> > > will support complex
>> > > watermark expressions involving multiple columns in the future, such
>> > > as: "WATERMARK FOR ts AS ts + f1 + INTERVAL '1' SECOND"
>> > >
>> > > If we do not support complex watermark expressions, we can add a
>> > > watermark column.
>> > >
>> > > For example:
>> > >
>> > > create table MyTable (
>> > >   f0 BIGINT NOT NULL,
>> > >   f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
>> > >   f2 VARCHAR<256>,
>> > >   f3 AS f0 + 1,
>> > >   PRIMARY KEY (f0),
>> > >   UNIQUE (f3, f2),
>> > >   WATERMARK f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
>> > > ) with (...)
>> > >
>> > > | name | type                                | key    | compute column | watermark                                |
>> > > | f0   | BIGINT NOT NULL                     | PRI    | (NULL)         |                                          |
>> > > | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | UNQ    | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
>> > > | f2   | VARCHAR<256>                        | (NULL) | NULL           |                                          |
>> > > | f3   | BIGINT NOT NULL                     | UNQ    | f0 + 1         |                                          |
>> > >
>> > > or we add a column to represent nullability.
>> > > | name | type                                | null  | key    | compute column | watermark                                |
>> > > | f0   | BIGINT                              | false | PRI    | (NULL)         |                                          |
>> > > | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | true  | UNQ    | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
>> > > | f2   | VARCHAR<256>                        | true  | (NULL) | NULL           |                                          |
>> > > | f3   | BIGINT                              | false | UNQ    | f0 + 1         |                                          |
>> > >
>> > > Personally, I like the second one. (We need to make some changes to
>> > > LogicalType to get the type name without nullability.)
>> > >
>> > > Best,
>> > > Godfrey
>> > >
>> > > On Wed, 29 Apr 2020 at 17:47, Aljoscha Krettek <aljos...@apache.org> wrote:
>> > >
>> > > > +1 I like the general idea of printing the results as a table.
>> > > >
>> > > > On the specifics I don't know enough, but Fabian's suggestion seems
>> > > > to make sense to me.
>> > > >
>> > > > Aljoscha
>> > > >
>> > > > On 29.04.20 10:56, Fabian Hueske wrote:
>> > > > > Hi Godfrey,
>> > > > >
>> > > > > Thanks for starting this discussion!
>> > > > >
>> > > > > In my mind, WATERMARK is a property (or constraint) of a field,
>> > > > > just like PRIMARY KEY.
>> > > > > Take this example from MySQL:
>> > > > >
>> > > > > mysql> CREATE TABLE people (id INT NOT NULL, name VARCHAR(128) NOT NULL,
>> > > > > age INT, PRIMARY KEY (id));
>> > > > > Query OK, 0 rows affected (0.06 sec)
>> > > > >
>> > > > > mysql> describe people;
>> > > > > +-------+--------------+------+-----+---------+-------+
>> > > > > | Field | Type         | Null | Key | Default | Extra |
>> > > > > +-------+--------------+------+-----+---------+-------+
>> > > > > | id    | int          | NO   | PRI | NULL    |       |
>> > > > > | name  | varchar(128) | NO   |     | NULL    |       |
>> > > > > | age   | int          | YES  |     | NULL    |       |
>> > > > > +-------+--------------+------+-----+---------+-------+
>> > > > > 3 rows in set (0.01 sec)
>> > > > >
>> > > > > Here, PRIMARY KEY is marked in the Key column of the id field.
>> > > > > We could do the same for watermarks by adding a Watermark column.
>> > > > >
>> > > > > Best, Fabian
>> > > > >
>> > > > > On Wed, 29 Apr 2020 at 10:43, godfrey he <godfre...@gmail.com> wrote:
>> > > > >
>> > > > >> Hi everyone,
>> > > > >>
>> > > > >> I would like to bring up a discussion about the result type of the
>> > > > >> describe statement,
>> > > > >> which is introduced in FLIP-84 [1].
>> > > > >> In the previous version, we defined the result type of the
>> > > > >> `describe` statement as a single column, as follows:
>> > > > >>
>> > > > >> Statement:     DESCRIBE xx
>> > > > >> Result Schema: field name: result
>> > > > >>                field type: VARCHAR(n) (n is the max length of the values)
>> > > > >> Result Value:  the details of an object (single row)
>> > > > >> Result Kind:   SUCCESS_WITH_CONTENT
>> > > > >> Examples:      DESCRIBE table_name
>> > > > >>
>> > > > >> For "DESCRIBE table_name", the result value is the `toString` value
>> > > > >> of `TableSchema`, which is unstructured data.
>> > > > >> It's hard for users to use this info.
>> > > > >>
>> > > > >> For example:
>> > > > >>
>> > > > >> TableSchema schema = TableSchema.builder()
>> > > > >>     .field("f0", DataTypes.BIGINT())
>> > > > >>     .field("f1", DataTypes.ROW(
>> > > > >>         DataTypes.FIELD("q1", DataTypes.STRING()),
>> > > > >>         DataTypes.FIELD("q2", DataTypes.TIMESTAMP(3))))
>> > > > >>     .field("f2", DataTypes.STRING())
>> > > > >>     .field("f3", DataTypes.BIGINT(), "f0 + 1")
>> > > > >>     .watermark("f1.q2", WATERMARK_EXPRESSION, WATERMARK_DATATYPE)
>> > > > >>     .build();
>> > > > >>
>> > > > >> its `toString` value is:
>> > > > >>
>> > > > >> root
>> > > > >>  |-- f0: BIGINT
>> > > > >>  |-- f1: ROW<`q1` STRING, `q2` TIMESTAMP(3)>
>> > > > >>  |-- f2: STRING
>> > > > >>  |-- f3: BIGINT AS f0 + 1
>> > > > >>  |-- WATERMARK FOR f1.q2 AS now()
>> > > > >>
>> > > > >> For Hive, MySQL, etc., the describe result is in table form,
>> > > > >> including field names and field types,
>> > > > >> which is more familiar to users.
>> > > > >> TableSchema [2] has the watermark expression and compute columns;
>> > > > >> we should also put them into the table:
>> > > > >> for a compute column, which is column-level, we add a new column
>> > > > >> named `expr`;
>> > > > >> for the watermark expression, which is table-level, we add a
>> > > > >> special row named `WATERMARK` to represent it.
>> > > > >>
>> > > > >> The result for the above example will look like:
>> > > > >>
>> > > > >> | name      | type                                | expr           |
>> > > > >> | f0        | BIGINT                              | (NULL)         |
>> > > > >> | f1        | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | (NULL)         |
>> > > > >> | f2        | STRING                              | NULL           |
>> > > > >> | f3        | BIGINT                              | f0 + 1         |
>> > > > >> | WATERMARK | (NULL)                              | f1.q2 AS now() |
>> > > > >>
>> > > > >> There is now a PR, FLINK-17112 [3], to implement the DESCRIBE statement.
>> > > > >>
>> > > > >> What do you think about this update?
>> > > > >> Any feedback is welcome~
>> > > > >>
>> > > > >> Best,
>> > > > >> Godfrey
>> > > > >>
>> > > > >> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>> > > > >> [2] https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/api/TableSchema.java
>> > > > >> [3] https://github.com/apache/flink/pull/11892
>> > > > >>
>> > > > >> On Mon, 6 Apr 2020 at 22:38, godfrey he <godfre...@gmail.com> wrote:
>> > > > >>
>> > > > >>> Hi Timo,
>> > > > >>>
>> > > > >>> Sorry for the late reply, and thanks for your correction.
>> > > > >>> I missed DQL for the job submission scenario.
>> > > > >>> I'll fix the document right away.
>> > > > >>> >> > > > >>> Best, >> > > > >>> Godfrey >> > > > >>> >> > > > >>> Timo Walther <twal...@apache.org> 于2020年4月3日周五 下午9:53写道: >> > > > >>> >> > > > >>>> Hi Godfrey, >> > > > >>>> >> > > > >>>> I'm sorry to jump in again but I still need to clarify some >> things >> > > > >>>> around TableResult. >> > > > >>>> >> > > > >>>> The FLIP says: >> > > > >>>> "For DML, this method returns TableResult until the job is >> > > submitted. >> > > > >>>> For other statements, TableResult is returned until the >> execution >> > is >> > > > >>>> finished." >> > > > >>>> >> > > > >>>> I thought we agreed on making every execution async? This also >> > means >> > > > >>>> returning a TableResult for DQLs even though the execution is >> not >> > > done >> > > > >>>> yet. People need access to the JobClient also for batch jobs in >> > > order >> > > > to >> > > > >>>> cancel long lasting queries. If people want to wait for the >> > > completion >> > > > >>>> they can hook into JobClient or collect(). >> > > > >>>> >> > > > >>>> Can we rephrase this part to: >> > > > >>>> >> > > > >>>> The FLIP says: >> > > > >>>> "For DML and DQL, this method returns TableResult once the job >> has >> > > > been >> > > > >>>> submitted. For DDL and DCL statements, TableResult is returned >> > once >> > > > the >> > > > >>>> operation has finished." >> > > > >>>> >> > > > >>>> Regards, >> > > > >>>> Timo >> > > > >>>> >> > > > >>>> >> > > > >>>> On 02.04.20 05:27, godfrey he wrote: >> > > > >>>>> Hi Aljoscha, Dawid, Timo, >> > > > >>>>> >> > > > >>>>> Thanks so much for the detailed explanation. >> > > > >>>>> Agree with you that the multiline story is not completed now, >> and >> > > we >> > > > >> can >> > > > >>>>> keep discussion. >> > > > >>>>> I will add current discussions and conclusions to the FLIP. 
>> > > > >>>>> >> > > > >>>>> Best, >> > > > >>>>> Godfrey >> > > > >>>>> >> > > > >>>>> >> > > > >>>>> >> > > > >>>>> Timo Walther <twal...@apache.org> 于2020年4月1日周三 下午11:27写道: >> > > > >>>>> >> > > > >>>>>> Hi Godfrey, >> > > > >>>>>> >> > > > >>>>>> first of all, I agree with Dawid. The multiline story is not >> > > > >> completed >> > > > >>>>>> by this FLIP. It just verifies the big picture. >> > > > >>>>>> >> > > > >>>>>> 1. "control the execution logic through the proposed method >> if >> > > they >> > > > >>>> know >> > > > >>>>>> what the statements are" >> > > > >>>>>> >> > > > >>>>>> This is a good point that also Fabian raised in the linked >> > google >> > > > >> doc. >> > > > >>>> I >> > > > >>>>>> could also imagine to return a more complicated POJO when >> > calling >> > > > >>>>>> `executeMultiSql()`. >> > > > >>>>>> >> > > > >>>>>> The POJO would include some `getSqlProperties()` such that a >> > > > platform >> > > > >>>>>> gets insights into the query before executing. We could also >> > > trigger >> > > > >>>> the >> > > > >>>>>> execution more explicitly instead of hiding it behind an >> > iterator. >> > > > >>>>>> >> > > > >>>>>> 2. "there are some special commands introduced in SQL client" >> > > > >>>>>> >> > > > >>>>>> For platforms and SQL Client specific commands, we could >> offer a >> > > > hook >> > > > >>>> to >> > > > >>>>>> the parser or a fallback parser in case the regular table >> > > > environment >> > > > >>>>>> parser cannot deal with the statement. >> > > > >>>>>> >> > > > >>>>>> However, all of that is future work and can be discussed in a >> > > > >> separate >> > > > >>>>>> FLIP. >> > > > >>>>>> >> > > > >>>>>> 3. +1 for the `Iterator` instead of `Iterable`. >> > > > >>>>>> >> > > > >>>>>> 4. "we should convert the checked exception to unchecked >> > > exception" >> > > > >>>>>> >> > > > >>>>>> Yes, I meant using a runtime exception instead of a checked >> > > > >> exception. 
>> > > > >>>>>> There was no consensus on putting the exception into the >> > > > >> `TableResult`. >> > > > >>>>>> >> > > > >>>>>> Regards, >> > > > >>>>>> Timo >> > > > >>>>>> >> > > > >>>>>> On 01.04.20 15:35, Dawid Wysakowicz wrote: >> > > > >>>>>>> When considering the multi-line support I think it is >> helpful >> > to >> > > > >> start >> > > > >>>>>>> with a use case in mind. In my opinion consumers of this >> method >> > > > will >> > > > >>>> be: >> > > > >>>>>>> >> > > > >>>>>>> 1. sql-client >> > > > >>>>>>> 2. third-part sql based platforms >> > > > >>>>>>> >> > > > >>>>>>> @Godfrey As for the quit/source/... commands. I think those >> > > belong >> > > > >> to >> > > > >>>>>>> the responsibility of aforementioned. I think they should >> not >> > be >> > > > >>>>>>> understandable by the TableEnvironment. What would quit on a >> > > > >>>>>>> TableEnvironment do? Moreover I think such commands should >> be >> > > > >> prefixed >> > > > >>>>>>> appropriately. I think it's a common practice to e.g. prefix >> > > those >> > > > >>>> with >> > > > >>>>>>> ! or : to say they are meta commands of the tool rather >> than a >> > > > >> query. >> > > > >>>>>>> >> > > > >>>>>>> I also don't necessarily understand why platform users need >> to >> > > know >> > > > >>>> the >> > > > >>>>>>> kind of the query to use the proposed method. They should >> get >> > the >> > > > >> type >> > > > >>>>>>> from the TableResult#ResultKind. If the ResultKind is >> SUCCESS, >> > it >> > > > >> was >> > > > >>>> a >> > > > >>>>>>> DCL/DDL. If SUCCESS_WITH_CONTENT it was a DML/DQL. If that's >> > not >> > > > >>>> enough >> > > > >>>>>>> we can enrich the TableResult with more explicit kind of >> query, >> > > but >> > > > >> so >> > > > >>>>>>> far I don't see such a need. >> > > > >>>>>>> >> > > > >>>>>>> @Kurt In those cases I would assume the developers want to >> > > present >> > > > >>>>>>> results of the queries anyway. 
Moreover I think it is safe >> to >> > > > assume >> > > > >>>>>>> they can adhere to such a contract that the results must be >> > > > >> iterated. >> > > > >>>>>>> >> > > > >>>>>>> For direct users of TableEnvironment/Table API this method >> does >> > > not >> > > > >>>> make >> > > > >>>>>>> much sense anyway, in my opinion. I think we can rather >> safely >> > > > >> assume >> > > > >>>> in >> > > > >>>>>>> this scenario they do not want to submit multiple queries >> at a >> > > > >> single >> > > > >>>>>> time. >> > > > >>>>>>> >> > > > >>>>>>> Best, >> > > > >>>>>>> >> > > > >>>>>>> Dawid >> > > > >>>>>>> >> > > > >>>>>>> >> > > > >>>>>>> On 01/04/2020 15:07, Kurt Young wrote: >> > > > >>>>>>>> One comment to `executeMultilineSql`, I'm afraid sometimes >> > user >> > > > >> might >> > > > >>>>>>>> forget to >> > > > >>>>>>>> iterate the returned iterators, e.g. user submits a bunch >> of >> > > DDLs >> > > > >> and >> > > > >>>>>>>> expect the >> > > > >>>>>>>> framework will execute them one by one. But it didn't. >> > > > >>>>>>>> >> > > > >>>>>>>> Best, >> > > > >>>>>>>> Kurt >> > > > >>>>>>>> >> > > > >>>>>>>> >> > > > >>>>>>>> On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek< >> > > > >> aljos...@apache.org> >> > > > >>>>>> wrote: >> > > > >>>>>>>> >> > > > >>>>>>>>> Agreed to what Dawid and Timo said. >> > > > >>>>>>>>> >> > > > >>>>>>>>> To answer your question about multi line SQL: no, we don't >> > > think >> > > > >> we >> > > > >>>>>> need >> > > > >>>>>>>>> this in Flink 1.11, we only wanted to make sure that the >> > > > >> interfaces >> > > > >>>>>> that >> > > > >>>>>>>>> we now put in place will potentially allow this in the >> > future. 
>> > > > >>>>>>>>> >> > > > >>>>>>>>> Best, >> > > > >>>>>>>>> Aljoscha >> > > > >>>>>>>>> >> > > > >>>>>>>>> On 01.04.20 09:31, godfrey he wrote: >> > > > >>>>>>>>>> Hi, Timo & Dawid, >> > > > >>>>>>>>>> >> > > > >>>>>>>>>> Thanks so much for the effort of `multiline statements >> > > > >> supporting`, >> > > > >>>>>>>>>> I have a few questions about this method: >> > > > >>>>>>>>>> >> > > > >>>>>>>>>> 1. users can well control the execution logic through the >> > > > >> proposed >> > > > >>>>>> method >> > > > >>>>>>>>>> if they know what the statements are (a statement >> is a >> > > > DDL, a >> > > > >>>> DML >> > > > >>>>>> or >> > > > >>>>>>>>>> others). >> > > > >>>>>>>>>> but if a statement is from a file, that means users do >> not >> > > know >> > > > >>>> what >> > > > >>>>>> the >> > > > >>>>>>>>>> statements are, >> > > > >>>>>>>>>> the execution behavior is unclear. >> > > > >>>>>>>>>> As a platform user, I think this method is hard to use, >> > unless >> > > > >> the >> > > > >>>>>>>>> platform >> > > > >>>>>>>>>> defines >> > > > >>>>>>>>>> a set of rule about the statements order, such as: no >> select >> > > in >> > > > >> the >> > > > >>>>>>>>> middle, >> > > > >>>>>>>>>> dml must be at tail of sql file (which may be the most >> case >> > in >> > > > >>>> product >> > > > >>>>>>>>>> env). >> > > > >>>>>>>>>> Otherwise the platform must parse the sql first, then >> know >> > > what >> > > > >> the >> > > > >>>>>>>>>> statements are. >> > > > >>>>>>>>>> If do like that, the platform can handle all cases >> through >> > > > >>>>>> `executeSql` >> > > > >>>>>>>>> and >> > > > >>>>>>>>>> `StatementSet`. >> > > > >>>>>>>>>> >> > > > >>>>>>>>>> 2. 
SQL client can't also use `executeMultilineSql` to >> > supports >> > > > >>>>>> multiline >> > > > >>>>>>>>>> statements, >> > > > >>>>>>>>>> because there are some special commands introduced >> in >> > SQL >> > > > >>>> client, >> > > > >>>>>>>>>> such as `quit`, `source`, `load jar` (not exist now, but >> > maybe >> > > > we >> > > > >>>> need >> > > > >>>>>>>>> this >> > > > >>>>>>>>>> command >> > > > >>>>>>>>>> to support dynamic table source and udf). >> > > > >>>>>>>>>> Does TableEnvironment also supports those commands? >> > > > >>>>>>>>>> >> > > > >>>>>>>>>> 3. btw, we must have this feature in release-1.11? I find >> > > there >> > > > >> are >> > > > >>>>>> few >> > > > >>>>>>>>>> user cases >> > > > >>>>>>>>>> in the feedback document which behavior is unclear >> now. >> > > > >>>>>>>>>> >> > > > >>>>>>>>>> regarding to "change the return value from >> `Iterable<Row` to >> > > > >>>>>>>>>> `Iterator<Row`", >> > > > >>>>>>>>>> I couldn't agree more with this change. Just as Dawid >> > > mentioned >> > > > >>>>>>>>>> "The contract of the Iterable#iterator is that it >> returns a >> > > new >> > > > >>>>>> iterator >> > > > >>>>>>>>>> each time, >> > > > >>>>>>>>>> which effectively means we can iterate the results >> > > multiple >> > > > >>>>>> times.", >> > > > >>>>>>>>>> we does not provide iterate the results multiple times. >> > > > >>>>>>>>>> If we want do that, the client must buffer all results. >> but >> > > it's >> > > > >>>>>>>>> impossible >> > > > >>>>>>>>>> for streaming job. >> > > > >>>>>>>>>> >> > > > >>>>>>>>>> Best, >> > > > >>>>>>>>>> Godfrey >> > > > >>>>>>>>>> >> > > > >>>>>>>>>> Dawid Wysakowicz<dwysakow...@apache.org> 于2020年4月1日周三 >> > > > 上午3:14写道: >> > > > >>>>>>>>>> >> > > > >>>>>>>>>>> Thank you Timo for the great summary! It covers (almost) >> > all >> > > > the >> > > > >>>>>> topics. 
>> > > > >>>>>>>>>>> Even though in the end we are not suggesting much >> changes >> > to >> > > > the >> > > > >>>>>> current >> > > > >>>>>>>>>>> state of FLIP I think it is important to lay out all >> > possible >> > > > >> use >> > > > >>>>>> cases >> > > > >>>>>>>>>>> so that we do not change the execution model every >> release. >> > > > >>>>>>>>>>> >> > > > >>>>>>>>>>> There is one additional thing we discussed. Could we >> change >> > > the >> > > > >>>>>> result >> > > > >>>>>>>>>>> type of TableResult#collect to Iterator<Row>? Even >> though >> > > those >> > > > >>>>>>>>>>> interfaces do not differ much. I think Iterator better >> > > > describes >> > > > >>>> that >> > > > >>>>>>>>>>> the results might not be materialized on the client >> side, >> > but >> > > > >> can >> > > > >>>> be >> > > > >>>>>>>>>>> retrieved on a per record basis. The contract of the >> > > > >>>>>> Iterable#iterator >> > > > >>>>>>>>>>> is that it returns a new iterator each time, which >> > > effectively >> > > > >>>> means >> > > > >>>>>> we >> > > > >>>>>>>>>>> can iterate the results multiple times. Iterating the >> > results >> > > > is >> > > > >>>> not >> > > > >>>>>>>>>>> possible when we don't retrieve all the results from the >> > > > cluster >> > > > >>>> at >> > > > >>>>>>>>> once. >> > > > >>>>>>>>>>> I think we should also use Iterator for >> > > > >>>>>>>>>>> TableEnvironment#executeMultilineSql(String statements): >> > > > >>>>>>>>>>> Iterator<TableResult>. >> > > > >>>>>>>>>>> >> > > > >>>>>>>>>>> Best, >> > > > >>>>>>>>>>> >> > > > >>>>>>>>>>> Dawid >> > > > >>>>>>>>>>> >> > > > >>>>>>>>>>> On 31/03/2020 19:27, Timo Walther wrote: >> > > > >>>>>>>>>>>> Hi Godfrey, >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> Aljoscha, Dawid, Klou, and I had another discussion >> around >> > > > >>>> FLIP-84. 
>> > > > >>>>>> In >> > > > >>>>>>>>>>>> particular, we discussed how the current status of the >> > FLIP >> > > > and >> > > > >>>> the >> > > > >>>>>>>>>>>> future requirements around multiline statements, >> > async/sync, >> > > > >>>>>> collect() >> > > > >>>>>>>>>>>> fit together. >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> We also updated the FLIP-84 Feedback Summary document >> [1] >> > > with >> > > > >>>> some >> > > > >>>>>>>>>>>> use cases. >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> We believe that we found a good solution that also >> fits to >> > > > what >> > > > >>>> is >> > > > >>>>>> in >> > > > >>>>>>>>>>>> the current FLIP. So no bigger changes necessary, >> which is >> > > > >> great! >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> Our findings were: >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> 1. Async vs sync submission of Flink jobs: >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> Having a blocking `execute()` in DataStream API was >> > rather a >> > > > >>>>>> mistake. >> > > > >>>>>>>>>>>> Instead all submissions should be async because this >> > allows >> > > > >>>>>> supporting >> > > > >>>>>>>>>>>> both modes if necessary. Thus, submitting all queries >> > async >> > > > >>>> sounds >> > > > >>>>>>>>>>>> good to us. If users want to run a job sync, they can >> use >> > > the >> > > > >>>>>>>>>>>> JobClient and wait for completion (or collect() in >> case of >> > > > >> batch >> > > > >>>>>> jobs). >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> 2. Multi-statement execution: >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> For the multi-statement execution, we don't see a >> > > > >> contradication >> > > > >>>>>> with >> > > > >>>>>>>>>>>> the async execution behavior. 
We imagine a method like: >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> TableEnvironment#executeMultilineSql(String >> statements): >> > > > >>>>>>>>>>>> Iterable<TableResult> >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> Where the `Iterator#next()` method would trigger the >> next >> > > > >>>> statement >> > > > >>>>>>>>>>>> submission. This allows a caller to decide >> synchronously >> > > when >> > > > >> to >> > > > >>>>>>>>>>>> submit statements async to the cluster. Thus, a service >> > such >> > > > as >> > > > >>>> the >> > > > >>>>>>>>>>>> SQL Client can handle the result of each statement >> > > > individually >> > > > >>>> and >> > > > >>>>>>>>>>>> process statement by statement sequentially. >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> 3. The role of TableResult and result retrieval in >> general >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> `TableResult` is similar to `JobClient`. Instead of >> > > returning >> > > > a >> > > > >>>>>>>>>>>> `CompletableFuture` of something, it is a concrete util >> > > class >> > > > >>>> where >> > > > >>>>>>>>>>>> some methods have the behavior of completable future >> (e.g. >> > > > >>>>>> collect(), >> > > > >>>>>>>>>>>> print()) and some are already completed >> (getTableSchema(), >> > > > >>>>>>>>>>>> getResultKind()). >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> `StatementSet#execute()` returns a single `TableResult` >> > > > because >> > > > >>>> the >> > > > >>>>>>>>>>>> order is undefined in a set and all statements have the >> > same >> > > > >>>> schema. >> > > > >>>>>>>>>>>> Its `collect()` will return a row for each executed >> > `INSERT >> > > > >>>> INTO` in >> > > > >>>>>>>>>>>> the order of statement definition. 
>> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> For simple `SELECT * FROM ...`, the query execution >> might >> > > > block >> > > > >>>>>> until >> > > > >>>>>>>>>>>> `collect()` is called to pull buffered rows from the >> job >> > > (from >> > > > >>>>>>>>>>>> socket/REST API what ever we will use in the future). >> We >> > can >> > > > >> say >> > > > >>>>>> that >> > > > >>>>>>>>>>>> a statement finished successfully, when the >> > > > >>>>>> `collect#Iterator#hasNext` >> > > > >>>>>>>>>>>> has returned false. >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> I hope this summarizes our discussion >> > @Dawid/Aljoscha/Klou? >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> It would be great if we can add these findings to the >> FLIP >> > > > >>>> before we >> > > > >>>>>>>>>>>> start voting. >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> One minor thing: some `execute()` methods still throw a >> > > > checked >> > > > >>>>>>>>>>>> exception; can we remove that from the FLIP? Also the >> > above >> > > > >>>>>> mentioned >> > > > >>>>>>>>>>>> `Iterator#next()` would trigger an execution without >> > > throwing >> > > > a >> > > > >>>>>>>>>>>> checked exception. >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> Thanks, >> > > > >>>>>>>>>>>> Timo >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>>>>> [1] >> > > > >>>>>>>>>>>> >> > > > >>>>>>>>> >> > > > >>>>>> >> > > > >>>> >> > > > >> >> > > > >> > > >> > >> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit# >> > > > >>>>>>>>>>>> On 31.03.20 06:28, godfrey he wrote: >> > > > >>>>>>>>>>>>> Hi, Timo & Jark >> > > > >>>>>>>>>>>>> >> > > > >>>>>>>>>>>>> Thanks for your explanation. >> > > > >>>>>>>>>>>>> Agree with you that async execution should always be >> > async, >> > > > >>>>>>>>>>>>> and sync execution scenario can be covered by async >> > > > >> execution. >> > > > >>>>>>>>>>>>> It helps provide an unified entry point for batch and >> > > > >> streaming. 
I think we can also use sync execution for some testing. So, I agree with you that we provide the `executeSql` method as an async method. If we want a sync method in the future, we can add a method named `executeSqlSync`.

I think we've reached an agreement. I will update the document and start the voting process.

Best,
Godfrey

Jark Wu <imj...@gmail.com> wrote on Tue, Mar 31, 2020 at 12:46 AM:

Hi,

I didn't follow the full discussion, but I share the same concern with Timo that streaming queries should always be async. Otherwise, I can imagine it will cause a lot of confusion and problems if users don't deeply keep the "sync" in mind (e.g. the client hangs). Besides, streaming mode is still the majority of use cases for Flink and Flink SQL. We should put usability at a high priority.
Best,
Jark

On Mon, 30 Mar 2020 at 23:27, Timo Walther <twal...@apache.org> wrote:

Hi Godfrey,

maybe I wasn't expressing my biggest concern clearly enough in my last mail. Even in a single-line and sync execution, I think that streaming queries should not block the execution. Otherwise it is not possible to call collect() or print() on them afterwards.

"there are too many things need to discuss for multiline":

True, I don't want to solve all of them right now. But what I know is that our newly introduced methods should fit into a multiline execution. There is no big difference between calling `executeSql(A); executeSql(B)` and processing a multiline file `A;\nB;`.

I think the example that you mentioned can simply be undefined for now. Currently, no catalog is modifying data, just metadata. This is a separate discussion.

"result of the second statement is indeterministic":

Sure, this is indeterministic.
But this is the implementer's fault and we cannot forbid such pipelines.

How about we always execute streaming queries async? It would unblock executeSql() and multiline statements.

Having an `executeSqlAsync()` is useful for batch. However, I don't want `sync/async` to become the new batch/stream flag. The execution behavior should come from the query itself.

Regards,
Timo

On 30.03.20 11:12, godfrey he wrote:

Hi Timo,

Agree with you that streaming queries are our top priority, but I think there are too many things that need to be discussed for multiline statements, e.g.:

1. What's the behavior of mixing DDL and DML for async execution:

create table t1 xxx;
create table t2 xxx;
insert into t2 select * from t1 where xxx;
drop table t1; -- t1 may be a MySQL table, so the data will also be deleted.

Here t1 is dropped while the "insert" job is still running.

2.
What's the behavior of the unified scenario for async execution (as you mentioned):

INSERT INTO t1 SELECT * FROM s;
INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;

The result of the second statement is indeterministic, because the first statement may still be running. I think we need to put a lot of effort into defining the behavior of logically related queries.

In this FLIP, I suggest we only handle single statements, and we also introduce an async execute method, which is more important and more often used by users.

For the sync methods (like `TableEnvironment.executeSql` and `StatementSet.execute`), the result will not be returned until the job is finished.
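The sync/async contrast godfrey draws can be sketched with plain `CompletableFuture`s; `ExecSketch` and its method names are illustrative, not the FLIP API. The point is that the sync flavor is just the async flavor plus a blocking wait, so only the async primitive needs to exist:

```java
import java.util.concurrent.CompletableFuture;

class ExecSketch {
    // Async flavor: returns immediately with a handle; the caller decides
    // when (or whether) to wait for the job.
    static CompletableFuture<String> executeSqlAsync(String stmt) {
        return CompletableFuture.supplyAsync(() -> "OK: " + stmt);
    }

    // Sync flavor: blocks until the "job" finishes, then returns the result.
    static String executeSql(String stmt) {
        return executeSqlAsync(stmt).join();
    }
}
```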
The following methods will be introduced in this FLIP:

/**
 * Asynchronously execute the given single statement.
 */
TableEnvironment.executeSqlAsync(String statement): TableResult

/**
 * Asynchronously execute the DML statements as a batch.
 */
StatementSet.executeAsync(): TableResult

public interface TableResult {
    /**
     * Return the JobClient for DQL and DML in async mode, else return Optional.empty().
     */
    Optional<JobClient> getJobClient();
}

What do you think?

Best,
Godfrey

Timo Walther <twal...@apache.org> wrote on Thu, Mar 26, 2020 at 9:15 PM:

Hi Godfrey,

executing streaming queries must be our top priority because this is what distinguishes Flink from competitors. If we change the execution behavior, we should think about the other cases as well to not break the API a third time.
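A minimal sketch of the `TableResult#getJobClient()` contract proposed above, assuming the semantics from the mail: DQL/DML executed async carry a job handle, everything else (DDL, USE, ...) returns `Optional.empty()`. `JobClient` is stubbed to a single method and `SimpleTableResult` is a hypothetical implementation, not Flink's:

```java
import java.util.Optional;

// Stand-in for Flink's JobClient; only the part needed for the sketch.
interface JobClient {
    String getJobId();
}

class SimpleTableResult {
    private final JobClient jobClient; // null for statements that start no job

    SimpleTableResult(JobClient jobClient) {
        this.jobClient = jobClient;
    }

    // Present for DQL/DML in async mode, empty otherwise.
    Optional<JobClient> getJobClient() {
        return Optional.ofNullable(jobClient);
    }
}
```

Callers can then branch on the `Optional` to decide whether there is a running job to monitor or cancel.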
I fear that just having an async execute method will not be enough, because users should be able to mix streaming and batch queries in a unified scenario.

If I remember correctly, we had some discussions in the past about what decides the execution mode of a query. Currently, we would like to let the query decide, not derive it from the sources.

So I could imagine a multiline pipeline as:

USE CATALOG 'mycat';
INSERT INTO t1 SELECT * FROM s;
INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;

For executeMultilineSql():

sync because regular SQL
sync because regular Batch SQL
async because Streaming SQL

For executeAsyncMultilineSql():

async because everything should be async
async because everything should be async
async because everything should be async

What we should not start for executeAsyncMultilineSql():

sync because DDL
async because everything should be async
async because everything should be async
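Timo's per-statement decision ("let the query decide") for executeMultilineSql() can be sketched as a classifier. The `EMIT STREAM` string check is a deliberate simplification for this toy example; a real implementation would derive the mode from the planned query, not from text matching:

```java
class ExecutionModeDecider {
    enum Mode { SYNC, ASYNC }

    // Following the table in the mail: regular SQL and batch statements run
    // sync, streaming queries run async. Here a statement counts as
    // streaming iff it carries the EMIT STREAM clause (an assumption of
    // this sketch only).
    static Mode modeFor(String statement) {
        String s = statement.trim().toUpperCase();
        return s.contains("EMIT STREAM") ? Mode.ASYNC : Mode.SYNC;
    }
}
```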
What are your thoughts here?

Regards,
Timo

On 26.03.20 12:50, godfrey he wrote:

Hi Timo,

I agree with you that streaming queries mostly need async execution. In fact, our original plan was to introduce only sync methods in this FLIP, and async methods (like "executeSqlAsync") would be introduced in the future, as mentioned in the appendix.

Maybe the async methods also need to be considered in this FLIP.

I think sync methods are also useful for streaming, which