Hi, I'm in favor of Fabian's proposal. First, a watermark is not a column but metadata, just like a primary key, so it shouldn't be listed among the columns. Second, AFAIK, a primary key can only be defined on top-level columns. Third, I think watermarks can follow primary keys and likewise only be allowed on top-level columns.
I have to admit that in FLIP-66, a watermark can be defined on nested fields. However, during implementation, I found that doing so is too complicated. We would have to refactor the time-based physical nodes, use code generation to access the event time, and refactor FlinkTypeFactory to support a complex nested rowtime. This feature provides little value but introduces a lot of complexity into the code base. So I think we can require watermarks to be defined on top-level columns. If users want a watermark on a nested column, they can use a computed column to turn it into a top-level column.

Best,
Jark

On Thu, 30 Apr 2020 at 17:55, Fabian Hueske <fhue...@gmail.com> wrote:
> Hi Godfrey,
>
> The formatting of your examples seems to be broken.
> Could you send them again please?
>
> Regarding your points
>
> "because the watermark expression can reference a sub-column, just like `f1.q2` in the example I gave."
>
> I would put the watermark information in the row of the top-level field and
> indicate to which nested field the watermark refers.
> Don't we have to solve the same issue for primary keys that are defined on
> a nested field?
>
> "A boolean flag can't represent such info, and I don't know whether we will
> support complex watermark expressions involving multiple columns in the
> future, such as: WATERMARK FOR ts AS ts + f1 + INTERVAL '1' SECOND"
>
> You are right, a simple binary flag is definitely not sufficient to display
> the watermark information.
> I would put the expression string into the field, i.e., "ts + f1 + interval
> '1' second".
>
> For me the most important point of why not to show the watermark as a row
> in the table is that it is not a field that can be queried but meta
> information on an existing field.
> For the user it is important to know that a certain field has a watermark.
> Otherwise, certain queries cannot be correctly specified.
> Also, there might be support for multiple watermarks defined on
> different fields at some point.
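The computed-column workaround mentioned above would look roughly like this in DDL (a sketch: the column name `rowtime` and the connector options are placeholders, not part of the proposal):

```sql
CREATE TABLE MyTable (
  f0 BIGINT NOT NULL,
  f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
  -- computed column exposing the nested field at the top level
  rowtime AS f1.q2,
  -- the watermark is then declared on a top-level column only
  WATERMARK FOR rowtime AS rowtime - INTERVAL '3' SECOND
) WITH (...)
```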
Would those be printed in multiple rows?
>
> Best,
> Fabian
>
> Am Do., 30. Apr. 2020 um 11:25 Uhr schrieb godfrey he <godfre...@gmail.com>:
>
> > Hi Fabian, Aljoscha
> >
> > Thanks for the feedback.
> >
> > Agree with you that we can deal with the primary key as you mentioned.
> > Currently, the type column already contains the nullability attribute, e.g. BIGINT NOT NULL.
> > (I'm also OK with using two columns to represent the type, just like MySQL.)
> >
> > > Why do I treat `watermark` as a special row?
> > Because the watermark expression can reference a sub-column, just like `f1.q2` in the example I gave.
> > A boolean flag can't represent such info, and I don't know whether we will support complex
> > watermark expressions involving multiple columns in the future, such as:
> > "WATERMARK FOR ts AS ts + f1 + INTERVAL '1' SECOND"
> >
> > If we do not support complex watermark expressions, we can add a watermark column.
> >
> > For example:
> >
> > create table MyTable (
> >   f0 BIGINT NOT NULL,
> >   f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
> >   f2 VARCHAR(256),
> >   f3 AS f0 + 1,
> >   PRIMARY KEY (f0),
> >   UNIQUE (f3, f2),
> >   WATERMARK FOR f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> > ) with (...)
> >
> > name | type                                | key | compute column | watermark
> > f0   | BIGINT NOT NULL                     | PRI | (NULL)         |
> > f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | UNQ | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> > f2   | VARCHAR(256)                        |     | (NULL)         | NULL
> > f3   | BIGINT NOT NULL                     | UNQ | f0 + 1         |
> >
> > or we add a column to represent nullability.
> > name | type                                | null  | key | compute column | watermark
> > f0   | BIGINT                              | false | PRI | (NULL)         |
> > f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | true  | UNQ | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> > f2   | VARCHAR(256)                        | true  |     | (NULL)         | NULL
> > f3   | BIGINT                              | false | UNQ | f0 + 1         |
> >
> > Personally, I like the second one. (We need to make some changes to LogicalType
> > to get the type name without nullability.)
> >
> > Best,
> > Godfrey
> >
> > Aljoscha Krettek <aljos...@apache.org> 于2020年4月29日周三 下午5:47写道:
> >
> > > +1 I like the general idea of printing the results as a table.
> > >
> > > On the specifics I don't know enough, but Fabian's suggestion seems to
> > > make sense to me.
> > >
> > > Aljoscha
> > >
> > > On 29.04.20 10:56, Fabian Hueske wrote:
> > > > Hi Godfrey,
> > > >
> > > > Thanks for starting this discussion!
> > > >
> > > > In my mind, WATERMARK is a property (or constraint) of a field, just like
> > > > PRIMARY KEY.
> > > > Take this example from MySQL:
> > > >
> > > > mysql> CREATE TABLE people (id INT NOT NULL, name VARCHAR(128) NOT NULL,
> > > > age INT, PRIMARY KEY (id));
> > > > Query OK, 0 rows affected (0.06 sec)
> > > >
> > > > mysql> describe people;
> > > > +-------+--------------+------+-----+---------+-------+
> > > > | Field | Type         | Null | Key | Default | Extra |
> > > > +-------+--------------+------+-----+---------+-------+
> > > > | id    | int          | NO   | PRI | NULL    |       |
> > > > | name  | varchar(128) | NO   |     | NULL    |       |
> > > > | age   | int          | YES  |     | NULL    |       |
> > > > +-------+--------------+------+-----+---------+-------+
> > > > 3 rows in set (0.01 sec)
> > > >
> > > > Here, PRIMARY KEY is marked in the Key column of the id field.
> > > > We could do the same for watermarks by adding a Watermark column.
> > > >
> > > > Best, Fabian
> > > >
> > > > Am Mi., 29.
Apr. 2020 um 10:43 Uhr schrieb godfrey he <godfre...@gmail.com>:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> I would like to bring up a discussion about the result type of the describe
> > > >> statement, which is introduced in FLIP-84 [1].
> > > >> In the previous version, we defined the result type of the `describe` statement
> > > >> as a single column, as follows:
> > > >>
> > > >> Statement: DESCRIBE xx
> > > >> Result Schema: field name: result; field type: VARCHAR(n) (n is the max length of the values)
> > > >> Result Value: describes the detail of an object (single row)
> > > >> Result Kind: SUCCESS_WITH_CONTENT
> > > >> Examples: DESCRIBE table_name
> > > >>
> > > >> For "describe table_name", the result value is the `toString` value of
> > > >> `TableSchema`, which is unstructured data.
> > > >> It's hard for users to use this info.
> > > >>
> > > >> For example:
> > > >>
> > > >> TableSchema schema = TableSchema.builder()
> > > >>     .field("f0", DataTypes.BIGINT())
> > > >>     .field("f1", DataTypes.ROW(
> > > >>         DataTypes.FIELD("q1", DataTypes.STRING()),
> > > >>         DataTypes.FIELD("q2", DataTypes.TIMESTAMP(3))))
> > > >>     .field("f2", DataTypes.STRING())
> > > >>     .field("f3", DataTypes.BIGINT(), "f0 + 1")
> > > >>     .watermark("f1.q2", WATERMARK_EXPRESSION, WATERMARK_DATATYPE)
> > > >>     .build();
> > > >>
> > > >> its `toString` value is:
> > > >>
> > > >> root
> > > >>  |-- f0: BIGINT
> > > >>  |-- f1: ROW<`q1` STRING, `q2` TIMESTAMP(3)>
> > > >>  |-- f2: STRING
> > > >>  |-- f3: BIGINT AS f0 + 1
> > > >>  |-- WATERMARK FOR f1.q2 AS now()
> > > >>
> > > >> For Hive, MySQL, etc., the describe result is in table form, including field
> > > >> names and field types, which is more familiar to users.
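A table-form DESCRIBE result like the one proposed here needs only a trivial renderer on the client side. A minimal, self-contained Java sketch (the class name and the column set are illustrative, not Flink API):

```java
import java.util.List;

// Sketch: rendering DESCRIBE output as an aligned, MySQL-style table.
// The column set (name, type, null, key, computed column, watermark)
// follows the proposal in this thread; none of this is actual Flink API.
public class DescribeRenderer {

    public static String render(List<String[]> rows) {
        String[] header = {"name", "type", "null", "key", "computed column", "watermark"};
        // compute the width of each column from header and cell lengths
        int[] widths = new int[header.length];
        for (int i = 0; i < header.length; i++) {
            widths[i] = header[i].length();
        }
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                widths[i] = Math.max(widths[i], row[i].length());
            }
        }
        StringBuilder sb = new StringBuilder();
        appendRow(sb, header, widths);
        for (String[] row : rows) {
            appendRow(sb, row, widths);
        }
        return sb.toString();
    }

    private static void appendRow(StringBuilder sb, String[] cells, int[] widths) {
        for (int i = 0; i < cells.length; i++) {
            // left-pad every cell to the column width
            sb.append("| ").append(String.format("%-" + widths[i] + "s", cells[i])).append(' ');
        }
        sb.append("|\n");
    }
}
```

Feeding it one row per field (plus, under the alternative proposal, one special WATERMARK row) reproduces the layouts shown in this thread.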
> > > >> TableSchema [2] has a watermark expression and computed columns; we should
> > > >> also put them into the table:
> > > >> for a computed column, which is column-level, we add a new column named `expr`;
> > > >> for the watermark expression, which is table-level, we add a special row named
> > > >> `WATERMARK` to represent it.
> > > >>
> > > >> The result for the above example will look like:
> > > >>
> > > >> name      | type                                | expr
> > > >> f0        | BIGINT                              | (NULL)
> > > >> f1        | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | (NULL)
> > > >> f2        | STRING                              | NULL
> > > >> f3        | BIGINT                              | f0 + 1
> > > >> WATERMARK | (NULL)                              | f1.q2 AS now()
> > > >>
> > > >> Now there is a PR, FLINK-17112 [3], to implement the DESCRIBE statement.
> > > >>
> > > >> What do you think about this update?
> > > >> Any feedback is welcome~
> > > >>
> > > >> Best,
> > > >> Godfrey
> > > >>
> > > >> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > > >> [2] https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/api/TableSchema.java
> > > >> [3] https://github.com/apache/flink/pull/11892
> > > >>
> > > >> godfrey he <godfre...@gmail.com> 于2020年4月6日周一 下午10:38写道:
> > > >>
> > > >>> Hi Timo,
> > > >>>
> > > >>> Sorry for the late reply, and thanks for your correction.
> > > >>> I missed DQL for the job submission scenario.
> > > >>> I'll fix the document right away.
> > > >>>
> > > >>> Best,
> > > >>> Godfrey
> > > >>>
> > > >>> Timo Walther <twal...@apache.org> 于2020年4月3日周五 下午9:53写道:
> > > >>>
> > > >>>> Hi Godfrey,
> > > >>>>
> > > >>>> I'm sorry to jump in again, but I still need to clarify some things
> > > >>>> around TableResult.
> > > >>>> The FLIP says:
> > > >>>> "For DML, this method returns TableResult until the job is submitted.
> > > >>>> For other statements, TableResult is returned until the execution is
> > > >>>> finished."
> > > >>>>
> > > >>>> I thought we agreed on making every execution async? This also means
> > > >>>> returning a TableResult for DQLs even though the execution is not done
> > > >>>> yet. People need access to the JobClient also for batch jobs in order to
> > > >>>> cancel long-lasting queries. If people want to wait for completion,
> > > >>>> they can hook into JobClient or collect().
> > > >>>>
> > > >>>> Can we rephrase this part to:
> > > >>>> "For DML and DQL, this method returns TableResult once the job has been
> > > >>>> submitted. For DDL and DCL statements, TableResult is returned once the
> > > >>>> operation has finished."
> > > >>>>
> > > >>>> Regards,
> > > >>>> Timo
> > > >>>>
> > > >>>> On 02.04.20 05:27, godfrey he wrote:
> > > >>>>> Hi Aljoscha, Dawid, Timo,
> > > >>>>>
> > > >>>>> Thanks so much for the detailed explanation.
> > > >>>>> Agree with you that the multiline story is not complete now, and we can
> > > >>>>> keep discussing it.
> > > >>>>> I will add the current discussions and conclusions to the FLIP.
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Godfrey
> > > >>>>>
> > > >>>>> Timo Walther <twal...@apache.org> 于2020年4月1日周三 下午11:27写道:
> > > >>>>>
> > > >>>>>> Hi Godfrey,
> > > >>>>>>
> > > >>>>>> first of all, I agree with Dawid. The multiline story is not completed
> > > >>>>>> by this FLIP. It just verifies the big picture.
> > > >>>>>>
> > > >>>>>> 1. "control the execution logic through the proposed method if they know
> > > >>>>>> what the statements are"
> > > >>>>>>
> > > >>>>>> This is a good point that also Fabian raised in the linked Google doc.
> > > >>>>>> I could also imagine returning a more complicated POJO when calling
> > > >>>>>> `executeMultiSql()`.
> > > >>>>>>
> > > >>>>>> The POJO would include some `getSqlProperties()` such that a platform
> > > >>>>>> gets insights into the query before executing. We could also trigger the
> > > >>>>>> execution more explicitly instead of hiding it behind an iterator.
> > > >>>>>>
> > > >>>>>> 2. "there are some special commands introduced in SQL Client"
> > > >>>>>>
> > > >>>>>> For platform- and SQL Client-specific commands, we could offer a hook to
> > > >>>>>> the parser, or a fallback parser in case the regular table environment
> > > >>>>>> parser cannot deal with the statement.
> > > >>>>>>
> > > >>>>>> However, all of that is future work and can be discussed in a separate
> > > >>>>>> FLIP.
> > > >>>>>>
> > > >>>>>> 3. +1 for the `Iterator` instead of `Iterable`.
> > > >>>>>>
> > > >>>>>> 4. "we should convert the checked exception to an unchecked exception"
> > > >>>>>>
> > > >>>>>> Yes, I meant using a runtime exception instead of a checked exception.
> > > >>>>>> There was no consensus on putting the exception into the `TableResult`.
> > > >>>>>>
> > > >>>>>> Regards,
> > > >>>>>> Timo
> > > >>>>>>
> > > >>>>>> On 01.04.20 15:35, Dawid Wysakowicz wrote:
> > > >>>>>>> When considering the multi-line support, I think it is helpful to start
> > > >>>>>>> with a use case in mind. In my opinion, consumers of this method will be:
> > > >>>>>>>
> > > >>>>>>> 1. sql-client
> > > >>>>>>> 2. third-party SQL-based platforms
> > > >>>>>>>
> > > >>>>>>> @Godfrey As for the quit/source/... commands: I think those belong to
> > > >>>>>>> the responsibility of the aforementioned tools. They should not be
> > > >>>>>>> understood by the TableEnvironment. What would quit on a
> > > >>>>>>> TableEnvironment do?
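The prefixed meta-command handling discussed here (handled by the client tool, never by the TableEnvironment) could be sketched like this; all names are made up for illustration:

```java
import java.util.function.Consumer;

// Sketch: a client-side dispatcher that handles tool commands (quit,
// source, ...) itself and delegates everything else to the SQL engine.
// Nothing here is actual Flink or SQL Client API.
public class ClientCommandDispatcher {

    private final Consumer<String> sqlExecutor;  // e.g. a tableEnv::executeSql stand-in
    private boolean running = true;

    public ClientCommandDispatcher(Consumer<String> sqlExecutor) {
        this.sqlExecutor = sqlExecutor;
    }

    /** Returns true if the statement was a client-side meta command. */
    public boolean dispatch(String statement) {
        String trimmed = statement.trim();
        // Meta commands are prefixed, so they can never clash with SQL.
        if (trimmed.startsWith("!")) {
            handleMetaCommand(trimmed.substring(1));
            return true;
        }
        sqlExecutor.accept(trimmed);  // everything else goes to the SQL parser
        return false;
    }

    private void handleMetaCommand(String command) {
        if (command.equalsIgnoreCase("quit")) {
            running = false;  // only the client session ends, not any engine
        }
        // 'source', 'load jar', ... would be handled here as well
    }

    public boolean isRunning() {
        return running;
    }
}
```

The key property is the one Dawid asks for: `!quit` terminates the client loop, while the TableEnvironment never sees it.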
Moreover, I think such commands should be prefixed
> > > >>>>>>> appropriately. I think it's a common practice to e.g. prefix those with
> > > >>>>>>> ! or : to say they are meta commands of the tool rather than a query.
> > > >>>>>>>
> > > >>>>>>> I also don't necessarily understand why platform users need to know the
> > > >>>>>>> kind of the query to use the proposed method. They should get the type
> > > >>>>>>> from the TableResult#ResultKind. If the ResultKind is SUCCESS, it was a
> > > >>>>>>> DCL/DDL. If SUCCESS_WITH_CONTENT, it was a DML/DQL. If that's not enough,
> > > >>>>>>> we can enrich the TableResult with a more explicit kind of query, but so
> > > >>>>>>> far I don't see such a need.
> > > >>>>>>>
> > > >>>>>>> @Kurt In those cases I would assume the developers want to present
> > > >>>>>>> results of the queries anyway. Moreover, I think it is safe to assume
> > > >>>>>>> they can adhere to such a contract that the results must be iterated.
> > > >>>>>>>
> > > >>>>>>> For direct users of TableEnvironment/Table API, this method does not make
> > > >>>>>>> much sense anyway, in my opinion. I think we can rather safely assume in
> > > >>>>>>> this scenario that they do not want to submit multiple queries at a
> > > >>>>>>> single time.
> > > >>>>>>>
> > > >>>>>>> Best,
> > > >>>>>>>
> > > >>>>>>> Dawid
> > > >>>>>>>
> > > >>>>>>> On 01/04/2020 15:07, Kurt Young wrote:
> > > >>>>>>>> One comment on `executeMultilineSql`: I'm afraid users might sometimes
> > > >>>>>>>> forget to iterate the returned iterators, e.g. a user submits a bunch of
> > > >>>>>>>> DDLs and expects the framework to execute them one by one. But it doesn't.
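Kurt's pitfall follows directly from lazy, iterator-driven execution: if submission happens in `next()`, nothing runs until the caller iterates. A self-contained sketch with invented names (not the proposed Flink API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: statement execution driven lazily by Iterator#next().
// If the caller never iterates, no statement is ever submitted.
public class LazyStatementIterator implements Iterator<String> {

    private final Iterator<String> statements;
    private final List<String> executed = new ArrayList<>();

    public LazyStatementIterator(List<String> statements) {
        this.statements = statements.iterator();
    }

    @Override
    public boolean hasNext() {
        return statements.hasNext();
    }

    @Override
    public String next() {
        String stmt = statements.next();
        executed.add(stmt);  // stands in for the actual job submission
        return "OK: " + stmt;
    }

    /** Visible for the example: which statements actually ran. */
    public List<String> executedStatements() {
        return executed;
    }
}
```

A caller that creates this iterator and drops it on the floor executes zero statements, which is exactly the surprise Kurt describes.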
> > > >>>>>>>> Best,
> > > >>>>>>>> Kurt
> > > >>>>>>>>
> > > >>>>>>>> On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek <aljos...@apache.org> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Agreed with what Dawid and Timo said.
> > > >>>>>>>>>
> > > >>>>>>>>> To answer your question about multi-line SQL: no, we don't think we need
> > > >>>>>>>>> this in Flink 1.11, we only wanted to make sure that the interfaces that
> > > >>>>>>>>> we now put in place will potentially allow this in the future.
> > > >>>>>>>>>
> > > >>>>>>>>> Best,
> > > >>>>>>>>> Aljoscha
> > > >>>>>>>>>
> > > >>>>>>>>> On 01.04.20 09:31, godfrey he wrote:
> > > >>>>>>>>>> Hi, Timo & Dawid,
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks so much for the effort on multiline statement support.
> > > >>>>>>>>>> I have a few questions about this method:
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1. Users can well control the execution logic through the proposed
> > > >>>>>>>>>> method if they know what the statements are (whether a statement is a
> > > >>>>>>>>>> DDL, a DML or something else).
> > > >>>>>>>>>> But if the statements come from a file, users do not know what the
> > > >>>>>>>>>> statements are, and the execution behavior is unclear.
> > > >>>>>>>>>> As a platform user, I think this method is hard to use, unless the
> > > >>>>>>>>>> platform defines a set of rules about the statement order, such as: no
> > > >>>>>>>>>> SELECT in the middle, DML must be at the tail of the SQL file (which may
> > > >>>>>>>>>> be the most common case in production environments).
> > > >>>>>>>>>> Otherwise the platform must parse the SQL first, and only then know
> > > >>>>>>>>>> what the statements are.
> > > >>>>>>>>>> If it does that, the platform can handle all cases through `executeSql`
> > > >>>>>>>>>> and `StatementSet` anyway.
> > > >>>>>>>>>>
> > > >>>>>>>>>> 2. The SQL client also can't use `executeMultilineSql` to support
> > > >>>>>>>>>> multiline statements, because there are some special commands introduced
> > > >>>>>>>>>> in the SQL client, such as `quit`, `source`, `load jar` (which does not
> > > >>>>>>>>>> exist now, but maybe we need this command to support dynamic table
> > > >>>>>>>>>> sources and UDFs).
> > > >>>>>>>>>> Does TableEnvironment also support those commands?
> > > >>>>>>>>>>
> > > >>>>>>>>>> 3. Btw, must we have this feature in release-1.11? I find there are a
> > > >>>>>>>>>> few use cases in the feedback document whose behavior is unclear now.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Regarding "change the return value from `Iterable<Row>` to
> > > >>>>>>>>>> `Iterator<Row>`", I couldn't agree more with this change. Just as Dawid
> > > >>>>>>>>>> mentioned, "The contract of the Iterable#iterator is that it returns a
> > > >>>>>>>>>> new iterator each time, which effectively means we can iterate the
> > > >>>>>>>>>> results multiple times." -- we do not support iterating the results
> > > >>>>>>>>>> multiple times.
> > > >>>>>>>>>> If we wanted that, the client would have to buffer all results, but
> > > >>>>>>>>>> that's impossible for streaming jobs.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Godfrey
> > > >>>>>>>>>>
> > > >>>>>>>>>> Dawid Wysakowicz <dwysakow...@apache.org> 于2020年4月1日周三 上午3:14写道:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Thank you Timo for the great summary! It covers (almost) all the topics.
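The `Iterable` vs `Iterator` argument quoted above can be made concrete: `Iterable#iterator()` promises a fresh pass over the data on every call, which a single-pass streaming result cannot honor. A small sketch (illustrative, not Flink code):

```java
import java.util.Iterator;

// Sketch: why a one-shot streaming result must not be exposed as Iterable.
public class SinglePassResult {

    // An Iterable is expected to be re-iterable: each call to iterator()
    // starts from the beginning again. This works for e.g. a List.
    static int sumTwice(Iterable<Integer> iterable) {
        int sum = 0;
        for (int v : iterable) sum += v;
        for (int v : iterable) sum += v;  // second full pass over the data
        return sum;
    }

    // A streaming result is consumed as it arrives; wrapping the one-shot
    // iterator in an Iterable silently breaks the re-iteration contract,
    // because every iterator() call returns the same exhausted iterator.
    static int sumTwiceBroken(Iterator<Integer> oneShot) {
        Iterable<Integer> fake = () -> oneShot;  // same iterator every time!
        return sumTwice(fake);
    }
}
```

With rows 1, 2, 3, `sumTwice` over a List yields 12, while `sumTwiceBroken` over a one-shot iterator yields only 6: the second pass sees nothing. Returning `Iterator<Row>` makes the single-pass nature explicit instead of violating `Iterable`'s contract.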
> > > >>>>>>>>>>> Even though in the end we are not suggesting many changes to the
> > > >>>>>>>>>>> current state of the FLIP, I think it is important to lay out all
> > > >>>>>>>>>>> possible use cases so that we do not change the execution model every
> > > >>>>>>>>>>> release.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> There is one additional thing we discussed. Could we change the result
> > > >>>>>>>>>>> type of TableResult#collect to Iterator<Row>? Even though those
> > > >>>>>>>>>>> interfaces do not differ much, I think Iterator better describes that
> > > >>>>>>>>>>> the results might not be materialized on the client side, but can be
> > > >>>>>>>>>>> retrieved on a per-record basis. The contract of the Iterable#iterator
> > > >>>>>>>>>>> is that it returns a new iterator each time, which effectively means we
> > > >>>>>>>>>>> can iterate the results multiple times. Iterating the results multiple
> > > >>>>>>>>>>> times is not possible when we don't retrieve all the results from the
> > > >>>>>>>>>>> cluster at once.
> > > >>>>>>>>>>> I think we should also use Iterator for
> > > >>>>>>>>>>> TableEnvironment#executeMultilineSql(String statements):
> > > >>>>>>>>>>> Iterator<TableResult>.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Best,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Dawid
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On 31/03/2020 19:27, Timo Walther wrote:
> > > >>>>>>>>>>>> Hi Godfrey,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Aljoscha, Dawid, Klou, and I had another discussion around FLIP-84.
> > > >>>>>>>>>>>> In particular, we discussed how the current status of the FLIP and the
> > > >>>>>>>>>>>> future requirements around multiline statements, async/sync, and
> > > >>>>>>>>>>>> collect() fit together.
> > > >>>>>>>>>>>> We also updated the FLIP-84 Feedback Summary document [1] with some
> > > >>>>>>>>>>>> use cases.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> We believe that we found a good solution that also fits what is in
> > > >>>>>>>>>>>> the current FLIP. So no bigger changes necessary, which is great!
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Our findings were:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> 1. Async vs sync submission of Flink jobs:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Having a blocking `execute()` in the DataStream API was rather a
> > > >>>>>>>>>>>> mistake. Instead all submissions should be async because this allows
> > > >>>>>>>>>>>> supporting both modes if necessary. Thus, submitting all queries async
> > > >>>>>>>>>>>> sounds good to us. If users want to run a job sync, they can use the
> > > >>>>>>>>>>>> JobClient and wait for completion (or collect() in the case of batch
> > > >>>>>>>>>>>> jobs).
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> 2. Multi-statement execution:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> For the multi-statement execution, we don't see a contradiction with
> > > >>>>>>>>>>>> the async execution behavior. We imagine a method like:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> TableEnvironment#executeMultilineSql(String statements):
> > > >>>>>>>>>>>> Iterable<TableResult>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> where the `Iterator#next()` method would trigger the next statement
> > > >>>>>>>>>>>> submission. This allows a caller to decide synchronously when to
> > > >>>>>>>>>>>> submit statements async to the cluster. Thus, a service such as the
> > > >>>>>>>>>>>> SQL Client can handle the result of each statement individually and
> > > >>>>>>>>>>>> process the statements sequentially, one by one.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> 3.
The role of TableResult and result retrieval in general:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> `TableResult` is similar to `JobClient`. Instead of returning a
> > > >>>>>>>>>>>> `CompletableFuture` of something, it is a concrete util class where
> > > >>>>>>>>>>>> some methods have the behavior of a completable future (e.g.
> > > >>>>>>>>>>>> collect(), print()) and some are already completed (getTableSchema(),
> > > >>>>>>>>>>>> getResultKind()).
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> `StatementSet#execute()` returns a single `TableResult` because the
> > > >>>>>>>>>>>> order is undefined in a set and all statements have the same schema.
> > > >>>>>>>>>>>> Its `collect()` will return a row for each executed `INSERT INTO` in
> > > >>>>>>>>>>>> the order of statement definition.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> For a simple `SELECT * FROM ...`, the query execution might block
> > > >>>>>>>>>>>> until `collect()` is called to pull buffered rows from the job (from
> > > >>>>>>>>>>>> socket/REST API, whatever we will use in the future). We can say that
> > > >>>>>>>>>>>> a statement finished successfully when `collect#Iterator#hasNext`
> > > >>>>>>>>>>>> has returned false.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I hope this summarizes our discussion, @Dawid/Aljoscha/Klou?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> It would be great if we can add these findings to the FLIP before we
> > > >>>>>>>>>>>> start voting.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> One minor thing: some `execute()` methods still throw a checked
> > > >>>>>>>>>>>> exception; can we remove that from the FLIP? Also the above-mentioned
> > > >>>>>>>>>>>> `Iterator#next()` would trigger an execution without throwing a
> > > >>>>>>>>>>>> checked exception.
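Timo's description of `TableResult` as a util class that mixes already-completed values with future-like ones could be sketched as follows (simplified, with invented members; the real FLIP-84 interface differs):

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Sketch of a TableResult-like class: the result kind is available
// immediately, while collect() behaves like a future and may block until
// rows arrive. Execution failures surface as unchecked exceptions, as
// discussed in this thread.
public class SketchTableResult {

    public enum ResultKind { SUCCESS, SUCCESS_WITH_CONTENT }

    private final ResultKind kind;                       // completed at creation
    private final CompletableFuture<List<String>> rows;  // completes with the job

    public SketchTableResult(ResultKind kind, CompletableFuture<List<String>> rows) {
        this.kind = kind;
        this.rows = rows;
    }

    public ResultKind getResultKind() {
        return kind;  // no blocking, already known
    }

    public Iterator<String> collect() {
        try {
            return rows.join().iterator();  // blocks until rows are available
        } catch (CompletionException e) {
            // rethrow unchecked instead of forcing a checked exception on callers
            throw new RuntimeException("statement execution failed", e.getCause());
        }
    }
}
```

For a DDL the future would already be completed at construction time, so `collect()` never blocks; for a DQL/DML it completes with the job.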
> > > >>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>> Timo
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> [1] https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On 31.03.20 06:28, godfrey he wrote:
> > > >>>>>>>>>>>>> Hi, Timo & Jark,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks for your explanation.
> > > >>>>>>>>>>>>> Agree with you that execution should always be async,
> > > >>>>>>>>>>>>> and that the sync execution scenario can be covered by async execution.
> > > >>>>>>>>>>>>> It helps provide a unified entry point for batch and streaming.
> > > >>>>>>>>>>>>> I think we can also use sync execution for some testing.
> > > >>>>>>>>>>>>> So I agree that we provide the `executeSql` method as an async method.
> > > >>>>>>>>>>>>> If we want a sync method in the future, we can add a method named
> > > >>>>>>>>>>>>> `executeSqlSync`.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I think we've reached an agreement. I will update the document and
> > > >>>>>>>>>>>>> start the voting process.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>> Godfrey
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Jark Wu <imj...@gmail.com> 于2020年3月31日周二 上午12:46写道:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hi,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I didn't follow the full discussion,
> > > >>>>>>>>>>>>>> but I share the same concern with Timo that streaming queries should
> > > >>>>>>>>>>>>>> always be async.
> > > >>>>>>>>>>>>>> Otherwise, I can imagine it will cause a lot of confusion and
> > > >>>>>>>>>>>>>> problems if users don't deeply keep the "sync" in mind (e.g. the
> > > >>>>>>>>>>>>>> client hangs).
> > > >>>>>>>>>>>>>> Besides, streaming is still the majority of use cases for Flink and
> > > >>>>>>>>>>>>>> Flink SQL. We should put usability at a high priority.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>> Jark
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Mon, 30 Mar 2020 at 23:27, Timo Walther <twal...@apache.org> wrote:
> > > >>>>>>>>>>>>>>> Hi Godfrey,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> maybe I wasn't expressing my biggest concern clearly enough in my
> > > >>>>>>>>>>>>>>> last mail. Even in a single-line and sync execution, I think that
> > > >>>>>>>>>>>>>>> streaming queries should not block the execution. Otherwise it is not
> > > >>>>>>>>>>>>>>> possible to call collect() or print() on them afterwards.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> "there are too many things to discuss for multiline":
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> True, I don't want to solve all of them right now. But what I know
> > > >>>>>>>>>>>>>>> is that our newly introduced methods should fit into a multiline
> > > >>>>>>>>>>>>>>> execution. There is no big difference between calling
> > > >>>>>>>>>>>>>>> `executeSql(A); executeSql(B)` and processing a multiline file
> > > >>>>>>>>>>>>>>> `A;\nB;`.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> I think the example that you mentioned can simply be undefined for
> > > >>>>>>>>>>>>>>> now. Currently, no catalog is modifying data but just metadata. This
> > > >>>>>>>>>>>>>>> is a separate discussion.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> "the result of the second statement is indeterministic":
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Sure, this is indeterministic.
But this is the implementer's fault,
> > > >>>>>>>>>>>>>>> and we cannot forbid such pipelines.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> How about we always execute streaming queries async? It would
> > > >>>>>>>>>>>>>>> unblock executeSql() and multiline statements.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Having an `executeSqlAsync()` is useful for batch. However, I don't
> > > >>>>>>>>>>>>>>> want `sync/async` to be the new batch/stream flag. The execution
> > > >>>>>>>>>>>>>>> behavior should come from the query itself.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Regards,
> > > >>>>>>>>>>>>>>> Timo
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On 30.03.20 11:12, godfrey he wrote:
> > > >>>>>>>>>>>>>>>> Hi Timo,
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Agree with you that streaming queries are our top priority,
> > > >>>>>>>>>>>>>>>> but I think there are too many things we need to discuss for
> > > >>>>>>>>>>>>>>>> multiline statements, e.g.:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> 1. What's the behavior of mixing DDL and DML for async execution?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> create table t1 xxx;
> > > >>>>>>>>>>>>>>>> create table t2 xxx;
> > > >>>>>>>>>>>>>>>> insert into t2 select * from t1 where xxx;
> > > >>>>>>>>>>>>>>>> drop table t1; -- t1 may be a MySQL table; its data will also be deleted.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Here t1 is dropped while the "insert" job is still running.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> 2.
What's the behavior of the unified scenario for async execution?
> > > >>>>>>>>>>>>>>>> (As you mentioned:)
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> INSERT INTO t1 SELECT * FROM s;
> > > >>>>>>>>>>>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> The result of the second statement is indeterministic, because the
> > > >>>>>>>>>>>>>>>> first statement may still be running.
> > > >>>>>>>>>>>>>>>> I think we would need to put a lot of effort into defining the
> > > >>>>>>>>>>>>>>>> behavior of logically related queries.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> In this FLIP, I suggest we only handle single statements, and we
> > > >>>>>>>>>>>>>>>> also introduce an async execute method,
> > > >>>>>>>>>>>>>>>> which is more important and more often used by users.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> For the sync methods (like `TableEnvironment.executeSql` and
> > > >>>>>>>>>>>>>>>> `StatementSet.execute`),
> > > >>>>>>>>>>>>>>>> the result will be returned only when the job is finished.
The following methods will be introduced in this FLIP:

    /**
     * Asynchronously execute the given single statement.
     */
    TableEnvironment.executeSqlAsync(String statement): TableResult

    /**
     * Asynchronously execute the DML statements as a batch.
     */
    StatementSet.executeAsync(): TableResult

    public interface TableResult {
        /**
         * Return the JobClient for DQL and DML in async mode,
         * else return Optional.empty().
         */
        Optional<JobClient> getJobClient();
    }

What do you think?

Best,
Godfrey

On Thu, 26 Mar 2020 at 21:15, Timo Walther <twal...@apache.org> wrote:

Hi Godfrey,

executing streaming queries must be our top priority because this is what distinguishes Flink from competitors. If we change the execution behavior, we should think about the other cases as well so as not to break the API a third time.
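A possible usage pattern for the proposed `TableResult.getJobClient()` contract can be sketched with stand-in interfaces (these are minimal illustrations, not the real Flink types): a DDL result carries no job, while an async DML result exposes a JobClient.

```java
import java.util.Optional;

// Minimal stand-ins for the proposed interfaces, for illustration only.
public class AsyncApiSketch {
    interface JobClient { String getJobId(); }
    interface TableResult { Optional<JobClient> getJobClient(); }

    // A DDL (e.g. CREATE TABLE) submits no job, so there is no JobClient.
    static TableResult ddlResult() {
        return Optional::empty;
    }

    // A DML submitted via the proposed executeSqlAsync() would carry one.
    static TableResult dmlResult(String jobId) {
        JobClient client = () -> jobId;
        return () -> Optional.of(client);
    }

    public static void main(String[] args) {
        TableResult ddl = ddlResult();
        TableResult dml = dmlResult("job-42");
        System.out.println(ddl.getJobClient().isPresent());  // false
        System.out.println(dml.getJobClient()
                .map(JobClient::getJobId).orElse("none"));   // job-42
    }
}
```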
I fear that just having an async execute method will not be enough because users should be able to mix streaming and batch queries in a unified scenario.

If I remember it correctly, we had some discussions in the past about what decides the execution mode of a query. Currently, we would like to let the query decide, not derive it from the sources.

So I could imagine a multiline pipeline as:

    USE CATALOG 'mycat';
    INSERT INTO t1 SELECT * FROM s;
    INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;

For executeMultilineSql():

    sync because regular SQL
    sync because regular batch SQL
    async because streaming SQL

For executeAsyncMultilineSql():

    async because everything should be async
    async because everything should be async
    async because everything should be async

What we should not do for executeAsyncMultilineSql():

    sync because DDL
    async because everything should be async
    async because everything should be async

What are your thoughts here?
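The per-statement decision above can be sketched as a small classifier. The `EMIT STREAM` check below is a toy heuristic standing in for real plan analysis, and the entry-point flag mirrors the two hypothetical methods discussed.

```java
// Toy sketch of the sync/async decision: an async entry point runs everything
// async, while the plain entry point runs only streaming queries async. Detecting
// a streaming query by the literal "EMIT STREAM" is a placeholder for real
// analysis of the query plan.
public class ExecutionModeSketch {
    enum Mode { SYNC, ASYNC }

    static Mode modeFor(String statement, boolean asyncEntryPoint) {
        if (asyncEntryPoint) {
            return Mode.ASYNC; // executeAsyncMultilineSql(): everything async
        }
        boolean streaming = statement.contains("EMIT STREAM");
        return streaming ? Mode.ASYNC : Mode.SYNC;
    }

    public static void main(String[] args) {
        System.out.println(modeFor("USE CATALOG 'mycat'", false));
        System.out.println(modeFor("INSERT INTO t1 SELECT * FROM s", false));
        System.out.println(modeFor(
                "INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM", false));
    }
}
```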
Regards,
Timo

On 26.03.20 12:50, godfrey he wrote:

Hi Timo,

I agree with you that streaming queries mostly need async execution. In fact, our original plan was to introduce only sync methods in this FLIP; async methods (like "executeSqlAsync") would be introduced in the future, as mentioned in the appendix.

Maybe the async methods also need to be considered in this FLIP.

I think sync methods are also useful for streaming, as they can be used to run bounded sources. Maybe we should check whether all sources are bounded in sync execution mode.

> Also, if we block for streaming queries, we could never support
> multiline files. Because the first INSERT INTO would block the further
> execution.

Agree with you. We need an async method to submit multiline files, and such files should be limited so that, for streaming, the DQL and DML statements always come at the end.
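The boundedness check suggested above could be sketched as a simple up-front validation: sync execution is rejected unless every source in the plan is bounded. `Source` here is a stand-in type for illustration, not Flink's connector interface.

```java
import java.util.List;

// Hypothetical sketch: reject sync execution up front unless every source in
// the plan is bounded, as godfrey suggests checking.
public class BoundednessCheck {
    record Source(String name, boolean bounded) {}

    static boolean canRunSync(List<Source> sources) {
        return sources.stream().allMatch(Source::bounded);
    }

    public static void main(String[] args) {
        System.out.println(canRunSync(List.of(new Source("files", true))));
        System.out.println(canRunSync(List.of(
                new Source("files", true), new Source("kafka", false))));
    }
}
```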
Best,
Godfrey

On Thu, 26 Mar 2020 at 16:29, Timo Walther <twal...@apache.org> wrote:

Hi Godfrey,

having control over the job after submission is a requirement that was requested frequently (some examples are [1], [2]). Users would like to get insights about the running or completed job, including the jobId, jobGraph, etc.; the JobClient summarizes these properties.

It is good to have a discussion about synchronous/asynchronous submission now, to have a complete execution picture.

I thought we submit streaming queries mostly async and just wait for the successful submission. If we block for streaming queries, how can we collect() or print() results?

Also, if we block for streaming queries, we could never support multiline files, because the first INSERT INTO would block the further execution.
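The point about collect() on a blocking submission can be illustrated: with non-blocking submission, the client can iterate over results while the job is still producing them. Everything below is a simulation with a queue, not Flink API, and the EOS marker is an artifact of this toy example only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

// Simulation of a non-blocking streaming collect(): the "job" runs asynchronously
// and the client consumes rows before the job could ever be "finished".
public class StreamingCollectSketch {
    static String collectAll() throws InterruptedException {
        BlockingQueue<String> rows = new ArrayBlockingQueue<>(10);
        // "Submit" the streaming job without waiting for it to finish.
        CompletableFuture.runAsync(() -> {
            for (int i = 1; i <= 3; i++) {
                rows.add("row-" + i); // job emits results over time
            }
            rows.add("EOS");
        });
        // The client iterates immediately, row by row.
        StringBuilder seen = new StringBuilder();
        for (String r = rows.take(); !r.equals("EOS"); r = rows.take()) {
            seen.append(r).append(' ');
        }
        return seen.toString().trim();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(collectAll());
    }
}
```

Had submission blocked until job completion, an unbounded streaming query would never return, and the client could never reach the iteration loop at all.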
If we decide to block entirely on streaming queries, we need the async execution methods in the design already. However, I would rather go for non-blocking streaming queries, also with the `EMIT STREAM` keyword in mind that we might add to SQL statements soon.

Regards,
Timo

[1] https://issues.apache.org/jira/browse/FLINK-16761
[2] https://issues.apache.org/jira/browse/FLINK-12214

On 25.03.20 16:30, godfrey he wrote:

Hi Timo,

Thanks for the update.

Regarding "multiline statement support", I'm also fine with `TableEnvironment.executeSql()` only supporting single statements; we can support multiline statements later (this needs more discussion). Regarding "StatementSet.explain()", I don't have strong opinions about that.

Regarding "TableResult.getJobClient()", I think it's unnecessary. The reasons are: first, many statements (e.g. DDL, SHOW xx, USE xx) will not submit a Flink job; second, `TableEnvironment.executeSql()` and `StatementSet.execute()` are synchronous methods, so `TableResult` will be returned only after the job is finished or has failed.

Regarding "whether StatementSet.execute() needs to throw exceptions", I think we should choose a unified way to tell whether the execution was successful. If `TableResult` contains an ERROR kind (for non-runtime exceptions), users need to not only check the result but also catch runtime exceptions in their code. Alternatively, `StatementSet.execute()` could throw no exceptions at all (including runtime exceptions), with all error messages in the result. I prefer "StatementSet.execute() needs to throw exceptions". cc @Jark Wu <imj...@gmail.com>

I will update the agreed parts to the document first.
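The trade-off described above can be sketched: with a throwing execute(), callers have a single failure path, whereas with result-encoded errors they must inspect the result and still guard against runtime exceptions. The names below are hypothetical stand-ins, not the proposed API.

```java
// Hypothetical contrast of the two error-reporting styles under discussion.
public class ErrorStyleSketch {
    // Style preferred in the thread: execute() throws on failure, so there is
    // exactly one failure path for callers to handle.
    static String executeThrowing(boolean fail) {
        if (fail) {
            throw new IllegalStateException("statement failed");
        }
        return "OK";
    }

    public static void main(String[] args) {
        try {
            executeThrowing(true);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage()); // single failure path
        }
        System.out.println(executeThrowing(false));
    }
}
```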
Best,
Godfrey

On Wed, 25 Mar 2020 at 18:51, Timo Walther <twal...@apache.org> wrote:

Hi Godfrey,

thanks for starting the discussion on the mailing list. And sorry again for the late reply to FLIP-84. I have updated the Google doc one more time to incorporate the offline discussions.

From Dawid's and my view, it is fine to postpone the multiline support to a separate method. This can be future work, even though we will need it rather soon.

If there are no objections, I suggest updating FLIP-84 again and having another voting process.

Thanks,
Timo

On 25.03.20 11:17, godfrey he wrote:

Hi community,

Timo, Fabian and Dawid have some feedback about FLIP-84 [1]. The feedback is all about the newly introduced methods.
We had a discussion yesterday, and most of the feedback has been agreed upon. Here are the conclusions:

*1. about proposed methods in `TableEnvironment`:*

the original proposed methods:

    TableEnvironment.createDmlBatch(): DmlBatch
    TableEnvironment.executeStatement(String statement): ResultTable

the new proposed methods:

    // we should not use abbreviations in the API, and the term "Batch"
    // is easily confused with batch/streaming processing
    TableEnvironment.createStatementSet(): StatementSet

    // every method that takes SQL should have `Sql` in its name
    // supports multiline statements ???
    TableEnvironment.executeSql(String statement): TableResult

    // new method. supports explaining DQL and DML
    TableEnvironment.explainSql(String statement, ExplainDetail... details): String

*2. about proposed related classes:*

the original proposed classes:

    interface DmlBatch {
        void addInsert(String insert);
        void addInsert(String targetPath, Table table);
        ResultTable execute() throws Exception;
        String explain(boolean extended);
    }

    public interface ResultTable {
        TableSchema getResultSchema();
        Iterable<Row> getResultRows();
    }

the new proposed classes:

    interface StatementSet {
        // every method that takes SQL should have `Sql` in its name
        // return the StatementSet instance for fluent programming
        addInsertSql(String statement): StatementSet

        // return the StatementSet instance for fluent programming
        addInsert(String tablePath, Table table): StatementSet

        // new method. supports overwrite mode
        addInsert(String tablePath, Table table, boolean overwrite): StatementSet

        explain(): String

        // new method. supports adding more details for the result
        explain(ExplainDetail... extraDetails): String

        // throw exception ???
        execute(): TableResult
    }

    interface TableResult {
        getTableSchema(): TableSchema

        // avoid custom parsing of an "OK" row in programming
        getResultKind(): ResultKind

        // instead of `get`, make it explicit that this might be
        // triggering an expensive operation
        collect(): Iterable<Row>

        // for fluent programming
        print(): Unit
    }

    enum ResultKind {
        SUCCESS, // for DDL, DCL and statements with a simple "OK"
        SUCCESS_WITH_CONTENT // rows with important content are available (DML, DQL)
    }

*3. new proposed methods in `Table`:*

`Table.insertInto()` will be deprecated, and the following methods are introduced:

    Table.executeInsert(String tablePath): TableResult
    Table.executeInsert(String tablePath, boolean overwrite): TableResult
    Table.explain(ExplainDetail... details): String
    Table.execute(): TableResult

There are two issues that need further discussion: one is whether `TableEnvironment.executeSql(String statement): TableResult` needs to support multiline statements (or whether `TableEnvironment` needs to support multiline statements at all), and the other is whether `StatementSet.execute()` needs to throw exceptions.

Please refer to the feedback document [2] for the details.

Any suggestions are warmly welcomed!
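The fluent StatementSet usage implied by the new API could look like the following. The class below is a minimal stand-in for illustration, not the Flink implementation, and explain() returns a placeholder string rather than a real plan.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in showing why addInsertSql() returns the StatementSet itself:
// statements can be chained fluently before a single explain()/execute() call.
public class StatementSetSketch {
    private final List<String> inserts = new ArrayList<>();

    StatementSetSketch addInsertSql(String statement) {
        inserts.add(statement);
        return this; // fluent programming, as proposed
    }

    String explain() {
        return "plan for " + inserts.size() + " insert(s)"; // placeholder plan
    }

    public static void main(String[] args) {
        String plan = new StatementSetSketch()
                .addInsertSql("INSERT INTO t1 SELECT * FROM s")
                .addInsertSql("INSERT INTO t2 SELECT * FROM s")
                .explain();
        System.out.println(plan);
    }
}
```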
[1] https://wiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
[2] https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit

Best,
Godfrey