Hi @fhue...@gmail.com, @Timo Walther <twal...@apache.org>, @Dawid Wysakowicz
<dwysakow...@apache.org>,

What do you think about requiring that a watermark must be defined on a
top-level column?

If we do that, we can add an expression column to represent the watermark,
just like a computed column.
An example covering all cases:
create table MyTable (
  f0 BIGINT NOT NULL,
  f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
  f2 VARCHAR(256),
  f3 AS f0 + 1,
  f4 TIMESTAMP(3) NOT NULL,
  PRIMARY KEY (f0),
  UNIQUE (f3, f2),
  WATERMARK FOR f4 AS f4 - INTERVAL '3' SECOND
) with (...)

+------+---------------------------------+-------+--------+----------------+--------------------------+
| name | type                            | null  | key    | compute column | watermark                |
+------+---------------------------------+-------+--------+----------------+--------------------------+
| f0   | BIGINT                          | false | PRI    | (NULL)         | (NULL)                   |
| f1   | ROW<q1 STRING, q2 TIMESTAMP(3)> | true  | (NULL) | (NULL)         | (NULL)                   |
| f2   | VARCHAR(256)                    | true  | UNQ    | (NULL)         | (NULL)                   |
| f3   | BIGINT                          | false | UNQ    | f0 + 1         | (NULL)                   |
| f4   | TIMESTAMP(3)                    | false | (NULL) | (NULL)         | f4 - INTERVAL '3' SECOND |
+------+---------------------------------+-------+--------+----------------+--------------------------+
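
A minimal sketch of what this would look like from the Table API side
(assumptions: `executeSql` behaves as proposed in FLIP-84, the class name
is made up for illustration, and the 'datagen' connector only stands in
for a real source to keep the example self-contained):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TopLevelWatermarkExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        // A nested rowtime field (f1.q2) is first lifted to a top-level
        // computed column; the watermark is then declared on that column,
        // which satisfies the proposed "top-level only" restriction.
        tEnv.executeSql(
                "CREATE TABLE MyTable (" +
                "  f1 ROW<q1 STRING, q2 TIMESTAMP(3)>," +
                "  rowtime AS f1.q2," +
                "  WATERMARK FOR rowtime AS rowtime - INTERVAL '3' SECOND" +
                ") WITH ('connector' = 'datagen')");
    }
}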

WDYT?

Best,
Godfrey



On Thu, Apr 30, 2020 at 11:57 PM godfrey he <godfre...@gmail.com> wrote:

> Hi Fabian,
>
> the broken example is:
>
> create table MyTable (
>   f0 BIGINT NOT NULL,
>   f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
>   f2 VARCHAR(256),
>   f3 AS f0 + 1,
>   PRIMARY KEY (f0),
>   UNIQUE (f3, f2),
>   WATERMARK FOR f1.q2 AS f1.q2 - INTERVAL '3' SECOND
> ) with (...)
>
>
> +------+-------------------------------------+--------+----------------+------------------------------------------+
> | name | type                                | key    | compute column | watermark                                |
> +------+-------------------------------------+--------+----------------+------------------------------------------+
> | f0   | BIGINT NOT NULL                     | PRI    | (NULL)         | (NULL)                                   |
> | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | (NULL) | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
> | f2   | VARCHAR(256)                        | UNQ    | (NULL)         | (NULL)                                   |
> | f3   | BIGINT NOT NULL                     | UNQ    | f0 + 1         | (NULL)                                   |
> +------+-------------------------------------+--------+----------------+------------------------------------------+
>
> or we add a column to represent nullability.
>
> +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
> | name | type                                | null  | key    | compute column | watermark                                |
> +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
> | f0   | BIGINT                              | false | PRI    | (NULL)         | (NULL)                                   |
> | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | true  | (NULL) | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
> | f2   | VARCHAR(256)                        | true  | UNQ    | (NULL)         | (NULL)                                   |
> | f3   | BIGINT                              | false | UNQ    | f0 + 1         | (NULL)                                   |
> +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
>
>
>
>
> Hi Jark,
> If we limit watermarks to be defined only on top-level columns,
> this becomes much simpler.
>
> Best,
> Godfrey
>
> On Thu, Apr 30, 2020 at 11:38 PM Jark Wu <imj...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm in favor of Fabian's proposal.
>> First, a watermark is not a column but metadata, just like a primary key,
>> so it shouldn't be listed among the columns.
>> Second, AFAIK, a primary key can only be defined on top-level columns.
>> Third, I think watermarks can follow primary keys and likewise only be
>> allowed on top-level columns.
>>
>> I have to admit that in FLIP-66, a watermark can be defined on nested
>> fields.
>> However, during the implementation, I found that it's too complicated to
>> do that. We would have to refactor the time-based physical nodes,
>> use code generation to access the event time, and refactor
>> FlinkTypeFactory to support complex nested rowtimes.
>> There is not much value in this feature, but it introduces a lot of
>> complexity into the code base.
>> So I think we can force watermarks to be defined on top-level columns. If
>> users want to define one on a nested column,
>> they can use a computed column to expose it as a top-level column.
>>
>> Best,
>> Jark
>>
>>
>> On Thu, 30 Apr 2020 at 17:55, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>> > Hi Godfrey,
>> >
>> > The formatting of your examples seems to be broken.
>> > Could you send them again, please?
>> >
>> > Regarding your points:
>> > > because the watermark expression can refer to a nested column, just
>> > > like `f1.q2` in the example I gave above.
>> >
>> > I would put the watermark information in the row of the top-level field
>> > and indicate to which nested field the watermark refers.
>> > Don't we have to solve the same issue for primary keys that are defined
>> > on a nested field?
>> >
>> > > A boolean flag can't represent such info, and I don't know whether we
>> > > will support complex watermark expressions involving multiple columns
>> > > in the future, such as: "WATERMARK FOR ts AS ts + f1 + INTERVAL '1'
>> > > SECOND"
>> >
>> > You are right, a simple binary flag is definitely not sufficient to
>> > display the watermark information.
>> > I would put the expression string into the field, i.e., "ts + f1 +
>> > INTERVAL '1' SECOND"
>> >
>> >
>> > For me the most important reason not to show the watermark as a row in
>> > the table is that it is not a field that can be queried but meta
>> > information on an existing field.
>> > For the user it is important to know that a certain field has a
>> > watermark.
>> > Otherwise, certain queries cannot be specified correctly.
>> > Also, there might be support for multiple watermarks defined on
>> > different fields at some point. Would those be printed as multiple rows?
>> >
>> > Best,
>> > Fabian
>> >
>> >
>> > On Thu, Apr 30, 2020 at 11:25 AM godfrey he <godfre...@gmail.com> wrote:
>> >
>> > > Hi Fabian, Aljoscha
>> > >
>> > > Thanks for the feedback.
>> > >
>> > > Agree with you that we can deal with the primary key as you mentioned.
>> > > Currently, the type column already contains the nullability attribute,
>> > > e.g. BIGINT NOT NULL.
>> > > (I'm also OK with using two columns to represent the type, just like
>> > > MySQL.)
>> > >
>> > > > Why do I treat `watermark` as a special row?
>> > > Because the watermark expression can refer to a nested column, just
>> > > like `f1.q2` in the example I gave above.
>> > > A boolean flag can't represent such info, and I don't know whether we
>> > > will support complex
>> > > watermark expressions involving multiple columns in the future, such
>> > > as: "WATERMARK FOR ts AS ts + f1 + INTERVAL '1' SECOND"
>> > >
>> > > If we do not support complex watermark expressions, we can add a
>> > > watermark column.
>> > >
>> > > for example:
>> > >
>> > > create table MyTable (
>> > >   f0 BIGINT NOT NULL,
>> > >   f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
>> > >   f2 VARCHAR(256),
>> > >   f3 AS f0 + 1,
>> > >   PRIMARY KEY (f0),
>> > >   UNIQUE (f3, f2),
>> > >   WATERMARK FOR f1.q2 AS f1.q2 - INTERVAL '3' SECOND
>> > > ) with (...)
>> > >
>> > >
>> > > +------+-------------------------------------+--------+----------------+------------------------------------------+
>> > > | name | type                                | key    | compute column | watermark                                |
>> > > +------+-------------------------------------+--------+----------------+------------------------------------------+
>> > > | f0   | BIGINT NOT NULL                     | PRI    | (NULL)         | (NULL)                                   |
>> > > | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | (NULL) | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
>> > > | f2   | VARCHAR(256)                        | UNQ    | (NULL)         | (NULL)                                   |
>> > > | f3   | BIGINT NOT NULL                     | UNQ    | f0 + 1         | (NULL)                                   |
>> > > +------+-------------------------------------+--------+----------------+------------------------------------------+
>> > >
>> > >
>> > > or we add a column to represent nullability:
>> > >
>> > > +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
>> > > | name | type                                | null  | key    | compute column | watermark                                |
>> > > +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
>> > > | f0   | BIGINT                              | false | PRI    | (NULL)         | (NULL)                                   |
>> > > | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | true  | (NULL) | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
>> > > | f2   | VARCHAR(256)                        | true  | UNQ    | (NULL)         | (NULL)                                   |
>> > > | f3   | BIGINT                              | false | UNQ    | f0 + 1         | (NULL)                                   |
>> > > +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
>> > >
>> > >
>> > > Personally, I like the second one. (We need to make some changes to
>> > > LogicalType to get the type name without nullability.)
>> > >
>> > >
>> > > Best,
>> > > Godfrey
>> > >
>> > >
>> > > On Wed, Apr 29, 2020 at 5:47 PM Aljoscha Krettek <aljos...@apache.org> wrote:
>> > >
>> > > > +1 I like the general idea of printing the results as a table.
>> > > >
>> > > > On the specifics I don't know enough, but Fabian's suggestions seem
>> > > > to make sense to me.
>> > > >
>> > > > Aljoscha
>> > > >
>> > > > On 29.04.20 10:56, Fabian Hueske wrote:
>> > > > > Hi Godfrey,
>> > > > >
>> > > > > Thanks for starting this discussion!
>> > > > >
>> > > > > In my mind, WATERMARK is a property (or constraint) of a field,
>> just
>> > > like
>> > > > > PRIMARY KEY.
>> > > > > Take this example from MySQL:
>> > > > >
>> > > > > mysql> CREATE TABLE people (id INT NOT NULL, name VARCHAR(128) NOT
>> > > NULL,
>> > > > > age INT, PRIMARY KEY (id));
>> > > > > Query OK, 0 rows affected (0.06 sec)
>> > > > >
>> > > > > mysql> describe people;
>> > > > > +-------+--------------+------+-----+---------+-------+
>> > > > > | Field | Type         | Null | Key | Default | Extra |
>> > > > > +-------+--------------+------+-----+---------+-------+
>> > > > > | id    | int          | NO   | PRI | NULL    |       |
>> > > > > | name  | varchar(128) | NO   |     | NULL    |       |
>> > > > > | age   | int          | YES  |     | NULL    |       |
>> > > > > +-------+--------------+------+-----+---------+-------+
>> > > > > 3 rows in set (0.01 sec)
>> > > > >
>> > > > > Here, PRIMARY KEY is marked in the Key column of the id field.
>> > > > > We could do the same for watermarks by adding a Watermark column.
>> > > > >
>> > > > > Best, Fabian
>> > > > >
>> > > > >
>> > > > > On Wed, Apr 29, 2020 at 10:43 AM godfrey he <godfre...@gmail.com> wrote:
>> > > > >
>> > > > >> Hi everyone,
>> > > > >>
>> > > > >> I would like to bring up a discussion about the result type of
>> > > > >> the describe statement,
>> > > > >> which is introduced in FLIP-84 [1].
>> > > > >> In the previous version, we defined the result type of the
>> > > > >> `describe` statement as a single column, as follows:
>> > > > >>
>> > > > >> Statement:     DESCRIBE xx
>> > > > >> Result Schema: field name: result
>> > > > >>                field type: VARCHAR(n)
>> > > > >>                (n is the max length of values)
>> > > > >> Result Value:  describes the detail of an object (single row)
>> > > > >> Result Kind:   SUCCESS_WITH_CONTENT
>> > > > >> Examples:      DESCRIBE table_name
>> > > > >>
>> > > > >> for "describe table_name", the result value is the `toString`
>> value
>> > of
>> > > > >> `TableSchema`, which is an unstructured data.
>> > > > >> It's hard to for user to use this info.
>> > > > >>
>> > > > >> for example:
>> > > > >>
>> > > > >> TableSchema schema = TableSchema.builder()
>> > > > >>     .field("f0", DataTypes.BIGINT())
>> > > > >>     .field("f1", DataTypes.ROW(
>> > > > >>        DataTypes.FIELD("q1", DataTypes.STRING()),
>> > > > >>        DataTypes.FIELD("q2", DataTypes.TIMESTAMP(3))))
>> > > > >>     .field("f2", DataTypes.STRING())
>> > > > >>     .field("f3", DataTypes.BIGINT(), "f0 + 1")
>> > > > >>     .watermark("f1.q2", WATERMARK_EXPRESSION, WATERMARK_DATATYPE)
>> > > > >>     .build();
>> > > > >>
>> > > > >> its `toString` value is:
>> > > > >> root
>> > > > >>   |-- f0: BIGINT
>> > > > >>   |-- f1: ROW<`q1` STRING, `q2` TIMESTAMP(3)>
>> > > > >>   |-- f2: STRING
>> > > > >>   |-- f3: BIGINT AS f0 + 1
>> > > > >>   |-- WATERMARK FOR f1.q2 AS now()
>> > > > >>
>> > > > >> For Hive, MySQL, etc., the describe result is in table form,
>> > > > >> including field names and field types,
>> > > > >> which is more familiar to users.
>> > > > >> TableSchema [2] has watermark expressions and computed columns;
>> > > > >> we should also put them into the table:
>> > > > >> for a computed column, which is column-level, we add a new column
>> > > > >> named `expr`;
>> > > > >> for the watermark expression, which is table-level, we add a
>> > > > >> special row named `WATERMARK` to represent it.
>> > > > >>
>> > > > >> For the above example, the result will look like:
>> > > > >>
>> > > > >> +-----------+-------------------------------------+----------------+
>> > > > >> | name      | type                                | expr           |
>> > > > >> +-----------+-------------------------------------+----------------+
>> > > > >> | f0        | BIGINT                              | (NULL)         |
>> > > > >> | f1        | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | (NULL)         |
>> > > > >> | f2        | STRING                              | (NULL)         |
>> > > > >> | f3        | BIGINT                              | f0 + 1         |
>> > > > >> | WATERMARK | (NULL)                              | f1.q2 AS now() |
>> > > > >> +-----------+-------------------------------------+----------------+
>> > > > >>
>> > > > >> Now there is a PR, FLINK-17112 [3], to implement the DESCRIBE
>> > > > >> statement.
>> > > > >>
>> > > > >> What do you think about this update?
>> > > > >> Any feedback is welcome!
>> > > > >>
>> > > > >> Best,
>> > > > >> Godfrey
>> > > > >>
>> > > > >> [1]
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>> > > > >> [2]
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/api/TableSchema.java
>> > > > >> [3] https://github.com/apache/flink/pull/11892
>> > > > >>
>> > > > >>
>> > > > >> On Mon, Apr 6, 2020 at 10:38 PM godfrey he <godfre...@gmail.com> wrote:
>> > > > >>
>> > > > >>> Hi Timo,
>> > > > >>>
>> > > > >>> Sorry for the late reply, and thanks for your correction.
>> > > > >>> I missed DQL for job submission scenario.
>> > > > >>> I'll fix the document right away.
>> > > > >>>
>> > > > >>> Best,
>> > > > >>> Godfrey
>> > > > >>>
>> > > > >>> On Fri, Apr 3, 2020 at 9:53 PM Timo Walther <twal...@apache.org> wrote:
>> > > > >>>
>> > > > >>>> Hi Godfrey,
>> > > > >>>>
>> > > >>>> I'm sorry to jump in again, but I still need to clarify some
>> > > >>>> things around TableResult.
>> > > > >>>>
>> > > >>>> The FLIP says:
>> > > >>>> "For DML, this method returns TableResult until the job is
>> > > >>>> submitted. For other statements, TableResult is returned until the
>> > > >>>> execution is finished."
>> > > >>>>
>> > > >>>> I thought we agreed on making every execution async? This also
>> > > >>>> means returning a TableResult for DQLs even though the execution
>> > > >>>> is not done yet. People need access to the JobClient also for
>> > > >>>> batch jobs in order to cancel long-lasting queries. If people want
>> > > >>>> to wait for the completion they can hook into JobClient or
>> > > >>>> collect().
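>> > > >>>>
>> > > >>>> As a rough sketch of that contract from the caller's side (this
>> > > >>>> assumes the async behavior proposed here, i.e. `executeSql`
>> > > >>>> returns right after submission and the TableResult exposes the
>> > > >>>> JobClient):
>> > > >>>>
>> > > >>>> TableResult result = tEnv.executeSql("SELECT * FROM MyTable");
>> > > >>>> // the query is already running; the caller decides whether to
>> > > >>>> // wait, consume rows via collect(), or cancel:
>> > > >>>> result.getJobClient().ifPresent(jobClient -> {
>> > > >>>>     jobClient.cancel(); // e.g. stop a long-lasting query
>> > > >>>> });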
>> > > > >>>>
>> > > >>>> Can we rephrase this part to:
>> > > >>>>
>> > > >>>> "For DML and DQL, this method returns TableResult once the job
>> > > >>>> has been submitted. For DDL and DCL statements, TableResult is
>> > > >>>> returned once the operation has finished."
>> > > > >>>>
>> > > > >>>> Regards,
>> > > > >>>> Timo
>> > > > >>>>
>> > > > >>>>
>> > > > >>>> On 02.04.20 05:27, godfrey he wrote:
>> > > > >>>>> Hi Aljoscha, Dawid, Timo,
>> > > > >>>>>
>> > > > >>>>> Thanks so much for the detailed explanation.
>> > > > >>>>> Agree with you that the multiline story is not completed now,
>> and
>> > > we
>> > > > >> can
>> > > > >>>>> keep discussion.
>> > > > >>>>> I will add current discussions and conclusions to the FLIP.
>> > > > >>>>>
>> > > > >>>>> Best,
>> > > > >>>>> Godfrey
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> Timo Walther <twal...@apache.org> 于2020年4月1日周三 下午11:27写道:
>> > > > >>>>>
>> > > > >>>>>> Hi Godfrey,
>> > > > >>>>>>
>> > > > >>>>>> first of all, I agree with Dawid. The multiline story is not
>> > > > >> completed
>> > > > >>>>>> by this FLIP. It just verifies the big picture.
>> > > > >>>>>>
>> > > > >>>>>> 1. "control the execution logic through the proposed method
>> if
>> > > they
>> > > > >>>> know
>> > > > >>>>>> what the statements are"
>> > > > >>>>>>
>> > > > >>>>>> This is a good point that also Fabian raised in the linked
>> > google
>> > > > >> doc.
>> > > > >>>> I
>> > > > >>>>>> could also imagine to return a more complicated POJO when
>> > calling
>> > > > >>>>>> `executeMultiSql()`.
>> > > > >>>>>>
>> > > >>>>>> The POJO would include some `getSqlProperties()` such that a
>> > > >>>>>> platform gets insights into the query before executing. We could
>> > > >>>>>> also trigger the execution more explicitly instead of hiding it
>> > > >>>>>> behind an iterator.
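>> > > >>>>>>
>> > > >>>>>> Purely as an illustration, such a POJO could have a shape like
>> > > >>>>>> this (every name here is hypothetical; nothing of this exists
>> > > >>>>>> yet):
>> > > >>>>>>
>> > > >>>>>> // sketch only: a handle exposing statement metadata and letting
>> > > >>>>>> // the caller trigger the submission explicitly
>> > > >>>>>> public interface SqlExecution {
>> > > >>>>>>     java.util.Map<String, String> getSqlProperties();
>> > > >>>>>>     TableResult execute(); // submits the statement
>> > > >>>>>> }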
>> > > > >>>>>>
>> > > > >>>>>> 2. "there are some special commands introduced in SQL client"
>> > > > >>>>>>
>> > > > >>>>>> For platforms and SQL Client specific commands, we could
>> offer a
>> > > > hook
>> > > > >>>> to
>> > > > >>>>>> the parser or a fallback parser in case the regular table
>> > > > environment
>> > > > >>>>>> parser cannot deal with the statement.
>> > > > >>>>>>
>> > > > >>>>>> However, all of that is future work and can be discussed in a
>> > > > >> separate
>> > > > >>>>>> FLIP.
>> > > > >>>>>>
>> > > > >>>>>> 3. +1 for the `Iterator` instead of `Iterable`.
>> > > > >>>>>>
>> > > > >>>>>> 4. "we should convert the checked exception to unchecked
>> > > exception"
>> > > > >>>>>>
>> > > > >>>>>> Yes, I meant using a runtime exception instead of a checked
>> > > > >> exception.
>> > > > >>>>>> There was no consensus on putting the exception into the
>> > > > >> `TableResult`.
>> > > > >>>>>>
>> > > > >>>>>> Regards,
>> > > > >>>>>> Timo
>> > > > >>>>>>
>> > > > >>>>>> On 01.04.20 15:35, Dawid Wysakowicz wrote:
>> > > > >>>>>>> When considering the multi-line support I think it is
>> helpful
>> > to
>> > > > >> start
>> > > > >>>>>>> with a use case in mind. In my opinion consumers of this
>> method
>> > > > will
>> > > > >>>> be:
>> > > > >>>>>>>
>> > > > >>>>>>>    1. sql-client
>> > > >>>>>>>    2. third-party SQL-based platforms
>> > > > >>>>>>>
>> > > >>>>>>> @Godfrey As for the quit/source/... commands: I think those
>> > > >>>>>>> belong to the responsibility of the aforementioned consumers.
>> > > >>>>>>> I think they should not be understood by the
>> > > > >>>>>>> TableEnvironment do? Moreover I think such commands should
>> be
>> > > > >> prefixed
>> > > > >>>>>>> appropriately. I think it's a common practice to e.g. prefix
>> > > those
>> > > > >>>> with
>> > > > >>>>>>> ! or : to say they are meta commands of the tool rather
>> than a
>> > > > >> query.
>> > > > >>>>>>>
>> > > >>>>>>> I also don't necessarily understand why platform users need to
>> > > >>>>>>> know the kind of the query to use the proposed method. They
>> > > >>>>>>> should get the type from the TableResult#ResultKind. If the
>> > > >>>>>>> ResultKind is SUCCESS, it was a DCL/DDL. If
>> > > >>>>>>> SUCCESS_WITH_CONTENT, it was a DML/DQL. If that's not enough,
>> > > >>>>>>> we can enrich the TableResult with a more explicit kind of
>> > > >>>>>>> query, but so far I don't see such a need.
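>> > > >>>>>>>
>> > > >>>>>>> For example (a sketch against the FLIP-84 API as proposed;
>> > > >>>>>>> `getResultKind` and the ResultKind values are taken from the
>> > > >>>>>>> FLIP, and `tEnv`/`statement` are assumed from context):
>> > > >>>>>>>
>> > > >>>>>>> TableResult result = tEnv.executeSql(statement);
>> > > >>>>>>> if (result.getResultKind() == ResultKind.SUCCESS_WITH_CONTENT) {
>> > > >>>>>>>     result.print(); // DML/DQL: there are rows to show
>> > > >>>>>>> } else {
>> > > >>>>>>>     // DDL/DCL: plain SUCCESS, nothing to iterate
>> > > >>>>>>> }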
>> > > > >>>>>>>
>> > > > >>>>>>> @Kurt In those cases I would assume the developers want to
>> > > present
>> > > > >>>>>>> results of the queries anyway. Moreover I think it is safe
>> to
>> > > > assume
>> > > > >>>>>>> they can adhere to such a contract that the results must be
>> > > > >> iterated.
>> > > > >>>>>>>
>> > > > >>>>>>> For direct users of TableEnvironment/Table API this method
>> does
>> > > not
>> > > > >>>> make
>> > > > >>>>>>> much sense anyway, in my opinion. I think we can rather
>> safely
>> > > > >> assume
>> > > > >>>> in
>> > > > >>>>>>> this scenario they do not want to submit multiple queries
>> at a
>> > > > >> single
>> > > > >>>>>> time.
>> > > > >>>>>>>
>> > > > >>>>>>> Best,
>> > > > >>>>>>>
>> > > > >>>>>>> Dawid
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>> On 01/04/2020 15:07, Kurt Young wrote:
>> > > >>>>>>>> One comment on `executeMultilineSql`: I'm afraid users might
>> > > >>>>>>>> sometimes forget to iterate the returned iterators, e.g. a
>> > > >>>>>>>> user submits a bunch of DDLs and expects the framework to
>> > > >>>>>>>> execute them one by one. But it doesn't.
>> > > > >>>>>>>>
>> > > > >>>>>>>> Best,
>> > > > >>>>>>>> Kurt
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>> On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek<
>> > > > >> aljos...@apache.org>
>> > > > >>>>>> wrote:
>> > > > >>>>>>>>
>> > > > >>>>>>>>> Agreed to what Dawid and Timo said.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> To answer your question about multi line SQL: no, we don't
>> > > think
>> > > > >> we
>> > > > >>>>>> need
>> > > > >>>>>>>>> this in Flink 1.11, we only wanted to make sure that the
>> > > > >> interfaces
>> > > > >>>>>> that
>> > > > >>>>>>>>> we now put in place will potentially allow this in the
>> > future.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> Best,
>> > > > >>>>>>>>> Aljoscha
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> On 01.04.20 09:31, godfrey he wrote:
>> > > >>>>>>>>>> Hi Timo & Dawid,
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> Thanks so much for the effort on multiline statement support.
>> > > >>>>>>>>>> I have a few questions about this method:
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> 1. Users can control the execution logic well through the
>> > > >>>>>>>>>> proposed method if they know what the statements are (whether
>> > > >>>>>>>>>> a statement is a DDL, a DML or something else).
>> > > >>>>>>>>>> But if a statement comes from a file, users do not know what
>> > > >>>>>>>>>> the statements are, so the execution behavior is unclear.
>> > > >>>>>>>>>> As a platform user, I think this method is hard to use unless
>> > > >>>>>>>>>> the platform defines a set of rules about statement order,
>> > > >>>>>>>>>> such as: no SELECT in the middle, DML must be at the tail of
>> > > >>>>>>>>>> the SQL file (which may be the most common case in production
>> > > >>>>>>>>>> environments).
>> > > >>>>>>>>>> Otherwise the platform must parse the SQL first to know what
>> > > >>>>>>>>>> the statements are. In that case, the platform can handle all
>> > > >>>>>>>>>> cases through `executeSql` and `StatementSet`.
>> > > > >>>>>>>>>>
>> > > >>>>>>>>>> 2. The SQL client also can't use `executeMultilineSql` to
>> > > >>>>>>>>>> support multiline statements,
>> > > >>>>>>>>>> because there are some special commands introduced in the SQL
>> > > >>>>>>>>>> client, such as `quit`, `source`, `load jar` (the last does
>> > > >>>>>>>>>> not exist now, but maybe we need such a command to support
>> > > >>>>>>>>>> dynamic table sources and UDFs).
>> > > >>>>>>>>>> Does TableEnvironment also support those commands?
>> > > > >>>>>>>>>>
>> > > >>>>>>>>>> 3. BTW, must we have this feature in release 1.11? I find
>> > > >>>>>>>>>> there are a few use cases
>> > > >>>>>>>>>> in the feedback document whose behavior is unclear now.
>> > > > >>>>>>>>>>
>> > > >>>>>>>>>> Regarding "change the return value from `Iterable<Row>` to
>> > > >>>>>>>>>> `Iterator<Row>`",
>> > > >>>>>>>>>> I couldn't agree more with this change. Just as Dawid
>> > > >>>>>>>>>> mentioned,
>> > > >>>>>>>>>> "The contract of the Iterable#iterator is that it returns a
>> > > >>>>>>>>>> new iterator each time,
>> > > >>>>>>>>>> which effectively means we can iterate the results multiple
>> > > >>>>>>>>>> times."
>> > > >>>>>>>>>> We do not support iterating over the results multiple times.
>> > > >>>>>>>>>> If we wanted to, the client would have to buffer all results,
>> > > >>>>>>>>>> which is impossible for streaming jobs.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> Best,
>> > > > >>>>>>>>>> Godfrey
>> > > > >>>>>>>>>>
>> > > >>>>>>>>>> On Wed, Apr 1, 2020 at 3:14 AM Dawid Wysakowicz
>> > > >>>>>>>>>> <dwysakow...@apache.org> wrote:
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>>> Thank you Timo for the great summary! It covers (almost)
>> > all
>> > > > the
>> > > > >>>>>> topics.
>> > > > >>>>>>>>>>> Even though in the end we are not suggesting much
>> changes
>> > to
>> > > > the
>> > > > >>>>>> current
>> > > > >>>>>>>>>>> state of FLIP I think it is important to lay out all
>> > possible
>> > > > >> use
>> > > > >>>>>> cases
>> > > > >>>>>>>>>>> so that we do not change the execution model every
>> release.
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> There is one additional thing we discussed. Could we
>> change
>> > > the
>> > > > >>>>>> result
>> > > > >>>>>>>>>>> type of TableResult#collect to Iterator<Row>? Even
>> though
>> > > those
>> > > > >>>>>>>>>>> interfaces do not differ much. I think Iterator better
>> > > > describes
>> > > > >>>> that
>> > > > >>>>>>>>>>> the results might not be materialized on the client
>> side,
>> > but
>> > > > >> can
>> > > > >>>> be
>> > > > >>>>>>>>>>> retrieved on a per record basis. The contract of the
>> > > > >>>>>> Iterable#iterator
>> > > > >>>>>>>>>>> is that it returns a new iterator each time, which
>> > > effectively
>> > > > >>>> means
>> > > > >>>>>> we
>> > > > >>>>>>>>>>> can iterate the results multiple times. Iterating the
>> > results
>> > > > is
>> > > > >>>> not
>> > > > >>>>>>>>>>> possible when we don't retrieve all the results from the
>> > > > cluster
>> > > > >>>> at
>> > > > >>>>>>>>> once.
>> > > > >>>>>>>>>>> I think we should also use Iterator for
>> > > > >>>>>>>>>>> TableEnvironment#executeMultilineSql(String statements):
>> > > > >>>>>>>>>>> Iterator<TableResult>.
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> Dawid
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> On 31/03/2020 19:27, Timo Walther wrote:
>> > > > >>>>>>>>>>>> Hi Godfrey,
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> Aljoscha, Dawid, Klou, and I had another discussion
>> around
>> > > > >>>> FLIP-84.
>> > > > >>>>>> In
>> > > > >>>>>>>>>>>> particular, we discussed how the current status of the
>> > FLIP
>> > > > and
>> > > > >>>> the
>> > > > >>>>>>>>>>>> future requirements around multiline statements,
>> > async/sync,
>> > > > >>>>>> collect()
>> > > > >>>>>>>>>>>> fit together.
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> We also updated the FLIP-84 Feedback Summary document
>> [1]
>> > > with
>> > > > >>>> some
>> > > > >>>>>>>>>>>> use cases.
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> We believe that we found a good solution that also
>> fits to
>> > > > what
>> > > > >>>> is
>> > > > >>>>>> in
>> > > > >>>>>>>>>>>> the current FLIP. So no bigger changes necessary,
>> which is
>> > > > >> great!
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> Our findings were:
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> 1. Async vs sync submission of Flink jobs:
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> Having a blocking `execute()` in DataStream API was
>> > rather a
>> > > > >>>>>> mistake.
>> > > > >>>>>>>>>>>> Instead all submissions should be async because this
>> > allows
>> > > > >>>>>> supporting
>> > > > >>>>>>>>>>>> both modes if necessary. Thus, submitting all queries
>> > async
>> > > > >>>> sounds
>> > > > >>>>>>>>>>>> good to us. If users want to run a job sync, they can
>> use
>> > > the
>> > > > >>>>>>>>>>>> JobClient and wait for completion (or collect() in
>> case of
>> > > > >> batch
>> > > > >>>>>> jobs).
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> 2. Multi-statement execution:
>> > > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> For the multi-statement execution, we don't see a
>> > > >>>>>>>>>>>> contradiction with the async execution behavior. We
>> > > >>>>>>>>>>>> imagine a method like:
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> TableEnvironment#executeMultilineSql(String
>> statements):
>> > > > >>>>>>>>>>>> Iterable<TableResult>
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> Where the `Iterator#next()` method would trigger the
>> next
>> > > > >>>> statement
>> > > > >>>>>>>>>>>> submission. This allows a caller to decide
>> synchronously
>> > > when
>> > > > >> to
>> > > > >>>>>>>>>>>> submit statements async to the cluster. Thus, a service
>> > such
>> > > > as
>> > > > >>>> the
>> > > > >>>>>>>>>>>> SQL Client can handle the result of each statement
>> > > > individually
>> > > > >>>> and
>> > > > >>>>>>>>>>>> process statement by statement sequentially.
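>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> A sketch of how a caller could drive this (hypothetical:
>> > > >>>>>>>>>>>> executeMultilineSql does not exist yet, its shape is only
>> > > >>>>>>>>>>>> as proposed above, and the 'datagen' connector is just an
>> > > >>>>>>>>>>>> illustrative stand-in):
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> String script =
>> > > >>>>>>>>>>>>     "CREATE TABLE t (f STRING) WITH ('connector' = 'datagen');\n"
>> > > >>>>>>>>>>>>     + "SELECT * FROM t;";
>> > > >>>>>>>>>>>> Iterator<TableResult> results =
>> > > >>>>>>>>>>>>     tEnv.executeMultilineSql(script).iterator();
>> > > >>>>>>>>>>>> while (results.hasNext()) {
>> > > >>>>>>>>>>>>     // next() submits the next statement, so the caller
>> > > >>>>>>>>>>>>     // decides when each one goes to the cluster
>> > > >>>>>>>>>>>>     TableResult result = results.next();
>> > > >>>>>>>>>>>>     result.print();
>> > > >>>>>>>>>>>> }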
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> 3. The role of TableResult and result retrieval in
>> general
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> `TableResult` is similar to `JobClient`. Instead of
>> > > returning
>> > > > a
>> > > > >>>>>>>>>>>> `CompletableFuture` of something, it is a concrete util
>> > > class
>> > > > >>>> where
>> > > > >>>>>>>>>>>> some methods have the behavior of completable future
>> (e.g.
>> > > > >>>>>> collect(),
>> > > > >>>>>>>>>>>> print()) and some are already completed
>> (getTableSchema(),
>> > > > >>>>>>>>>>>> getResultKind()).
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> `StatementSet#execute()` returns a single `TableResult`
>> > > > because
>> > > > >>>> the
>> > > > >>>>>>>>>>>> order is undefined in a set and all statements have the
>> > same
>> > > > >>>> schema.
>> > > > >>>>>>>>>>>> Its `collect()` will return a row for each executed
>> > `INSERT
>> > > > >>>> INTO` in
>> > > > >>>>>>>>>>>> the order of statement definition.
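>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> For instance (a sketch using the StatementSet API from
>> > > >>>>>>>>>>>> this FLIP; table names are assumed from the examples in
>> > > >>>>>>>>>>>> this thread):
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> StatementSet set = tEnv.createStatementSet();
>> > > >>>>>>>>>>>> set.addInsertSql("INSERT INTO t1 SELECT * FROM s");
>> > > >>>>>>>>>>>> set.addInsertSql("INSERT INTO t2 SELECT * FROM s");
>> > > >>>>>>>>>>>> // one TableResult for the whole set; its rows report the
>> > > >>>>>>>>>>>> // INSERT INTOs in definition order
>> > > >>>>>>>>>>>> TableResult result = set.execute();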
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> For simple `SELECT * FROM ...`, the query execution
>> might
>> > > > block
>> > > > >>>>>> until
>> > > > >>>>>>>>>>>> `collect()` is called to pull buffered rows from the
>> job
>> > > (from
>> > > > >>>>>>>>>>>> socket/REST API what ever we will use in the future).
>> We
>> > can
>> > > > >> say
>> > > > >>>>>> that
>> > > > >>>>>>>>>>>> a statement finished successfully, when the
>> > > > >>>>>> `collect#Iterator#hasNext`
>> > > > >>>>>>>>>>>> has returned false.
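>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> In code, that contract would look roughly like this
>> > > >>>>>>>>>>>> (sketch; it assumes collect() hands back a closeable
>> > > >>>>>>>>>>>> iterator of rows, as discussed above):
>> > > >>>>>>>>>>>>
>> > > >>>>>>>>>>>> try (CloseableIterator<Row> rows = result.collect()) {
>> > > >>>>>>>>>>>>     while (rows.hasNext()) {
>> > > >>>>>>>>>>>>         System.out.println(rows.next()); // pulls buffered rows
>> > > >>>>>>>>>>>>     }
>> > > >>>>>>>>>>>>     // reaching this point (hasNext() == false) means the
>> > > >>>>>>>>>>>>     // statement finished successfully
>> > > >>>>>>>>>>>> }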
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> I hope this summarizes our discussion
>> > @Dawid/Aljoscha/Klou?
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> It would be great if we can add these findings to the
>> FLIP
>> > > > >>>> before we
>> > > > >>>>>>>>>>>> start voting.
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> One minor thing: some `execute()` methods still throw a
>> > > > checked
>> > > > >>>>>>>>>>>> exception; can we remove that from the FLIP? Also the
>> > above
>> > > > >>>>>> mentioned
>> > > > >>>>>>>>>>>> `Iterator#next()` would trigger an execution without
>> > > throwing
>> > > > a
>> > > > >>>>>>>>>>>> checked exception.
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> Thanks,
>> > > > >>>>>>>>>>>> Timo
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> [1]
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>
>> > > > >>>>
>> > > > >>
>> > > >
>> > >
>> >
>> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
>> > > > >>>>>>>>>>>> On 31.03.20 06:28, godfrey he wrote:
>> > > > >>>>>>>>>>>>> Hi, Timo & Jark
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> Thanks for your explanation.
>> > > >>>>>>>>>>>>> Agree with you that execution should always be async,
>> > > >>>>>>>>>>>>> and that the sync execution scenario can be covered by
>> > > >>>>>>>>>>>>> async execution.
>> > > >>>>>>>>>>>>> It helps provide a unified entry point for batch and
>> > > >>>>>>>>>>>>> streaming.
>> > > >>>>>>>>>>>>> I think we can also use sync execution for some testing.
>> > > >>>>>>>>>>>>> So, I agree with you that we provide the `executeSql`
>> > > >>>>>>>>>>>>> method and make it async.
>> > > >>>>>>>>>>>>> If we want a sync method in the future, we can add a
>> > > >>>>>>>>>>>>> method named `executeSqlSync`.
>> > > > >>>>>>>>>>>>>
>> > > >>>>>>>>>>>>> I think we've reached an agreement. I will update the
>> > > >>>>>>>>>>>>> document and start the voting process.
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>>> Godfrey
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>
>> > > >>>>>>>>>>>>> On Tue, Mar 31, 2020 at 12:46 AM Jark Wu <imj...@gmail.com> wrote:
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> Hi,
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> I didn't follow the full discussion.
>> > > > >>>>>>>>>>>>>> But I share the same concern with Timo that streaming
>> > > > queries
>> > > > >>>>>> should
>> > > > >>>>>>>>>>>>>> always
>> > > > >>>>>>>>>>>>>> be async.
>> > > >>>>>>>>>>>>>> Otherwise, I can imagine it will cause a lot of
>> > > >>>>>>>>>>>>>> confusion and problems if users don't keep the "sync"
>> > > >>>>>>>>>>>>>> firmly in mind (e.g. the client hangs).
>> > > >>>>>>>>>>>>>> Besides, the streaming mode still covers the majority
>> > > >>>>>>>>>>>>>> of use cases of Flink and
>> > > >>>>>>>>>>>>>> Flink SQL. We should put usability at a high priority.
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>>>> Jark
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> On Mon, 30 Mar 2020 at 23:27, Timo Walther<
>> > > > >> twal...@apache.org>
>> > > > >>>>>>>>> wrote:
>> > > > >>>>>>>>>>>>>>> Hi Godfrey,
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> maybe I wasn't expressing my biggest concern enough
>> in
>> > my
>> > > > >> last
>> > > > >>>>>> mail.
>> > > > >>>>>>>>>>>>>>> Even in a singleline and sync execution, I think
>> that
>> > > > >>>> streaming
>> > > > >>>>>>>>>>>>>>> queries
>> > > > >>>>>>>>>>>>>>> should not block the execution. Otherwise it is not
>> > > > possible
>> > > > >>>> to
>> > > > >>>>>> call
>> > > > >>>>>>>>>>>>>>> collect() or print() on them afterwards.
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> "there are too many things need to discuss for
>> > > multiline":
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> True, I don't want to solve all of them right now.
>> But
>> > > what
>> > > > >> I
>> > > > >>>>>> know
>> > > > >>>>>>>>> is
>> > > > >>>>>>>>>>>>>>> that our newly introduced methods should fit into a
>> > > > >> multiline
>> > > > >>>>>>>>>>>>>>> execution.
>> > > >>>>>>>>>>>>>>> There is no big difference between calling
>> > > >>>>>>>>>>>>>>> `executeSql(A); executeSql(B)` and
>> > > >>>>>>>>>>>>>>> processing a multiline file `A;\nB;`.
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> I think the example that you mentioned can simply be
>> > > > >> undefined
>> > > > >>>>>> for
>> > > > >>>>>>>>>>>>>>> now.
>> > > > >>>>>>>>>>>>>>> Currently, no catalog is modifying data but just
>> > > metadata.
>> > > > >>>> This
>> > > > >>>>>> is a
>> > > > >>>>>>>>>>>>>>> separate discussion.
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> "result of the second statement is indeterministic":
>> > > > >>>>>>>>>>>>>>>
>> > > >>>>>>>>>>>>>>> Sure, this is indeterministic. But this is the
>> > > >>>>>>>>>>>>>>> implementer's fault, and we cannot forbid such
>> > > >>>>>>>>>>>>>>> pipelines.
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> How about we always execute streaming queries
>> async? It
>> > > > >> would
>> > > > >>>>>>>>> unblock
>> > > > >>>>>>>>>>>>>>> executeSql() and multiline statements.
>> > > > >>>>>>>>>>>>>>>
>> > > >>>>>>>>>>>>> Having an `executeSqlAsync()` is useful for batch.
>> > > >>>>>>>>>>>>> However, I don't want `sync/async` to be the new
>> > > >>>>>>>>>>>>> batch/stream flag. The execution behavior should come
>> > > >>>>>>>>>>>>> from the query itself.
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> Regards,
>> > > > >>>>>>>>>>>>>>> Timo
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> On 30.03.20 11:12, godfrey he wrote:
>> > > > >>>>>>>>>>>>>>>> Hi Timo,
>> > > > >>>>>>>>>>>>>>>>
>> > > >>>>>>>>>>>>>>>> Agree with you that streaming queries are our top
>> > > >>>>>>>>>>>>>>>> priority,
>> > > >>>>>>>>>>>>>>>> but I think there are too many things that need to be
>> > > >>>>>>>>>>>>>>>> discussed for multiline statements:
>> > > > >>>>>>>>>>>>>>>> e.g.
>> > > >>>>>>>>>>>>>>>> 1. What's the behavior of DDL and DML mixing for
>> > > >>>>>>>>>>>>>>>> async execution:
>> > > > >>>>>>>>>>>>>>>> create table t1 xxx;
>> > > > >>>>>>>>>>>>>>>> create table t2 xxx;
>> > > > >>>>>>>>>>>>>>>> insert into t2 select * from t1 where xxx;
>> > > > >>>>>>>>>>>>>>>> drop table t1; // t1 may be a MySQL table, the data
>> > will
>> > > > >>>> also be
>> > > > >>>>>>>>>>>>>> deleted.
>> > > > >>>>>>>>>>>>>>>> t1 is dropped when "insert" job is running.
>> > > > >>>>>>>>>>>>>>>>
>> > > >>>>>>>>>>>>>>>> 2. What's the behavior of the unified scenario for
>> > > >>>>>>>>>>>>>>>> async execution (as you mentioned):
>> > > > >>>>>>>>>>>>>>>> INSERT INTO t1 SELECT * FROM s;
>> > > > >>>>>>>>>>>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
>> > > > >>>>>>>>>>>>>>>>
>> > > >>>>>>>>>>>>>>>> The result of the second statement is indeterministic,
>> > > >>>>>>>>>>>>>>>> because the first statement may still be running.
>> > > >>>>>>>>>>>>>>>> I think we need to put a lot of effort into defining
>> > > >>>>>>>>>>>>>>>> the behavior of logically related queries.
>> > > > >>>>>>>>>>>>>>>>
>> > > >>>>>>>>>>>>>>>> In this FLIP, I suggest we only handle single
>> > > >>>>>>>>>>>>>>>> statements, and we also introduce an async execute
>> > > >>>>>>>>>>>>>>>> method, which is more important and more often used
>> > > >>>>>>>>>>>>>>>> by users.
>> > > > >>>>>>>>>>>>>>>>
>> > > >>>>>>>>>>>>>>>> For the sync methods (like
>> > > >>>>>>>>>>>>>>>> `TableEnvironment.executeSql` and
>> > > >>>>>>>>>>>>>>>> `StatementSet.execute`),
>> > > >>>>>>>>>>>>>>>> the result will be returned once the job is finished.
>> > > >>>>>>>>>>>>>>>> The following methods will be introduced in this FLIP:
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>        /**
>> > > > >>>>>>>>>>>>>>>>         * Asynchronously execute the given single
>> > > > statement
>> > > > >>>>>>>>>>>>>>>>         */
>> > > > >>>>>>>>>>>>>>>> TableEnvironment.executeSqlAsync(String statement):
>> > > > >>>> TableResult
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> /**
>> > > > >>>>>>>>>>>>>>>>        * Asynchronously execute the dml statements
>> as
>> > a
>> > > > >> batch
>> > > > >>>>>>>>>>>>>>>>        */
>> > > > >>>>>>>>>>>>>>>> StatementSet.executeAsync(): TableResult
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> public interface TableResult {
>> > > > >>>>>>>>>>>>>>>>          /**
>> > > > >>>>>>>>>>>>>>>>           * return JobClient for DQL and DML in
>> async
>> > > > mode,
>> > > > >>>> else
>> > > > >>>>>>>>> return
>> > > > >>>>>>>>>>>>>>>> Optional.empty
>> > > > >>>>>>>>>>>>>>>>           */
>> > > > >>>>>>>>>>>>>>>>          Optional<JobClient> getJobClient();
>> > > > >>>>>>>>>>>>>>>> }
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> what do you think?
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>>>>>> Godfrey
>> > > > >>>>>>>>>>>>>>>>
>> > > >>>>>>>>>>>>>>>> On Thu, Mar 26, 2020 at 9:15 PM Timo Walther
>> > > >>>>>>>>>>>>>>>> <twal...@apache.org> wrote:
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> Hi Godfrey,
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> executing streaming queries must be our top
>> priority
>> > > > >> because
>> > > > >>>>>> this
>> > > > >>>>>>>>> is
>> > > > >>>>>>>>>>>>>>>>> what distinguishes Flink from competitors. If we
>> > change
>> > > > >> the
>> > > > >>>>>>>>>>>>>>>>> execution
>> > > > >>>>>>>>>>>>>>>>> behavior, we should think about the other cases as
>> > well
>> > > > to
>> > > > >>>> not
>> > > > >>>>>>>>> break
>> > > > >>>>>>>>>>>>>> the
>> > > > >>>>>>>>>>>>>>>>> API a third time.
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> I fear that just having an async execute method
>> will
>> > > not
>> > > > >> be
>> > > > >>>>>> enough
>> > > > >>>>>>>>>>>>>>>>> because users should be able to mix streaming and
>> > batch
>> > > > >>>> queries
>> > > > >>>>>>>>> in a
>> > > > >>>>>>>>>>>>>>>>> unified scenario.
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> If I remember it correctly, we had some
>> discussions
>> > in
>> > > > the
>> > > > >>>> past
>> > > > >>>>>>>>>>>>>>>>> about
>> > > > >>>>>>>>>>>>>>>>> what decides about the execution mode of a query.
>> > > > >>>> Currently, we
>> > > > >>>>>>>>>>>>>>>>> would
>> > > > >>>>>>>>>>>>>>>>> like to let the query decide, not derive it from
>> the
>> > > > >>>> sources.
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> So I could image a multiline pipeline as:
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> USE CATALOG 'mycat';
>> > > > >>>>>>>>>>>>>>>>> INSERT INTO t1 SELECT * FROM s;
>> > > > >>>>>>>>>>>>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT
>> STREAM;
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> For executeMultilineSql():
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> sync because regular SQL
>> > > > >>>>>>>>>>>>>>>>> sync because regular Batch SQL
>> > > > >>>>>>>>>>>>>>>>> async because Streaming SQL
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> For executeAsyncMultilineSql():
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> async because everything should be async
>> > > > >>>>>>>>>>>>>>>>> async because everything should be async
>> > > > >>>>>>>>>>>>>>>>> async because everything should be async
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> What we should not start for
>> > > executeAsyncMultilineSql():
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> sync because DDL
>> > > > >>>>>>>>>>>>>>>>> async because everything should be async
>> > > > >>>>>>>>>>>>>>>>> async because everything should be async
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> What are you thoughts here?
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> Regards,
>> > > > >>>>>>>>>>>>>>>>> Timo
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> On 26.03.20 12:50, godfrey he wrote:
>> > > > >>>>>>>>>>>>>>>>>> Hi Timo,
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> I agree with you that streaming queries mostly
>> need
>> > > > async
>> > > > >>>>>>>>>>>>>>>>>> execution.
>> > > > >>>>>>>>>>>>>>>>>> In fact, our original plan is only introducing
>> sync
>> > > > >>>> methods in
>> > > > >>>>>>>>> this
>> > > > >>>>>>>>>>>>>>> FLIP,
>> > > > >>>>>>>>>>>>>>>>>> and async methods (like "executeSqlAsync") will
>> be
>> > > > >>>> introduced
>> > > > >>>>>> in
>> > > > >>>>>>>>>>>>>>>>>> the
>> > > > >>>>>>>>>>>>>>>>> future
>> > > > >>>>>>>>>>>>>>>>>> which is mentioned in the appendix.
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> Maybe the async methods also need to be
>> considered
>> > in
>> > > > >> this
>> > > > >>>>>> FLIP.
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> I think sync methods is also useful for streaming
>> > > which
>>
>
