+1 from my side.

Best,
Kurt


On Sun, May 17, 2020 at 12:41 PM godfrey he <godfre...@gmail.com> wrote:

> hi everyone,
>
> I would like to bring up another topic: the return value of the
> TableResult#collect() method.
> Currently, the return type is `Iterator<Row>`, and we ran into some
> problems when implementing FLINK-14807 [1].
>
> In the current design, the sink operator has a buffer pool that buffers
> the data from upstream and waits for the client to consume it. The
> client pulls the data when `Iterator<Row>#next()` is called.
> If the client submits a select job, consumes only part of the data, and
> exits, the job will not finish, which causes a resource leak. We can't
> require the client to consume all of the data; for an unbounded
> streaming job, that is impossible anyway.
> Currently, users can also cancel the job via the
> `TableResult.getJobClient().get().cancel()` method,
> but this approach is neither intuitive nor convenient.
>
> So, I want to change the return type from `Iterator<Row>` to
> `CloseableRowIterator`;
> the new method looks like:
>
> public interface TableResult {
>
>   CloseableRowIterator collect();
>
> }
>
> public interface CloseableRowIterator extends Iterator<Row>, AutoCloseable
> {
>
> }
>
> Prefixing the name with "Closeable" is intended to remind users that
> this iterator should be closed;
> they can conveniently close the resources with a try-with-resources
> statement.
> The resource leak is still possible if users neither close the iterator
> nor cancel the job through the job client;
> we just provide an easier way for users to avoid it.
>
> I also noticed that there is a `CloseableIterator` interface in the
> `org.apache.flink.util` package,
> but I still lean toward introducing `CloseableRowIterator`. My reasons are:
> 1) `CloseableIterator` is in a util package and is not a public interface.
> 2) `CloseableRowIterator` is more convenient; users do not need to
> specify the generic type `<Row>`.
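To make the contract concrete, here is a minimal, self-contained sketch of how a caller would consume such an iterator with try-with-resources. Only the `CloseableRowIterator` name follows the proposal; the `String` element type (standing in for `org.apache.flink.types.Row`), the `collect` helper, and the `closed` flag are illustrative stand-ins, not Flink API.

```java
import java.util.Iterator;
import java.util.List;

// Sketch of the proposed interface; close() is narrowed from
// AutoCloseable's throwing signature to an unchecked one for convenience.
interface CloseableRowIterator extends Iterator<String>, AutoCloseable {
    @Override
    void close();
}

public class CollectSketch {
    static boolean closed = false;

    // Toy stand-in for TableResult#collect(): a real implementation would
    // pull buffered rows from the sink operator and cancel the job on close().
    static CloseableRowIterator collect(List<String> rows) {
        Iterator<String> it = rows.iterator();
        return new CloseableRowIterator() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() { return it.next(); }
            @Override public void close() { closed = true; }
        };
    }

    public static void main(String[] args) {
        // try-with-resources guarantees close() even if only part of the
        // result is consumed, avoiding the resource leak described above.
        try (CloseableRowIterator it = collect(List.of("row1", "row2", "row3"))) {
            System.out.println(it.next()); // consume only the first row
        }
        System.out.println("closed = " + closed); // prints "closed = true"
    }
}
```

The point of the sketch is that early exit from the try block still releases the job's resources, which a plain `Iterator<Row>` cannot guarantee.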
>
> What do you think?
>
> Best,
> Godfrey
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-14807
>
>
> Fabian Hueske <fhue...@gmail.com> 于2020年5月7日周四 下午3:59写道:
>
> > Thanks for the update Godfrey!
> >
> > +1 to this approach.
> >
> > Since there can be only one primary key, I'd also be fine to just use
> > `PRI` even if it is composite, but `PRI(f0, f5)` might be more convenient
> > for users.
> >
> > Thanks, Fabian
> >
> > Am Do., 7. Mai 2020 um 09:31 Uhr schrieb godfrey he <godfre...@gmail.com
> >:
> >
> >> Hi Fabian,
> >> Thanks for your suggestions.
> >>
> >> Agree with you that `UNQ(f2, f3)` is clearer.
> >>
> >> A table can have only ONE primary key, and
> >> this primary key can consist of one or multiple columns. [1]
> >> If the primary key consists of a single column,
> >> we can simply use `PRI` (or `PRI(xx)`) to represent it.
> >> If the primary key has multiple columns,
> >> we should use `PRI(xx, yy, ...)` to represent it.
> >>
> >> A table may have multiple unique keys, and
> >> each unique key can consist of one or multiple columns. [2]
> >> If there is only one unique key and it has only a single column,
> >> we can simply use `UNQ` (or `UNQ(xx)`) to represent it.
> >> Otherwise, we should use `UNQ(xx, yy, ...)` to represent it.
> >> (A corner case: two unique keys over the same columns, e.g. `UNQ(f2, f3)`
> >> and `UNQ(f2, f3)`;
> >> we can forbid this case or add a unique name for each key in the future.)
> >>
> >> primary key and unique key with multiple columns example:
> >> create table MyTable (
> >>   f0 BIGINT NOT NULL,
> >>   f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
> >>   f2 VARCHAR<256>,
> >>   f3 AS f0 + 1,
> >>   f4 TIMESTAMP(3) NOT NULL,
> >>   f5 BIGINT NOT NULL,
> >>   PRIMARY KEY (f0, f5),
> >>   UNIQUE (f3, f2),
> >>   WATERMARK f4 AS f4 - INTERVAL '3' SECOND
> >> ) with (...)
> >>
> >>
> >>
> >> +------+---------------------------------+-------+-------------+----------------+--------------------------+
> >> | name | type                            | null  | key         | compute column | watermark                |
> >> +------+---------------------------------+-------+-------------+----------------+--------------------------+
> >> | f0   | BIGINT                          | false | PRI(f0, f5) | (NULL)         | (NULL)                   |
> >> | f1   | ROW<q1 STRING, q2 TIMESTAMP(3)> | true  | (NULL)      | (NULL)         | (NULL)                   |
> >> | f2   | VARCHAR<256>                    | true  | UNQ(f2, f3) | (NULL)         | (NULL)                   |
> >> | f3   | BIGINT                          | false | UNQ(f2, f3) | f0 + 1         | (NULL)                   |
> >> | f4   | TIMESTAMP(3)                    | false | (NULL)      | (NULL)         | f4 - INTERVAL '3' SECOND |
> >> | f5   | BIGINT                          | false | PRI(f0, f5) | (NULL)         | (NULL)                   |
> >> +------+---------------------------------+-------+-------------+----------------+--------------------------+
> >>
> >> Regarding the watermark on nested columns: that's a good approach,
> >> which can both support watermarks on nested columns in the future and
> >> keep the current table form.
> >>
> >> [1] https://www.w3schools.com/sql/sql_primarykey.asp
> >> [2] https://www.w3schools.com/sql/sql_unique.ASP
> >>
> >> Best,
> >> Godfrey
> >>
> >> Fabian Hueske <fhue...@gmail.com> 于2020年5月7日周四 上午12:03写道:
> >>
> >>> Hi Godfrey,
> >>>
> >>> This looks good to me.
> >>>
> >>> One side note, indicating unique constraints with "UNQ" is probably not
> >>> enough.
> >>> There might be multiple unique constraints and users would like to know
> >>> which field combinations are unique.
> >>> So in your example above, "UNQ(f2, f3)" might be a better marker.
> >>>
> >>> Just as a thought, if we would later add support for watermark on
> nested
> >>> columns, we could add a row just for the nested field (in addition to
> the
> >>> top-level field) like this:
> >>>
> >>>
> >>>
> >>> +-------------------+--------------+-------+--------+--------+-----------------------------------------+
> >>> | f4.nested.rowtime | TIMESTAMP(3) | false | (NULL) | (NULL) | f4.nested.rowtime - INTERVAL '3' SECOND |
> >>> +-------------------+--------------+-------+--------+--------+-----------------------------------------+
> >>>
> >>> Thanks,
> >>> Fabian
> >>>
> >>> Am Mi., 6. Mai 2020 um 17:51 Uhr schrieb godfrey he <
> godfre...@gmail.com
> >>> >:
> >>>
> >>>> Hi @fhue...@gmail.com @Timo Walther <twal...@apache.org>  @Dawid
> >>>> Wysakowicz <dwysakow...@apache.org>
> >>>> What do you think about requiring that a watermark must be defined
> >>>> on a top-level column?
> >>>>
> >>>> If we do that, we can add an expression column to represent the
> >>>> watermark, just like the computed column.
> >>>> An example covering all cases:
> >>>> create table MyTable (
> >>>>   f0 BIGINT NOT NULL,
> >>>>   f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
> >>>>   f2 VARCHAR<256>,
> >>>>   f3 AS f0 + 1,
> >>>>   f4 TIMESTAMP(3) NOT NULL,
> >>>>   PRIMARY KEY (f0),
> >>>>   UNIQUE (f3, f2),
> >>>>   WATERMARK f4 AS f4 - INTERVAL '3' SECOND
> >>>> ) with (...)
> >>>>
> >>>>
> >>>>
> >>>> +------+---------------------------------+-------+--------+----------------+--------------------------+
> >>>> | name | type                            | null  | key    | compute column | watermark                |
> >>>> +------+---------------------------------+-------+--------+----------------+--------------------------+
> >>>> | f0   | BIGINT                          | false | PRI    | (NULL)         | (NULL)                   |
> >>>> | f1   | ROW<q1 STRING, q2 TIMESTAMP(3)> | true  | (NULL) | (NULL)         | (NULL)                   |
> >>>> | f2   | VARCHAR<256>                    | true  | UNQ    | (NULL)         | (NULL)                   |
> >>>> | f3   | BIGINT                          | false | UNQ    | f0 + 1         | (NULL)                   |
> >>>> | f4   | TIMESTAMP(3)                    | false | (NULL) | (NULL)         | f4 - INTERVAL '3' SECOND |
> >>>> +------+---------------------------------+-------+--------+----------------+--------------------------+
> >>>>
> >>>> WDYT ?
> >>>>
> >>>> Best,
> >>>> Godfrey
> >>>>
> >>>>
> >>>>
> >>>> godfrey he <godfre...@gmail.com> 于2020年4月30日周四 下午11:57写道:
> >>>>
> >>>>> Hi Fabian,
> >>>>>
> >>>>> the broken example is:
> >>>>>
> >>>>> create table MyTable (
> >>>>>
> >>>>>     f0 BIGINT NOT NULL,
> >>>>>
> >>>>>     f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
> >>>>>
> >>>>>     f2 VARCHAR<256>,
> >>>>>
> >>>>>     f3 AS f0 + 1,
> >>>>>
> >>>>>     PRIMARY KEY (f0),
> >>>>>
> >>>>> UNIQUE (f3, f2),
> >>>>>
> >>>>>     WATERMARK f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> >>>>>
> >>>>> ) with (...)
> >>>>>
> >>>>>
> >>>>> +------+-------------------------------------+--------+----------------+------------------------------------------+
> >>>>> | name | type                                | key    | compute column | watermark                                |
> >>>>> +------+-------------------------------------+--------+----------------+------------------------------------------+
> >>>>> | f0   | BIGINT NOT NULL                     | PRI    | (NULL)         |                                          |
> >>>>> | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | UNQ    | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
> >>>>> | f2   | VARCHAR<256>                        | (NULL) | NULL           |                                          |
> >>>>> | f3   | BIGINT NOT NULL                     | UNQ    | f0 + 1         |                                          |
> >>>>> +------+-------------------------------------+--------+----------------+------------------------------------------+
> >>>>>
> >>>>>
> >>>>> or we add a column to represent nullability.
> >>>>>
> >>>>> +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
> >>>>> | name | type                                | null  | key    | compute column | watermark                                |
> >>>>> +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
> >>>>> | f0   | BIGINT                              | false | PRI    | (NULL)         |                                          |
> >>>>> | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | true  | UNQ    | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
> >>>>> | f2   | VARCHAR<256>                        | true  | (NULL) | NULL           |                                          |
> >>>>> | f3   | BIGINT                              | false | UNQ    | f0 + 1         |                                          |
> >>>>> +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Hi Jark,
> >>>>> If we can require that the watermark be defined on a top-level column,
> >>>>> this will become much simpler.
> >>>>>
> >>>>> Best,
> >>>>> Godfrey
> >>>>>
> >>>>> Jark Wu <imj...@gmail.com> 于2020年4月30日周四 下午11:38写道:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm in favor of Fabian's proposal.
> >>>>>> First, a watermark is not a column but metadata, just like a
> >>>>>> primary key, so it shouldn't stand with the columns.
> >>>>>> Second, AFAIK, a primary key can only be defined on top-level columns.
> >>>>>> Third, I think watermarks can follow primary keys in being allowed
> >>>>>> only on top-level columns.
> >>>>>>
> >>>>>> I have to admit that in FLIP-66, a watermark can be defined on
> >>>>>> nested fields.
> >>>>>> However, during implementation, I found that it's too complicated
> >>>>>> to do that. We have to refactor the time-based physical nodes,
> >>>>>> use code generation to access the event time, and refactor
> >>>>>> FlinkTypeFactory to support a complex nested rowtime.
> >>>>>> There is not much value in this feature, but it introduces a lot of
> >>>>>> complexity into the code base.
> >>>>>> So I think we can force watermarks to be defined on top-level
> >>>>>> columns. If users want to define one on a nested column,
> >>>>>> they can use a computed column to expose it as a top-level column.
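To illustrate the workaround described here, a hedged DDL sketch assuming the `WATERMARK FOR` syntax from FLIP-66; the `rowtime` column name is hypothetical:

```sql
create table MyTable (
  f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
  -- computed column exposing the nested field at the top level
  rowtime AS f1.q2,
  WATERMARK FOR rowtime AS rowtime - INTERVAL '3' SECOND
) with (...)
```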
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>>
> >>>>>> On Thu, 30 Apr 2020 at 17:55, Fabian Hueske <fhue...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> > Hi Godfrey,
> >>>>>> >
> >>>>>> > The formatting of your example seems to be broken.
> >>>>>> > Could you send them again please?
> >>>>>> >
> >>>>>> > Regarding your points
> >>>>>> > > because the watermark expression can refer to a sub-column, just
> >>>>>> > like `f1.q2` in the example I gave above.
> >>>>>> >
> >>>>>> > I would put the watermark information in the row of the top-level
> >>>>>> field and
> >>>>>> > indicate to which nested field the watermark refers.
> >>>>>> > Don't we have to solve the same issue for primary keys that are
> >>>>>> defined on
> >>>>>> > a nested field?
> >>>>>> >
> >>>>>> > > A boolean flag can't represent such info, and I don't know whether
> >>>>>> > we will support complex watermark expressions involving multiple
> >>>>>> > columns in the future, such as: "WATERMARK FOR ts as ts + f1 +
> >>>>>> > interval '1' second"
> >>>>>> >
> >>>>>> > You are right, a simple binary flag is definitely not sufficient
> to
> >>>>>> display
> >>>>>> > the watermark information.
> >>>>>> > I would put the expression string into the field, i.e., "ts + f1 +
> >>>>>> interval
> >>>>>> > '1' second"
> >>>>>> >
> >>>>>> >
> >>>>>> > For me, the most important reason not to show the watermark as a
> >>>>>> > row in the table is that it is not a field that can be queried,
> >>>>>> > but meta information on an existing field.
> >>>>>> > For the user it is important to know that a certain field has a
> >>>>>> > watermark.
> >>>>>> > Otherwise, certain queries cannot be correctly specified.
> >>>>>> > Also, there might be support for multiple watermarks defined on
> >>>>>> > different fields at some point. Would those be printed in multiple
> >>>>>> > rows?
> >>>>>> >
> >>>>>> > Best,
> >>>>>> > Fabian
> >>>>>> >
> >>>>>> >
> >>>>>> > Am Do., 30. Apr. 2020 um 11:25 Uhr schrieb godfrey he <
> >>>>>> godfre...@gmail.com
> >>>>>> > >:
> >>>>>> >
> >>>>>> > > Hi Fabian, Aljoscha
> >>>>>> > >
> >>>>>> > > Thanks for the feedback.
> >>>>>> > >
> >>>>>> > > Agree with you that we can deal with the primary key as you
> >>>>>> > > mentioned.
> >>>>>> > > Now, the type column already contains the nullability attribute,
> >>>>>> > > e.g. BIGINT NOT NULL.
> >>>>>> > > (I'm also OK with using two columns to represent the type, just
> >>>>>> > > like MySQL.)
> >>>>>> > >
> >>>>>> > > > Why do I treat `watermark` as a special row?
> >>>>>> > > Because the watermark expression can refer to a sub-column, just
> >>>>>> > > like `f1.q2` in the example I gave above.
> >>>>>> > > A boolean flag can't represent such info, and I don't know
> >>>>>> > > whether we will support complex
> >>>>>> > > watermark expressions involving multiple columns in the future,
> >>>>>> > > such as:
> >>>>>> > > "WATERMARK FOR ts as ts + f1 + interval '1' second"
> >>>>>> > >
> >>>>>> > > If we do not support complex watermark expressions, we can add a
> >>>>>> > > watermark
> >>>>>> > > column.
> >>>>>> > >
> >>>>>> > > for example:
> >>>>>> > >
> >>>>>> > > create table MyTable (
> >>>>>> > >
> >>>>>> > >     f0 BIGINT NOT NULL,
> >>>>>> > >
> >>>>>> > >     f1 ROW<q1 STRING, q2 TIMESTAMP(3)>,
> >>>>>> > >
> >>>>>> > >     f2 VARCHAR<256>,
> >>>>>> > >
> >>>>>> > >     f3 AS f0 + 1,
> >>>>>> > >
> >>>>>> > >     PRIMARY KEY (f0),
> >>>>>> > >
> >>>>>> > > UNIQUE (f3, f2),
> >>>>>> > >
> >>>>>> > >     WATERMARK f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> >>>>>> > >
> >>>>>> > > ) with (...)
> >>>>>> > >
> >>>>>> > >
> >>>>>> > > +------+-------------------------------------+--------+----------------+------------------------------------------+
> >>>>>> > > | name | type                                | key    | compute column | watermark                                |
> >>>>>> > > +------+-------------------------------------+--------+----------------+------------------------------------------+
> >>>>>> > > | f0   | BIGINT NOT NULL                     | PRI    | (NULL)         |                                          |
> >>>>>> > > | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | UNQ    | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
> >>>>>> > > | f2   | VARCHAR<256>                        | (NULL) | NULL           |                                          |
> >>>>>> > > | f3   | BIGINT NOT NULL                     | UNQ    | f0 + 1         |                                          |
> >>>>>> > > +------+-------------------------------------+--------+----------------+------------------------------------------+
> >>>>>> > >
> >>>>>> > >
> >>>>>> > > or we add a column to represent nullability.
> >>>>>> > >
> >>>>>> > > +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
> >>>>>> > > | name | type                                | null  | key    | compute column | watermark                                |
> >>>>>> > > +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
> >>>>>> > > | f0   | BIGINT                              | false | PRI    | (NULL)         |                                          |
> >>>>>> > > | f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | true  | UNQ    | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
> >>>>>> > > | f2   | VARCHAR<256>                        | true  | (NULL) | NULL           |                                          |
> >>>>>> > > | f3   | BIGINT                              | false | UNQ    | f0 + 1         |                                          |
> >>>>>> > > +------+-------------------------------------+-------+--------+----------------+------------------------------------------+
> >>>>>> > >
> >>>>>> > >
> >>>>>> > > Personally, I like the second one. (We need to make some changes
> >>>>>> > > to LogicalType to get the type name without nullability.)
> >>>>>> > >
> >>>>>> > >
> >>>>>> > > Best,
> >>>>>> > > Godfrey
> >>>>>> > >
> >>>>>> > >
> >>>>>> > > Aljoscha Krettek <aljos...@apache.org> 于2020年4月29日周三 下午5:47写道:
> >>>>>> > >
> >>>>>> > > > +1 I like the general idea of printing the results as a table.
> >>>>>> > > >
> >>>>>> > > > On the specifics I don't know enough, but Fabian's suggestions
> >>>>>> > > > seem to make sense to me.
> >>>>>> > > >
> >>>>>> > > > Aljoscha
> >>>>>> > > >
> >>>>>> > > > On 29.04.20 10:56, Fabian Hueske wrote:
> >>>>>> > > > > Hi Godfrey,
> >>>>>> > > > >
> >>>>>> > > > > Thanks for starting this discussion!
> >>>>>> > > > >
> >>>>>> > > > > In my mind, WATERMARK is a property (or constraint) of a
> >>>>>> field, just
> >>>>>> > > like
> >>>>>> > > > > PRIMARY KEY.
> >>>>>> > > > > Take this example from MySQL:
> >>>>>> > > > >
> >>>>>> > > > > mysql> CREATE TABLE people (id INT NOT NULL, name
> >>>>>> VARCHAR(128) NOT
> >>>>>> > > NULL,
> >>>>>> > > > > age INT, PRIMARY KEY (id));
> >>>>>> > > > > Query OK, 0 rows affected (0.06 sec)
> >>>>>> > > > >
> >>>>>> > > > > mysql> describe people;
> >>>>>> > > > > +-------+--------------+------+-----+---------+-------+
> >>>>>> > > > > | Field | Type         | Null | Key | Default | Extra |
> >>>>>> > > > > +-------+--------------+------+-----+---------+-------+
> >>>>>> > > > > | id    | int          | NO   | PRI | NULL    |       |
> >>>>>> > > > > | name  | varchar(128) | NO   |     | NULL    |       |
> >>>>>> > > > > | age   | int          | YES  |     | NULL    |       |
> >>>>>> > > > > +-------+--------------+------+-----+---------+-------+
> >>>>>> > > > > 3 rows in set (0.01 sec)
> >>>>>> > > > >
> >>>>>> > > > > Here, PRIMARY KEY is marked in the Key column of the id
> field.
> >>>>>> > > > > We could do the same for watermarks by adding a Watermark
> >>>>>> column.
> >>>>>> > > > >
> >>>>>> > > > > Best, Fabian
> >>>>>> > > > >
> >>>>>> > > > >
> >>>>>> > > > > Am Mi., 29. Apr. 2020 um 10:43 Uhr schrieb godfrey he <
> >>>>>> > > > godfre...@gmail.com>:
> >>>>>> > > > >
> >>>>>> > > > >> Hi everyone,
> >>>>>> > > > >>
> >>>>>> > > > >> I would like to bring up a discussion about the result type
> >>>>>> of
> >>>>>> > > describe
> >>>>>> > > > >> statement,
> >>>>>> > > > >> which is introduced in FLIP-84[1].
> >>>>>> > > > >> In the previous version, we defined the result type of the
> >>>>>> > > > >> `describe` statement as a single column, as follows:
> >>>>>> > > > >>
> >>>>>> > > > >> +-------------+------------------------+------------------------+----------------------+---------------------+
> >>>>>> > > > >> | Statement   | Result Schema          | Result Value           | Result Kind          | Examples            |
> >>>>>> > > > >> +-------------+------------------------+------------------------+----------------------+---------------------+
> >>>>>> > > > >> | DESCRIBE xx | field name: result     | describe the detail of | SUCCESS_WITH_CONTENT | DESCRIBE table_name |
> >>>>>> > > > >> |             | field type: VARCHAR(n) | an object (single row) |                      |                     |
> >>>>>> > > > >> |             | (n is the max length   |                        |                      |                     |
> >>>>>> > > > >> |             | of values)             |                        |                      |                     |
> >>>>>> > > > >> +-------------+------------------------+------------------------+----------------------+---------------------+
> >>>>>> > > > >>
> >>>>>> > > > >> For "describe table_name", the result value is the `toString`
> >>>>>> > > > >> value of `TableSchema`, which is unstructured data.
> >>>>>> > > > >> It's hard for users to make use of this info.
> >>>>>> > > > >>
> >>>>>> > > > >> for example:
> >>>>>> > > > >>
> >>>>>> > > > >> TableSchema schema = TableSchema.builder()
> >>>>>> > > > >>     .field("f0", DataTypes.BIGINT())
> >>>>>> > > > >>     .field("f1", DataTypes.ROW(
> >>>>>> > > > >>        DataTypes.FIELD("q1", DataTypes.STRING()),
> >>>>>> > > > >>        DataTypes.FIELD("q2", DataTypes.TIMESTAMP(3))))
> >>>>>> > > > >>     .field("f2", DataTypes.STRING())
> >>>>>> > > > >>     .field("f3", DataTypes.BIGINT(), "f0 + 1")
> >>>>>> > > > >>     .watermark("f1.q2", WATERMARK_EXPRESSION,
> >>>>>> WATERMARK_DATATYPE)
> >>>>>> > > > >>     .build();
> >>>>>> > > > >>
> >>>>>> > > > >> its `toString` value is:
> >>>>>> > > > >> root
> >>>>>> > > > >>   |-- f0: BIGINT
> >>>>>> > > > >>   |-- f1: ROW<`q1` STRING, `q2` TIMESTAMP(3)>
> >>>>>> > > > >>   |-- f2: STRING
> >>>>>> > > > >>   |-- f3: BIGINT AS f0 + 1
> >>>>>> > > > >>   |-- WATERMARK FOR f1.q2 AS now()
> >>>>>> > > > >>
> >>>>>> > > > >> For Hive, MySQL, etc., the describe result is in table form,
> >>>>>> > > > >> including field names and field types,
> >>>>>> > > > >> which is more familiar to users.
> >>>>>> > > > >> TableSchema [2] has a watermark expression and computed
> >>>>>> > > > >> columns; we should also put them into the table:
> >>>>>> > > > >> for a computed column, which is column-level, we add a new
> >>>>>> > > > >> column named `expr`;
> >>>>>> > > > >> for the watermark expression, which is table-level, we add a
> >>>>>> > > > >> special row named `WATERMARK` to represent it.
> >>>>>> > > > >>
> >>>>>> > > > >> For the above example, the result will look like:
> >>>>>> > > > >>
> >>>>>> > > > >> +-----------+-------------------------------------+----------------+
> >>>>>> > > > >> | name      | type                                | expr           |
> >>>>>> > > > >> +-----------+-------------------------------------+----------------+
> >>>>>> > > > >> | f0        | BIGINT                              | (NULL)         |
> >>>>>> > > > >> | f1        | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | (NULL)         |
> >>>>>> > > > >> | f2        | STRING                              | NULL           |
> >>>>>> > > > >> | f3        | BIGINT                              | f0 + 1         |
> >>>>>> > > > >> | WATERMARK | (NULL)                              | f1.q2 AS now() |
> >>>>>> > > > >> +-----------+-------------------------------------+----------------+
> >>>>>> > > > >>
> >>>>>> > > > >> Now there is a PR, FLINK-17112 [3], that implements the
> >>>>>> > > > >> DESCRIBE statement.
> >>>>>> > > > >>
> >>>>>> > > > >> What do you think about this update?
> >>>>>> > > > >> Any feedback is welcome~
> >>>>>> > > > >>
> >>>>>> > > > >> Best,
> >>>>>> > > > >> Godfrey
> >>>>>> > > > >>
> >>>>>> > > > >> [1]
> >>>>>> > > > >>
> >>>>>> > > >
> >>>>>> > >
> >>>>>> >
> >>>>>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>> > > > >> [2]
> >>>>>> > > > >>
> >>>>>> > > > >>
> >>>>>> > > >
> >>>>>> > >
> >>>>>> >
> >>>>>>
> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/api/TableSchema.java
> >>>>>> > > > >> [3] https://github.com/apache/flink/pull/11892
> >>>>>> > > > >>
> >>>>>> > > > >>
> >>>>>> > > > >> godfrey he <godfre...@gmail.com> 于2020年4月6日周一 下午10:38写道:
> >>>>>> > > > >>
> >>>>>> > > > >>> Hi Timo,
> >>>>>> > > > >>>
> >>>>>> > > > >>> Sorry for the late reply, and thanks for your correction.
> >>>>>> > > > >>> I missed DQL for job submission scenario.
> >>>>>> > > > >>> I'll fix the document right away.
> >>>>>> > > > >>>
> >>>>>> > > > >>> Best,
> >>>>>> > > > >>> Godfrey
> >>>>>> > > > >>>
> >>>>>> > > > >>> Timo Walther <twal...@apache.org> 于2020年4月3日周五 下午9:53写道:
> >>>>>> > > > >>>
> >>>>>> > > > >>>> Hi Godfrey,
> >>>>>> > > > >>>>
> >>>>>> > > > >>>> I'm sorry to jump in again but I still need to clarify
> >>>>>> some things
> >>>>>> > > > >>>> around TableResult.
> >>>>>> > > > >>>>
> >>>>>> > > > >>>> The FLIP says:
> >>>>>> > > > >>>> "For DML, this method returns TableResult until the job
> is
> >>>>>> > > submitted.
> >>>>>> > > > >>>> For other statements, TableResult is returned until the
> >>>>>> execution
> >>>>>> > is
> >>>>>> > > > >>>> finished."
> >>>>>> > > > >>>>
> >>>>>> > > > >>>> I thought we agreed on making every execution async? This
> >>>>>> also
> >>>>>> > means
> >>>>>> > > > >>>> returning a TableResult for DQLs even though the
> execution
> >>>>>> is not
> >>>>>> > > done
> >>>>>> > > > >>>> yet. People need access to the JobClient also for batch
> >>>>>> jobs in
> >>>>>> > > order
> >>>>>> > > > to
> >>>>>> > > > >>>> cancel long lasting queries. If people want to wait for
> the
> >>>>>> > > completion
> >>>>>> > > > >>>> they can hook into JobClient or collect().
> >>>>>> > > > >>>>
> >>>>>> > > > >>>> Can we rephrase this part to:
> >>>>>> > > > >>>>
> >>>>>> > > > >>>> "For DML and DQL, this method returns TableResult once
> the
> >>>>>> job has
> >>>>>> > > > been
> >>>>>> > > > >>>> submitted. For DDL and DCL statements, TableResult is
> >>>>>> returned
> >>>>>> > once
> >>>>>> > > > the
> >>>>>> > > > >>>> operation has finished."
> >>>>>> > > > >>>>
> >>>>>> > > > >>>> Regards,
> >>>>>> > > > >>>> Timo
> >>>>>> > > > >>>>
> >>>>>> > > > >>>>
> >>>>>> > > > >>>> On 02.04.20 05:27, godfrey he wrote:
> >>>>>> > > > >>>>> Hi Aljoscha, Dawid, Timo,
> >>>>>> > > > >>>>>
> >>>>>> > > > >>>>> Thanks so much for the detailed explanation.
> >>>>>> > > > >>>>> Agree with you that the multiline story is not complete
> >>>>>> > > > >>>>> yet, and we can keep discussing it.
> >>>>>> > > > >>>>> I will add current discussions and conclusions to the
> >>>>>> FLIP.
> >>>>>> > > > >>>>>
> >>>>>> > > > >>>>> Best,
> >>>>>> > > > >>>>> Godfrey
> >>>>>> > > > >>>>>
> >>>>>> > > > >>>>>
> >>>>>> > > > >>>>>
> >>>>>> > > > >>>>> Timo Walther <twal...@apache.org> 于2020年4月1日周三
> 下午11:27写道:
> >>>>>> > > > >>>>>
> >>>>>> > > > >>>>>> Hi Godfrey,
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> first of all, I agree with Dawid. The multiline story
> is
> >>>>>> not
> >>>>>> > > > >> completed
> >>>>>> > > > >>>>>> by this FLIP. It just verifies the big picture.
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> 1. "control the execution logic through the proposed
> >>>>>> method if
> >>>>>> > > they
> >>>>>> > > > >>>> know
> >>>>>> > > > >>>>>> what the statements are"
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> This is a good point that also Fabian raised in the
> >>>>>> linked
> >>>>>> > google
> >>>>>> > > > >> doc.
> >>>>>> > > > >>>> I
> >>>>>> > > > >>>>>> could also imagine to return a more complicated POJO
> when
> >>>>>> > calling
> >>>>>> > > > >>>>>> `executeMultiSql()`.
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> The POJO would include some `getSqlProperties()` such
> >>>>>> that a
> >>>>>> > > > platform
> >>>>>> > > > >>>>>> gets insights into the query before executing. We could
> >>>>>> also
> >>>>>> > > trigger
> >>>>>> > > > >>>> the
> >>>>>> > > > >>>>>> execution more explicitly instead of hiding it behind
> an
> >>>>>> > iterator.
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> 2. "there are some special commands introduced in SQL
> >>>>>> client"
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> For platforms and SQL Client specific commands, we
> could
> >>>>>> offer a
> >>>>>> > > > hook
> >>>>>> > > > >>>> to
> >>>>>> > > > >>>>>> the parser or a fallback parser in case the regular
> table
> >>>>>> > > > environment
> >>>>>> > > > >>>>>> parser cannot deal with the statement.
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> However, all of that is future work and can be
> discussed
> >>>>>> in a
> >>>>>> > > > >> separate
> >>>>>> > > > >>>>>> FLIP.
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> 3. +1 for the `Iterator` instead of `Iterable`.
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> 4. "we should convert the checked exception to
> unchecked
> >>>>>> > > exception"
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> Yes, I meant using a runtime exception instead of a
> >>>>>> checked
> >>>>>> > > > >> exception.
> >>>>>> > > > >>>>>> There was no consensus on putting the exception into
> the
> >>>>>> > > > >> `TableResult`.
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> Regards,
> >>>>>> > > > >>>>>> Timo
> >>>>>> > > > >>>>>>
> >>>>>> > > > >>>>>> On 01.04.20 15:35, Dawid Wysakowicz wrote:
> >>>>>> > > > >>>>>>> When considering the multi-line support I think it is
> >>>>>> helpful
> >>>>>> > to
> >>>>>> > > > >> start
> >>>>>> > > > >>>>>>> with a use case in mind. In my opinion consumers of
> >>>>>> this method
> >>>>>> > > > will
> >>>>>> > > > >>>> be:
> >>>>>> > > > >>>>>>>
> >>>>>> > > > >>>>>>>    1. sql-client
> >>>>>> > > > >>>>>>>    2. third-party SQL-based platforms
> >>>>>> > > > >>>>>>>
> >>>>>> > > > >>>>>>> @Godfrey As for the quit/source/... commands: I think
> >>>>>> > > > >>>>>>> those belong to
> >>>>>> > > > >>>>>>> the responsibility of the aforementioned tools. I think they
> >>>>>> should not
> >>>>>> > be
> >>>>>> > > > >>>>>>> understandable by the TableEnvironment. What would
> quit
> >>>>>> on a
> >>>>>> > > > >>>>>>> TableEnvironment do? Moreover I think such commands
> >>>>>> should be
> >>>>>> > > > >> prefixed
> >>>>>> > > > >>>>>>> appropriately. I think it's a common practice to e.g.
> >>>>>> prefix
> >>>>>> > > those
> >>>>>> > > > >>>> with
> >>>>>> > > > >>>>>>> ! or : to say they are meta commands of the tool
> rather
> >>>>>> than a
> >>>>>> > > > >> query.
> >>>>>> > > > >>>>>>>
> >>>>>> > > > >>>>>>> I also don't necessarily understand why platform users
> >>>>>> need to
> >>>>>> > > know
> >>>>>> > > > >>>> the
> >>>>>> > > > >>>>>>> kind of the query to use the proposed method. They
> >>>>>> should get
> >>>>>> > the
> >>>>>> > > > >> type
> >>>>>> > > > >>>>>>> from the TableResult#ResultKind. If the ResultKind is
> >>>>>> SUCCESS,
> >>>>>> > it
> >>>>>> > > > >> was
> >>>>>> > > > >>>> a
> >>>>>> > > > >>>>>>> DCL/DDL. If SUCCESS_WITH_CONTENT it was a DML/DQL. If
> >>>>>> that's
> >>>>>> > not
> >>>>>> > > > >>>> enough
> >>>>>> > > > >>>>>>> we can enrich the TableResult with more explicit kind
> >>>>>> of query,
> >>>>>> > > but
> >>>>>> > > > >> so
> >>>>>> > > > >>>>>>> far I don't see such a need.
> >>>>>> > > > >>>>>>>
> >>>>>> > > > >>>>>>> @Kurt In those cases I would assume the developers
> want
> >>>>>> to
> >>>>>> > > present
> >>>>>> > > > >>>>>>> results of the queries anyway. Moreover I think it is
> >>>>>> safe to
> >>>>>> > > > assume
> >>>>>> > > > >>>>>>> they can adhere to such a contract that the results
> >>>>>> must be
> >>>>>> > > > >> iterated.
> >>>>>> > > > >>>>>>>
> >>>>>> > > > >>>>>>> For direct users of TableEnvironment/Table API this
> >>>>>> method does
> >>>>>> > > not
> >>>>>> > > > >>>> make
> >>>>>> > > > >>>>>>> much sense anyway, in my opinion. I think we can
> rather
> >>>>>> safely
> >>>>>> > > > >> assume
> >>>>>> > > > >>>> in
> >>>>>> > > > >>>>>>> this scenario they do not want to submit multiple
> >>>>>> queries at a
> >>>>>> > > > >> single
> >>>>>> > > > >>>>>> time.
> >>>>>> > > > >>>>>>>
> >>>>>> > > > >>>>>>> Best,
> >>>>>> > > > >>>>>>>
> >>>>>> > > > >>>>>>> Dawid
> >>>>>> > > > >>>>>>>
> >>>>>> > > > >>>>>>>
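To make the ResultKind dispatch described above concrete, here is a minimal sketch of how a platform could branch on it. The `ResultKind` enum and `TableResult` interface below are simplified stand-ins for the FLIP-84 proposal, not Flink's actual classes, and rows are reduced to plain strings:

```java
import java.util.Iterator;
import java.util.List;

public class ResultKindDispatch {
    // Simplified stand-ins for the FLIP-84 interfaces discussed in this thread.
    enum ResultKind { SUCCESS, SUCCESS_WITH_CONTENT }

    interface TableResult {
        ResultKind getResultKind();
        Iterator<String> collect(); // rows simplified to strings
    }

    // SUCCESS -> DDL/DCL (no content to show), SUCCESS_WITH_CONTENT -> DML/DQL
    // (iterate and render the rows). No statement parsing is needed.
    static String handle(TableResult result) {
        if (result.getResultKind() == ResultKind.SUCCESS) {
            return "OK";
        }
        StringBuilder sb = new StringBuilder();
        for (Iterator<String> it = result.collect(); it.hasNext(); ) {
            sb.append(it.next()).append('\n');
        }
        return sb.toString();
    }

    // Small factory to build a fake result for the demo.
    static TableResult of(ResultKind kind, List<String> rows) {
        return new TableResult() {
            public ResultKind getResultKind() { return kind; }
            public Iterator<String> collect() { return rows.iterator(); }
        };
    }

    public static void main(String[] args) {
        System.out.println(handle(of(ResultKind.SUCCESS, List.of())));
        System.out.print(handle(of(ResultKind.SUCCESS_WITH_CONTENT, List.of("a", "b"))));
    }
}
```

A platform consuming multi-statement results thus never has to parse the statement text itself to decide whether there is content to display.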
On 01/04/2020 15:07, Kurt Young wrote:

One comment on `executeMultilineSql`: I'm afraid users might sometimes forget to iterate the returned iterators, e.g. a user submits a bunch of DDLs and expects the framework to execute them one by one, but it doesn't.

Best,
Kurt

On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek <aljos...@apache.org> wrote:

Agreed with what Dawid and Timo said.

To answer your question about multi-line SQL: no, we don't think we need this in Flink 1.11; we only wanted to make sure that the interfaces we now put in place will potentially allow it in the future.

Best,
Aljoscha

On 01.04.20 09:31, godfrey he wrote:

Hi Timo & Dawid,

Thanks so much for the effort on supporting multiline statements. I have a few questions about this method:

1. Users can control the execution logic well through the proposed method if they know what the statements are (whether a statement is a DDL, a DML or something else). But if a statement comes from a file, users do not know what the statements are, so the execution behavior is unclear. As a platform user, I think this method is hard to use unless the platform defines a set of rules about the statement order, such as: no SELECT in the middle, DML only at the tail of the SQL file (which may be the most common case in production environments). Otherwise the platform must parse the SQL first to know what the statements are. And if it does that, the platform can already handle all cases through `executeSql` and `StatementSet`.

2. The SQL client also can't use `executeMultilineSql` to support multiline statements, because the SQL client introduces some special commands, such as `quit`, `source`, and `load jar` (which does not exist yet, but we may need it to support dynamic table sources and UDFs). Does the TableEnvironment support those commands as well?

3. Btw, must we have this feature in release-1.11? I find there are a few use cases in the feedback document whose behavior is unclear now.

Regarding "change the return value from `Iterable<Row>` to `Iterator<Row>`": I couldn't agree more with this change. Just as Dawid mentioned, "The contract of the Iterable#iterator is that it returns a new iterator each time, which effectively means we can iterate the results multiple times." We do not support iterating the results multiple times. If we wanted that, the client would have to buffer all results, which is impossible for streaming jobs.

Best,
Godfrey

Dawid Wysakowicz <dwysakow...@apache.org> wrote on Wed, Apr 1, 2020 at 3:14 AM:

Thank you Timo for the great summary! It covers (almost) all the topics. Even though in the end we are not suggesting many changes to the current state of the FLIP, I think it is important to lay out all possible use cases so that we do not change the execution model every release.

There is one additional thing we discussed: could we change the result type of TableResult#collect to Iterator<Row>? Even though those interfaces do not differ much, I think Iterator better describes that the results might not be materialized on the client side, but can be retrieved on a per-record basis. The contract of Iterable#iterator is that it returns a new iterator each time, which effectively means we can iterate the results multiple times. Iterating the results multiple times is not possible when we don't retrieve all the results from the cluster at once. I think we should also use Iterator for TableEnvironment#executeMultilineSql(String statements): Iterator<TableResult>.

Best,

Dawid

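The single-pass nature that motivates `Iterator` over `Iterable` can be sketched as follows. The buffer-backed iterator below is a hypothetical model of the client-side result retrieval, not Flink's implementation; rows are drained from a queue exactly once, so an `Iterable` promising a fresh iterator each time could not be honored:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SinglePassResults {
    // Results streamed from a cluster can only be consumed once; the queue
    // models the client-side buffer that the job fills and the iterator drains.
    static Iterator<String> streamedRows(BlockingQueue<String> buffer) {
        return new Iterator<String>() {
            public boolean hasNext() { return !buffer.isEmpty(); }
            public String next() { return buffer.remove(); }
        };
    }

    public static void main(String[] args) {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(8);
        buffer.addAll(Arrays.asList("row1", "row2"));

        Iterator<String> it = streamedRows(buffer);
        int count = 0;
        while (it.hasNext()) { it.next(); count++; }
        System.out.println(count); // 2 - all rows consumed

        // A second pass sees nothing: the rows are gone from the buffer.
        // Iterable#iterator promises a fresh iterator each call, which this
        // per-record retrieval model cannot provide - hence Iterator<Row>.
        System.out.println(streamedRows(buffer).hasNext()); // false
    }
}
```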
On 31/03/2020 19:27, Timo Walther wrote:

Hi Godfrey,

Aljoscha, Dawid, Klou, and I had another discussion around FLIP-84. In particular, we discussed how the current status of the FLIP and the future requirements around multiline statements, async/sync, and collect() fit together.

We also updated the FLIP-84 Feedback Summary document [1] with some use cases.

We believe that we found a good solution that also fits what is in the current FLIP. So no bigger changes are necessary, which is great!

Our findings were:

1. Async vs. sync submission of Flink jobs:

Having a blocking `execute()` in the DataStream API was rather a mistake. Instead, all submissions should be async, because this allows supporting both modes if necessary. Thus, submitting all queries async sounds good to us. If users want to run a job sync, they can use the JobClient and wait for completion (or collect() in the case of batch jobs).

2. Multi-statement execution:

For multi-statement execution, we don't see a contradiction with the async execution behavior. We imagine a method like:

TableEnvironment#executeMultilineSql(String statements): Iterable<TableResult>

where the `Iterator#next()` method would trigger the next statement submission. This allows a caller to decide synchronously when to submit statements async to the cluster. Thus, a service such as the SQL Client can handle the result of each statement individually and process statement by statement sequentially.

3. The role of TableResult and result retrieval in general:

`TableResult` is similar to `JobClient`. Instead of returning a `CompletableFuture` of something, it is a concrete util class where some methods have the behavior of a completable future (e.g. collect(), print()) and some are already completed (getTableSchema(), getResultKind()).

`StatementSet#execute()` returns a single `TableResult` because the order is undefined in a set and all statements have the same schema. Its `collect()` will return a row for each executed `INSERT INTO` in the order of statement definition.

For a simple `SELECT * FROM ...`, the query execution might block until `collect()` is called to pull buffered rows from the job (via socket/REST API, whatever we will use in the future). We can say that a statement finished successfully when `collect#Iterator#hasNext` has returned false.

I hope this summarizes our discussion, @Dawid/Aljoscha/Klou?

It would be great if we can add these findings to the FLIP before we start voting.

One minor thing: some `execute()` methods still throw a checked exception; can we remove that from the FLIP? Also, the above-mentioned `Iterator#next()` would trigger an execution without throwing a checked exception.

Thanks,
Timo

[1] https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
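The lazy submission semantics described under point 2 can be sketched like this. The parsing and "submission" below are deliberately simplified stand-ins (a `TableResult` is modeled as the statement string, and submission is a log append); the point is only that `next()` drives submission, which also means statements that are never iterated are never submitted:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LazyMultilineExecution {
    static final List<String> submissionLog = new ArrayList<>();

    // Sketch of executeMultilineSql: statements are only split up front;
    // each one is "submitted" lazily when Iterator#next() is called.
    static Iterator<String> executeMultilineSql(String statements) {
        Iterator<String> parsed = Arrays.stream(statements.split(";"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .iterator();
        return new Iterator<String>() {
            public boolean hasNext() { return parsed.hasNext(); }
            public String next() {
                String stmt = parsed.next();
                submissionLog.add(stmt); // where the async job submission would happen
                return stmt;             // the caller gets one "TableResult" per statement
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> results = executeMultilineSql(
                "CREATE TABLE t1 (x INT); INSERT INTO t1 SELECT * FROM s");
        results.next(); // only now is the first statement submitted
        System.out.println(submissionLog.size()); // 1 - the INSERT has not run yet
    }
}
```

This lazy shape lets a caller such as the SQL Client interleave its own handling between statements, at the price Kurt points out: an un-iterated tail of statements silently never runs.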
On 31.03.20 06:28, godfrey he wrote:

Hi Timo & Jark,

Thanks for your explanation. I agree with you that execution should always be async, and that the sync execution scenario can be covered by async execution. It helps provide a unified entry point for batch and streaming. I think we can also use sync execution for some testing.

So I agree that we provide the `executeSql` method as an async method. If we want a sync method in the future, we can add a method named `executeSqlSync`.

I think we've reached an agreement. I will update the document and start the voting process.

Best,
Godfrey

Jark Wu <imj...@gmail.com> wrote on Tue, Mar 31, 2020 at 12:46 AM:

Hi,

I didn't follow the full discussion, but I share the same concern with Timo that streaming queries should always be async. Otherwise, I can imagine it will cause a lot of confusion and problems if users don't keep the "sync" firmly in mind (e.g. the client hangs). Besides, streaming is still the majority use case of Flink and Flink SQL. We should give usability a high priority.

Best,
Jark

On Mon, 30 Mar 2020 at 23:27, Timo Walther <twal...@apache.org> wrote:

Hi Godfrey,

Maybe I wasn't expressing my biggest concern clearly enough in my last mail. Even in a single-line and sync execution, I think that streaming queries should not block the execution. Otherwise it is not possible to call collect() or print() on them afterwards.

"there are too many things need to discuss for multiline":

True, I don't want to solve all of them right now. But what I do know is that our newly introduced methods should fit into a multiline execution. There is no big difference between calling `executeSql(A), executeSql(B)` and processing a multiline file `A;\nB;`.

I think the example that you mentioned can simply be undefined for now. Currently, no catalog is modifying data, just metadata. This is a separate discussion.

"result of the second statement is indeterministic":

Sure, this is indeterministic. But that is the implementer's fault, and we cannot forbid such pipelines.

How about we always execute streaming queries async? It would unblock executeSql() and multiline statements.

Having an `executeSqlAsync()` is useful for batch. However, I don't want `sync/async` to become the new batch/stream flag. The execution behavior should come from the query itself.

Regards,
Timo

On 30.03.20 11:12, godfrey he wrote:

Hi Timo,

I agree with you that streaming queries are our top priority, but I think there are too many things to discuss for multiline statements, e.g.:

1. What is the behavior of mixed DDL and DML for async execution?

create table t1 xxx;
create table t2 xxx;
insert into t2 select * from t1 where xxx;
drop table t1; -- t1 may be a MySQL table, so the data will also be deleted

Here t1 is dropped while the "insert" job is still running.

2. What is the behavior of the unified scenario for async execution (as you mentioned)?

INSERT INTO t1 SELECT * FROM s;
INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;

The result of the second statement is indeterministic, because the first statement may still be running. I think we need to put a lot of effort into defining the behavior of logically related queries.

In this FLIP, I suggest we only handle single statements, and we also introduce an async execute method, which is more important and more often used by users.

For the sync methods (like `TableEnvironment.executeSql` and `StatementSet.execute`), the result will be returned when the job is finished. The following methods will be introduced in this FLIP:

/**
 * Asynchronously execute the given single statement.
 */
TableEnvironment.executeSqlAsync(String statement): TableResult

/**
 * Asynchronously execute the DML statements as a batch.
 */
StatementSet.executeAsync(): TableResult

public interface TableResult {
    /**
     * Return the JobClient for DQL and DML in async mode, else return Optional.empty.
     */
    Optional<JobClient> getJobClient();
}

What do you think?

Best,
Godfrey
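The proposed `Optional<JobClient>` can be exercised as in the following sketch. Both interfaces here are simplified stand-ins for the proposed API (Flink's real `JobClient` has more methods), shown only to illustrate that DDL results carry no job handle while async DQL/DML results do:

```java
import java.util.Optional;

public class JobClientUsage {
    // Simplified stand-ins for the interfaces proposed in this mail.
    interface JobClient { void cancel(); }
    interface TableResult { Optional<JobClient> getJobClient(); }

    // A DQL/DML submitted async exposes a JobClient; a DDL does not,
    // so callers branch on Optional presence instead of statement kind.
    static boolean cancelIfRunning(TableResult result) {
        Optional<JobClient> client = result.getJobClient();
        client.ifPresent(JobClient::cancel);
        return client.isPresent();
    }

    public static void main(String[] args) {
        TableResult ddlResult = Optional::empty;
        TableResult dmlResult = () -> Optional.of(() -> System.out.println("cancelled"));
        System.out.println(cancelIfRunning(ddlResult)); // false - nothing to cancel
        System.out.println(cancelIfRunning(dmlResult)); // true, after printing cancelled
    }
}
```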
Timo Walther <twal...@apache.org> wrote on Thu, Mar 26, 2020 at 9:15 PM:

Hi Godfrey,

Executing streaming queries must be our top priority, because this is what distinguishes Flink from competitors. If we change the execution behavior, we should think about the other cases as well, so that we do not break the API a third time.

I fear that just having an async execute method will not be enough, because users should be able to mix streaming and batch queries in a unified scenario.

If I remember correctly, we had some discussions in the past about what decides the execution mode of a query. Currently, we would like to let the query decide, not derive it from the sources.

So I could imagine a multiline pipeline as:

USE CATALOG 'mycat';
INSERT INTO t1 SELECT * FROM s;
INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;

For executeMultilineSql():

sync because regular SQL
sync because regular batch SQL
async because streaming SQL

For executeAsyncMultilineSql():

async because everything should be async
async because everything should be async
async because everything should be async

What we should not start for executeAsyncMultilineSql():

sync because DDL
async because everything should be async
async because everything should be async

What are your thoughts here?

Regards,
Timo

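The per-statement sync/async decisions in this mail can be made concrete with a toy classifier. Deriving the mode by string matching on `EMIT STREAM` is purely illustrative (a real implementation would ask the planner, not the statement text), but it reproduces the decisions for the example pipeline:

```java
public class ExecutionModeByQuery {
    enum Mode { SYNC, ASYNC }

    // "Let the query decide": a crude, illustrative classifier.
    static Mode modeFor(String statement, boolean asyncEntryPoint) {
        if (asyncEntryPoint) {
            return Mode.ASYNC; // executeAsyncMultilineSql: everything is async
        }
        // executeMultilineSql: only streaming queries (EMIT STREAM) run async
        return statement.contains("EMIT STREAM") ? Mode.ASYNC : Mode.SYNC;
    }

    public static void main(String[] args) {
        String[] pipeline = {
            "USE CATALOG 'mycat'",
            "INSERT INTO t1 SELECT * FROM s",
            "INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM"
        };
        for (String stmt : pipeline) {
            System.out.println(modeFor(stmt, false));
        }
        // prints SYNC, SYNC, ASYNC: the executeMultilineSql() behavior above
    }
}
```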
On 26.03.20 12:50, godfrey he wrote:

Hi Timo,

I agree with you that streaming queries mostly need async execution. In fact, our original plan was to introduce only sync methods in this FLIP, with async methods (like "executeSqlAsync") to be introduced in the future, as mentioned in the appendix. Maybe the async methods also need to be considered in this FLIP.

I think sync methods are also useful for streaming, which