Re: [DISCUSS] FLIP-84 Feedback Summary

godfrey he Wed, 01 Apr 2020 20:28:29 -0700

Hi Aljoscha, Dawid, Timo,

Thanks so much for the detailed explanation.
Agree with you that the multiline story is not completed now, and we can
keep discussion.
I will add current discussions and conclusions to the FLIP.


Best,
Godfrey



Timo Walther <twal...@apache.org> 于2020年4月1日周三 下午11:27写道：

> Hi Godfrey,
>
> first of all, I agree with Dawid. The multiline story is not completed
> by this FLIP. It just verifies the big picture.
>
> 1. "control the execution logic through the proposed method if they know
> what the statements are"
>
> This is a good point that also Fabian raised in the linked google doc. I
> could also imagine to return a more complicated POJO when calling
> `executeMultiSql()`.
>
> The POJO would include some `getSqlProperties()` such that a platform
> gets insights into the query before executing. We could also trigger the
> execution more explicitly instead of hiding it behind an iterator.
>
> 2. "there are some special commands introduced in SQL client"
>
> For platforms and SQL Client specific commands, we could offer a hook to
> the parser or a fallback parser in case the regular table environment
> parser cannot deal with the statement.
>
> However, all of that is future work and can be discussed in a separate
> FLIP.
>
> 3. +1 for the `Iterator` instead of `Iterable`.
>
> 4. "we should convert the checked exception to unchecked exception"
>
> Yes, I meant using a runtime exception instead of a checked exception.
> There was no consensus on putting the exception into the `TableResult`.
>
> Regards,
> Timo
>
> On 01.04.20 15:35, Dawid Wysakowicz wrote:
> > When considering the multi-line support I think it is helpful to start
> > with a use case in mind. In my opinion consumers of this method will be:
> >
> >  1. sql-client
> >  2. third-part sql based platforms
> >
> > @Godfrey As for the quit/source/... commands. I think those belong to
> > the responsibility of aforementioned. I think they should not be
> > understandable by the TableEnvironment. What would quit on a
> > TableEnvironment do? Moreover I think such commands should be prefixed
> > appropriately. I think it's a common practice to e.g. prefix those with
> > ! or : to say they are meta commands of the tool rather than a query.
> >
> > I also don't necessarily understand why platform users need to know the
> > kind of the query to use the proposed method. They should get the type
> > from the TableResult#ResultKind. If the ResultKind is SUCCESS, it was a
> > DCL/DDL. If SUCCESS_WITH_CONTENT it was a DML/DQL. If that's not enough
> > we can enrich the TableResult with more explicit kind of query, but so
> > far I don't see such a need.
> >
> > @Kurt In those cases I would assume the developers want to present
> > results of the queries anyway. Moreover I think it is safe to assume
> > they can adhere to such a contract that the results must be iterated.
> >
> > For direct users of TableEnvironment/Table API this method does not make
> > much sense anyway, in my opinion. I think we can rather safely assume in
> > this scenario they do not want to submit multiple queries at a single
> time.
> >
> > Best,
> >
> > Dawid
> >
> >
> > On 01/04/2020 15:07, Kurt Young wrote:
> >> One comment to `executeMultilineSql`, I'm afraid sometimes user might
> >> forget to
> >> iterate the returned iterators, e.g. user submits a bunch of DDLs and
> >> expect the
> >> framework will execute them one by one. But it didn't.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek<aljos...@apache.org>
> wrote:
> >>
> >>> Agreed to what Dawid and Timo said.
> >>>
> >>> To answer your question about multi line SQL: no, we don't think we
> need
> >>> this in Flink 1.11, we only wanted to make sure that the interfaces
> that
> >>> we now put in place will potentially allow this in the future.
> >>>
> >>> Best,
> >>> Aljoscha
> >>>
> >>> On 01.04.20 09:31, godfrey he wrote:
> >>>> Hi, Timo & Dawid,
> >>>>
> >>>> Thanks so much for the effort of `multiline statements supporting`,
> >>>> I have a few questions about this method:
> >>>>
> >>>> 1. users can well control the execution logic through the proposed
> method
> >>>>    if they know what the statements are (a statement is a DDL, a DML
> or
> >>>> others).
> >>>> but if a statement is from a file, that means users do not know what
> the
> >>>> statements are,
> >>>> the execution behavior is unclear.
> >>>> As a platform user, I think this method is hard to use, unless the
> >>> platform
> >>>> defines
> >>>> a set of rule about the statements order, such as: no select in the
> >>> middle,
> >>>> dml must be at tail of sql file (which may be the most case in product
> >>>> env).
> >>>> Otherwise the platform must parse the sql first, then know what the
> >>>> statements are.
> >>>> If do like that, the platform can handle all cases through
> `executeSql`
> >>> and
> >>>> `StatementSet`.
> >>>>
> >>>> 2. SQL client can't also use `executeMultilineSql` to supports
> multiline
> >>>> statements,
> >>>>    because there are some special commands introduced in SQL client,
> >>>> such as `quit`, `source`, `load jar` (not exist now, but maybe we need
> >>> this
> >>>> command
> >>>>    to support dynamic table source and udf).
> >>>> Does TableEnvironment also supports those commands?
> >>>>
> >>>> 3. btw, we must have this feature in release-1.11? I find there are
> few
> >>>> user cases
> >>>>    in the feedback document which behavior is unclear now.
> >>>>
> >>>> regarding to "change the return value from `Iterable<Row` to
> >>>> `Iterator<Row`",
> >>>> I couldn't agree more with this change. Just as Dawid mentioned
> >>>> "The contract of the Iterable#iterator is that it returns a new
> iterator
> >>>> each time,
> >>>>    which effectively means we can iterate the results multiple
> times.",
> >>>> we does not provide iterate the results multiple times.
> >>>> If we want do that, the client must buffer all results. but it's
> >>> impossible
> >>>> for streaming job.
> >>>>
> >>>> Best,
> >>>> Godfrey
> >>>>
> >>>> Dawid Wysakowicz<dwysakow...@apache.org>  于2020年4月1日周三 上午3:14写道：
> >>>>
> >>>>> Thank you Timo for the great summary! It covers (almost) all the
> topics.
> >>>>> Even though in the end we are not suggesting much changes to the
> current
> >>>>> state of FLIP I think it is important to lay out all possible use
> cases
> >>>>> so that we do not change the execution model every release.
> >>>>>
> >>>>> There is one additional thing we discussed. Could we change the
> result
> >>>>> type of TableResult#collect to Iterator<Row>? Even though those
> >>>>> interfaces do not differ much. I think Iterator better describes that
> >>>>> the results might not be materialized on the client side, but can be
> >>>>> retrieved on a per record basis. The contract of the
> Iterable#iterator
> >>>>> is that it returns a new iterator each time, which effectively means
> we
> >>>>> can iterate the results multiple times. Iterating the results is not
> >>>>> possible when we don't retrieve all the results from the cluster at
> >>> once.
> >>>>> I think we should also use Iterator for
> >>>>> TableEnvironment#executeMultilineSql(String statements):
> >>>>> Iterator<TableResult>.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Dawid
> >>>>>
> >>>>> On 31/03/2020 19:27, Timo Walther wrote:
> >>>>>> Hi Godfrey,
> >>>>>>
> >>>>>> Aljoscha, Dawid, Klou, and I had another discussion around FLIP-84.
> In
> >>>>>> particular, we discussed how the current status of the FLIP and the
> >>>>>> future requirements around multiline statements, async/sync,
> collect()
> >>>>>> fit together.
> >>>>>>
> >>>>>> We also updated the FLIP-84 Feedback Summary document [1] with some
> >>>>>> use cases.
> >>>>>>
> >>>>>> We believe that we found a good solution that also fits to what is
> in
> >>>>>> the current FLIP. So no bigger changes necessary, which is great!
> >>>>>>
> >>>>>> Our findings were:
> >>>>>>
> >>>>>> 1. Async vs sync submission of Flink jobs:
> >>>>>>
> >>>>>> Having a blocking `execute()` in DataStream API was rather a
> mistake.
> >>>>>> Instead all submissions should be async because this allows
> supporting
> >>>>>> both modes if necessary. Thus, submitting all queries async sounds
> >>>>>> good to us. If users want to run a job sync, they can use the
> >>>>>> JobClient and wait for completion (or collect() in case of batch
> jobs).
> >>>>>>
> >>>>>> 2. Multi-statement execution:
> >>>>>>
> >>>>>> For the multi-statement execution, we don't see a contradication
> with
> >>>>>> the async execution behavior. We imagine a method like:
> >>>>>>
> >>>>>> TableEnvironment#executeMultilineSql(String statements):
> >>>>>> Iterable<TableResult>
> >>>>>>
> >>>>>> Where the `Iterator#next()` method would trigger the next statement
> >>>>>> submission. This allows a caller to decide synchronously when to
> >>>>>> submit statements async to the cluster. Thus, a service such as the
> >>>>>> SQL Client can handle the result of each statement individually and
> >>>>>> process statement by statement sequentially.
> >>>>>>
> >>>>>> 3. The role of TableResult and result retrieval in general
> >>>>>>
> >>>>>> `TableResult` is similar to `JobClient`. Instead of returning a
> >>>>>> `CompletableFuture` of something, it is a concrete util class where
> >>>>>> some methods have the behavior of completable future (e.g.
> collect(),
> >>>>>> print()) and some are already completed (getTableSchema(),
> >>>>>> getResultKind()).
> >>>>>>
> >>>>>> `StatementSet#execute()` returns a single `TableResult` because the
> >>>>>> order is undefined in a set and all statements have the same schema.
> >>>>>> Its `collect()` will return a row for each executed `INSERT INTO` in
> >>>>>> the order of statement definition.
> >>>>>>
> >>>>>> For simple `SELECT * FROM ...`, the query execution might block
> until
> >>>>>> `collect()` is called to pull buffered rows from the job (from
> >>>>>> socket/REST API what ever we will use in the future). We can say
> that
> >>>>>> a statement finished successfully, when the
> `collect#Iterator#hasNext`
> >>>>>> has returned false.
> >>>>>>
> >>>>>> I hope this summarizes our discussion @Dawid/Aljoscha/Klou?
> >>>>>>
> >>>>>> It would be great if we can add these findings to the FLIP before we
> >>>>>> start voting.
> >>>>>>
> >>>>>> One minor thing: some `execute()` methods still throw a checked
> >>>>>> exception; can we remove that from the FLIP? Also the above
> mentioned
> >>>>>> `Iterator#next()` would trigger an execution without throwing a
> >>>>>> checked exception.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Timo
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>
> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
> >>>>>> On 31.03.20 06:28, godfrey he wrote:
> >>>>>>> Hi, Timo & Jark
> >>>>>>>
> >>>>>>> Thanks for your explanation.
> >>>>>>> Agree with you that async execution should always be async,
> >>>>>>> and sync execution scenario can be covered  by async execution.
> >>>>>>> It helps provide an unified entry point for batch and streaming.
> >>>>>>> I think we can also use sync execution for some testing.
> >>>>>>> So, I agree with you that we provide `executeSql` method and it's
> >>> async
> >>>>>>> method.
> >>>>>>> If we want sync method in the future, we can add method named
> >>>>>>> `executeSqlSync`.
> >>>>>>>
> >>>>>>> I think we've reached an agreement. I will update the document, and
> >>>>>>> start
> >>>>>>> voting process.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Godfrey
> >>>>>>>
> >>>>>>>
> >>>>>>> Jark Wu<imj...@gmail.com>  于2020年3月31日周二 上午12:46写道：
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I didn't follow the full discussion.
> >>>>>>>> But I share the same concern with Timo that streaming queries
> should
> >>>>>>>> always
> >>>>>>>> be async.
> >>>>>>>> Otherwise, I can image it will cause a lot of confusion and
> problems
> >>> if
> >>>>>>>> users don't deeply keep the "sync" in mind (e.g. client hangs).
> >>>>>>>> Besides, the streaming mode is still the majority use cases of
> Flink
> >>>>>>>> and
> >>>>>>>> Flink SQL. We should put the usability at a high priority.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, 30 Mar 2020 at 23:27, Timo Walther<twal...@apache.org>
> >>> wrote:
> >>>>>>>>> Hi Godfrey,
> >>>>>>>>>
> >>>>>>>>> maybe I wasn't expressing my biggest concern enough in my last
> mail.
> >>>>>>>>> Even in a singleline and sync execution, I think that streaming
> >>>>>>>>> queries
> >>>>>>>>> should not block the execution. Otherwise it is not possible to
> call
> >>>>>>>>> collect() or print() on them afterwards.
> >>>>>>>>>
> >>>>>>>>> "there are too many things need to discuss for multiline":
> >>>>>>>>>
> >>>>>>>>> True, I don't want to solve all of them right now. But what I
> know
> >>> is
> >>>>>>>>> that our newly introduced methods should fit into a multiline
> >>>>>>>>> execution.
> >>>>>>>>> There is no big difference of calling `executeSql(A),
> >>>>>>>>> executeSql(B)` and
> >>>>>>>>> processing a multiline file `A;\nB;`.
> >>>>>>>>>
> >>>>>>>>> I think the example that you mentioned can simply be undefined
> for
> >>>>>>>>> now.
> >>>>>>>>> Currently, no catalog is modifying data but just metadata. This
> is a
> >>>>>>>>> separate discussion.
> >>>>>>>>>
> >>>>>>>>> "result of the second statement is indeterministic":
> >>>>>>>>>
> >>>>>>>>> Sure this is indeterministic. But this is the implementers fault
> >>>>>>>>> and we
> >>>>>>>>> cannot forbid such pipelines.
> >>>>>>>>>
> >>>>>>>>> How about we always execute streaming queries async? It would
> >>> unblock
> >>>>>>>>> executeSql() and multiline statements.
> >>>>>>>>>
> >>>>>>>>> Having a `executeSqlAsync()` is useful for batch. However, I
> don't
> >>>>>>>>> want
> >>>>>>>>> `sync/async` be the new batch/stream flag. The execution behavior
> >>>>>>>>> should
> >>>>>>>>> come from the query itself.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 30.03.20 11:12, godfrey he wrote:
> >>>>>>>>>> Hi Timo,
> >>>>>>>>>>
> >>>>>>>>>> Agree with you that streaming queries is our top priority,
> >>>>>>>>>> but I think there are too many things need to discuss for
> multiline
> >>>>>>>>>> statements:
> >>>>>>>>>> e.g.
> >>>>>>>>>> 1. what's the behaivor of DDL and DML mixing for async
> execution:
> >>>>>>>>>> create table t1 xxx;
> >>>>>>>>>> create table t2 xxx;
> >>>>>>>>>> insert into t2 select * from t1 where xxx;
> >>>>>>>>>> drop table t1; // t1 may be a MySQL table, the data will also be
> >>>>>>>> deleted.
> >>>>>>>>>> t1 is dropped when "insert" job is running.
> >>>>>>>>>>
> >>>>>>>>>> 2. what's the behaivor of unified scenario for async execution:
> >>>>>>>>>> (as you
> >>>>>>>>>> mentioned)
> >>>>>>>>>> INSERT INTO t1 SELECT * FROM s;
> >>>>>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
> >>>>>>>>>>
> >>>>>>>>>> The result of the second statement is indeterministic, because
> the
> >>>>>>>> first
> >>>>>>>>>> statement maybe is running.
> >>>>>>>>>> I think we need to put a lot of effort to define the behavior of
> >>>>>>>>> logically
> >>>>>>>>>> related queries.
> >>>>>>>>>>
> >>>>>>>>>> In this FLIP, I suggest we only handle single statement, and we
> >>> also
> >>>>>>>>>> introduce an async execute method
> >>>>>>>>>> which is more important and more often used for users.
> >>>>>>>>>>
> >>>>>>>>>> Dor the sync methods (like `TableEnvironment.executeSql` and
> >>>>>>>>>> `StatementSet.execute`),
> >>>>>>>>>> the result will be returned until the job is finished. The
> >>> following
> >>>>>>>>>> methods will be introduced in this FLIP:
> >>>>>>>>>>
> >>>>>>>>>>      /**
> >>>>>>>>>>       * Asynchronously execute the given single statement
> >>>>>>>>>>       */
> >>>>>>>>>> TableEnvironment.executeSqlAsync(String statement): TableResult
> >>>>>>>>>>
> >>>>>>>>>> /**
> >>>>>>>>>>      * Asynchronously execute the dml statements as a batch
> >>>>>>>>>>      */
> >>>>>>>>>> StatementSet.executeAsync(): TableResult
> >>>>>>>>>>
> >>>>>>>>>> public interface TableResult {
> >>>>>>>>>>        /**
> >>>>>>>>>>         * return JobClient for DQL and DML in async mode, else
> >>> return
> >>>>>>>>>> Optional.empty
> >>>>>>>>>>         */
> >>>>>>>>>>        Optional<JobClient> getJobClient();
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> what do you think?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Godfrey
> >>>>>>>>>>
> >>>>>>>>>> Timo Walther<twal...@apache.org>  于2020年3月26日周四 下午9:15写道：
> >>>>>>>>>>
> >>>>>>>>>>> Hi Godfrey,
> >>>>>>>>>>>
> >>>>>>>>>>> executing streaming queries must be our top priority because
> this
> >>> is
> >>>>>>>>>>> what distinguishes Flink from competitors. If we change the
> >>>>>>>>>>> execution
> >>>>>>>>>>> behavior, we should think about the other cases as well to not
> >>> break
> >>>>>>>> the
> >>>>>>>>>>> API a third time.
> >>>>>>>>>>>
> >>>>>>>>>>> I fear that just having an async execute method will not be
> enough
> >>>>>>>>>>> because users should be able to mix streaming and batch queries
> >>> in a
> >>>>>>>>>>> unified scenario.
> >>>>>>>>>>>
> >>>>>>>>>>> If I remember it correctly, we had some discussions in the past
> >>>>>>>>>>> about
> >>>>>>>>>>> what decides about the execution mode of a query. Currently, we
> >>>>>>>>>>> would
> >>>>>>>>>>> like to let the query decide, not derive it from the sources.
> >>>>>>>>>>>
> >>>>>>>>>>> So I could image a multiline pipeline as:
> >>>>>>>>>>>
> >>>>>>>>>>> USE CATALOG 'mycat';
> >>>>>>>>>>> INSERT INTO t1 SELECT * FROM s;
> >>>>>>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
> >>>>>>>>>>>
> >>>>>>>>>>> For executeMultilineSql():
> >>>>>>>>>>>
> >>>>>>>>>>> sync because regular SQL
> >>>>>>>>>>> sync because regular Batch SQL
> >>>>>>>>>>> async because Streaming SQL
> >>>>>>>>>>>
> >>>>>>>>>>> For executeAsyncMultilineSql():
> >>>>>>>>>>>
> >>>>>>>>>>> async because everything should be async
> >>>>>>>>>>> async because everything should be async
> >>>>>>>>>>> async because everything should be async
> >>>>>>>>>>>
> >>>>>>>>>>> What we should not start for executeAsyncMultilineSql():
> >>>>>>>>>>>
> >>>>>>>>>>> sync because DDL
> >>>>>>>>>>> async because everything should be async
> >>>>>>>>>>> async because everything should be async
> >>>>>>>>>>>
> >>>>>>>>>>> What are you thoughts here?
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Timo
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 26.03.20 12:50, godfrey he wrote:
> >>>>>>>>>>>> Hi Timo,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I agree with you that streaming queries mostly need async
> >>>>>>>>>>>> execution.
> >>>>>>>>>>>> In fact, our original plan is only introducing sync methods in
> >>> this
> >>>>>>>>> FLIP,
> >>>>>>>>>>>> and async methods (like "executeSqlAsync") will be introduced
> in
> >>>>>>>>>>>> the
> >>>>>>>>>>> future
> >>>>>>>>>>>> which is mentioned in the appendix.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Maybe the async methods also need to be considered in this
> FLIP.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think sync methods is also useful for streaming which can be
> >>> used
> >>>>>>>> to
> >>>>>>>>>>> run
> >>>>>>>>>>>> bounded source.
> >>>>>>>>>>>> Maybe we should check whether all sources are bounded in sync
> >>>>>>>> execution
> >>>>>>>>>>>> mode.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Also, if we block for streaming queries, we could never
> support
> >>>>>>>>>>>>> multiline files. Because the first INSERT INTO would block
> the
> >>>>>>>> further
> >>>>>>>>>>>>> execution.
> >>>>>>>>>>>> agree with you, we need async method to submit multiline
> files,
> >>>>>>>>>>>> and files should be limited that the DQL and DML should be
> >>>>>>>>>>>> always in
> >>>>>>>>> the
> >>>>>>>>>>>> end for streaming.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Godfrey
> >>>>>>>>>>>>
> >>>>>>>>>>>> Timo Walther<twal...@apache.org>  于2020年3月26日周四 下午4:29写道：
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Godfrey,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> having control over the job after submission is a requirement
> >>> that
> >>>>>>>> was
> >>>>>>>>>>>>> requested frequently (some examples are [1], [2]). Users
> would
> >>>>>>>>>>>>> like
> >>>>>>>> to
> >>>>>>>>>>>>> get insights about the running or completed job. Including
> the
> >>>>>>>> jobId,
> >>>>>>>>>>>>> jobGraph etc., the JobClient summarizes these properties.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It is good to have a discussion about
> synchronous/asynchronous
> >>>>>>>>>>>>> submission now to have a complete execution picture.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I thought we submit streaming queries mostly async and just
> >>>>>>>>>>>>> wait for
> >>>>>>>>> the
> >>>>>>>>>>>>> successful submission. If we block for streaming queries, how
> >>>>>>>>>>>>> can we
> >>>>>>>>>>>>> collect() or print() results?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Also, if we block for streaming queries, we could never
> support
> >>>>>>>>>>>>> multiline files. Because the first INSERT INTO would block
> the
> >>>>>>>> further
> >>>>>>>>>>>>> execution.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If we decide to block entirely on streaming queries, we need
> the
> >>>>>>>> async
> >>>>>>>>>>>>> execution methods in the design already. However, I would
> >>>>>>>>>>>>> rather go
> >>>>>>>>> for
> >>>>>>>>>>>>> non-blocking streaming queries. Also with the `EMIT STREAM`
> key
> >>>>>>>>>>>>> word
> >>>>>>>>> in
> >>>>>>>>>>>>> mind that we might add to SQL statements soon.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]https://issues.apache.org/jira/browse/FLINK-16761
> >>>>>>>>>>>>> [2]https://issues.apache.org/jira/browse/FLINK-12214
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 25.03.20 16:30, godfrey he wrote:
> >>>>>>>>>>>>>> Hi Timo,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for the updating.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regarding to "multiline statement support", I'm also fine
> that
> >>>>>>>>>>>>>> `TableEnvironment.executeSql()` only supports single line
> >>>>>>>> statement,
> >>>>>>>>>>> and
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>> can support multiline statement later (needs more discussion
> >>>>>>>>>>>>>> about
> >>>>>>>>>>> this).
> >>>>>>>>>>>>>> Regarding to "StatementSet.explian()", I don't have strong
> >>>>>>>>>>>>>> opinions
> >>>>>>>>>>> about
> >>>>>>>>>>>>>> that.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regarding to "TableResult.getJobClient()", I think it's
> >>>>>>>> unnecessary.
> >>>>>>>>>>> The
> >>>>>>>>>>>>>> reason is: first, many statements (e.g. DDL, show xx, use
> xx)
> >>>>>>>>>>>>>> will
> >>>>>>>>> not
> >>>>>>>>>>>>>> submit a Flink job. second, `TableEnvironment.executeSql()`
> and
> >>>>>>>>>>>>>> `StatementSet.execute()` are synchronous method,
> `TableResult`
> >>>>>>>>>>>>>> will
> >>>>>>>>> be
> >>>>>>>>>>>>>> returned only after the job is finished or failed.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regarding to "whether StatementSet.execute() needs to throw
> >>>>>>>>>>> exception", I
> >>>>>>>>>>>>>> think we should choose a unified way to tell whether the
> >>>>>>>>>>>>>> execution
> >>>>>>>> is
> >>>>>>>>>>>>>> successful. If `TableResult` contains ERROR kind
> (non-runtime
> >>>>>>>>>>> exception),
> >>>>>>>>>>>>>> users need to not only check the result but also catch the
> >>>>>>>>>>>>>> runtime
> >>>>>>>>>>>>>> exception in their code. or `StatementSet.execute()` does
> not
> >>>>>>>>>>>>>> throw
> >>>>>>>>> any
> >>>>>>>>>>>>>> exception (including runtime exception), all exception
> >>>>>>>>>>>>>> messages are
> >>>>>>>>> in
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>> result.  I prefer "StatementSet.execute() needs to throw
> >>>>>>>> exception".
> >>>>>>>>> cc
> >>>>>>>>>>>>> @Jark
> >>>>>>>>>>>>>> Wu<imj...@gmail.com>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I will update the agreed parts to the document first.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Godfrey
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Timo Walther<twal...@apache.org>  于2020年3月25日周三
> >>>>>>>>>>>>>> 下午6:51写道：
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi Godfrey,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> thanks for starting the discussion on the mailing list. And
> >>>>>>>>>>>>>>> sorry
> >>>>>>>>>>> again
> >>>>>>>>>>>>>>> for the late reply to FLIP-84. I have updated the Google
> doc
> >>> one
> >>>>>>>>> more
> >>>>>>>>>>>>>>> time to incorporate the offline discussions.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>        From Dawid's and my view, it is fine to postpone the
> >>>>>>>>>>>>>>> multiline
> >>>>>>>>>>> support
> >>>>>>>>>>>>>>> to a separate method. This can be future work even though
> we
> >>>>>>>>>>>>>>> will
> >>>>>>>>> need
> >>>>>>>>>>>>>>> it rather soon.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If there are no objections, I suggest to update the FLIP-84
> >>>>>>>>>>>>>>> again
> >>>>>>>>> and
> >>>>>>>>>>>>>>> have another voting process.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 25.03.20 11:17, godfrey he wrote:
> >>>>>>>>>>>>>>>> Hi community,
> >>>>>>>>>>>>>>>> Timo, Fabian and Dawid have some feedbacks about
> FLIP-84[1].
> >>>>>>>>>>>>>>>> The
> >>>>>>>>>>>>>>> feedbacks
> >>>>>>>>>>>>>>>> are all about new introduced methods. We had a discussion
> >>>>>>>>> yesterday,
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> most of feedbacks have been agreed upon. Here is the
> >>>>>>>>>>>>>>>> conclusions:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> *1. about proposed methods in `TableEnvironment`:*
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> the original proposed methods:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> TableEnvironment.createDmlBatch(): DmlBatch
> >>>>>>>>>>>>>>>> TableEnvironment.executeStatement(String statement):
> >>>>>>>>>>>>>>>> ResultTable
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> the new proposed methods:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> // we should not use abbreviations in the API, and the
> term
> >>>>>>>> "Batch"
> >>>>>>>>>>> is
> >>>>>>>>>>>>>>>> easily confused with batch/streaming processing
> >>>>>>>>>>>>>>>> TableEnvironment.createStatementSet(): StatementSet
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> // every method that takes SQL should have `Sql` in its
> name
> >>>>>>>>>>>>>>>> // supports multiline statement ???
> >>>>>>>>>>>>>>>> TableEnvironment.executeSql(String statement): TableResult
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> // new methods. supports explaining DQL and DML
> >>>>>>>>>>>>>>>> TableEnvironment.explainSql(String statement,
> >>> ExplainDetail...
> >>>>>>>>>>>>> details):
> >>>>>>>>>>>>>>>> String
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> *2. about proposed related classes:*
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> the original proposed classes:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> interface DmlBatch {
> >>>>>>>>>>>>>>>>            void addInsert(String insert);
> >>>>>>>>>>>>>>>>            void addInsert(String targetPath, Table table);
> >>>>>>>>>>>>>>>>            ResultTable execute() throws Exception ;
> >>>>>>>>>>>>>>>>            String explain(boolean extended);
> >>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> public interface ResultTable {
> >>>>>>>>>>>>>>>>            TableSchema getResultSchema();
> >>>>>>>>>>>>>>>>            Iterable<Row> getResultRows();
> >>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> the new proposed classes:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> interface StatementSet {
> >>>>>>>>>>>>>>>>            // every method that takes SQL should have
> `Sql` in
> >>>>>>>>>>>>>>>> its
> >>>>>>>>> name
> >>>>>>>>>>>>>>>>            // return StatementSet instance for fluent
> >>> programming
> >>>>>>>>>>>>>>>>            addInsertSql(String statement): StatementSet
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>            // return StatementSet instance for fluent
> >>> programming
> >>>>>>>>>>>>>>>>            addInsert(String tablePath, Table table):
> >>> StatementSet
> >>>>>>>>>>>>>>>>            // new method. support overwrite mode
> >>>>>>>>>>>>>>>>            addInsert(String tablePath, Table table,
> boolean
> >>>>>>>>> overwrite):
> >>>>>>>>>>>>>>>> StatementSet
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>            explain(): String
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>            // new method. supports adding more details
> for the
> >>>>>>>> result
> >>>>>>>>>>>>>>>>            explain(ExplainDetail... extraDetails): String
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>            // throw exception ???
> >>>>>>>>>>>>>>>>            execute(): TableResult
> >>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> interface TableResult {
> >>>>>>>>>>>>>>>>            getTableSchema(): TableSchema
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>            // avoid custom parsing of an "OK" row in
> >>> programming
> >>>>>>>>>>>>>>>>            getResultKind(): ResultKind
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>            // instead of `get` make it explicit that this
> is
> >>>>>>>>>>>>>>>> might
> >>>>>>>> be
> >>>>>>>>>>>>>>> triggering
> >>>>>>>>>>>>>>>> an expensive operation
> >>>>>>>>>>>>>>>>            collect(): Iterable<Row>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>            // for fluent programming
> >>>>>>>>>>>>>>>>            print(): Unit
> >>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> enum ResultKind {
> >>>>>>>>>>>>>>>>            SUCCESS, // for DDL, DCL and statements with a
> >>> simple
> >>>>>>>> "OK"
> >>>>>>>>>>>>>>>>            SUCCESS_WITH_CONTENT, // rows with important
> >>>>>>>>>>>>>>>> content are
> >>>>>>>>>>>>> available
> >>>>>>>>>>>>>>>> (DML, DQL)
> >>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> *3. new proposed methods in `Table`*
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> `Table.insertInto()` will be deprecated, and the following
> >>>>>>>> methods
> >>>>>>>>>>> are
> >>>>>>>>>>>>>>>> introduced:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Table.executeInsert(String tablePath): TableResult
> >>>>>>>>>>>>>>>> Table.executeInsert(String tablePath, boolean overwrite):
> >>>>>>>>> TableResult
> >>>>>>>>>>>>>>>> Table.explain(ExplainDetail... details): String
> >>>>>>>>>>>>>>>> Table.execute(): TableResult
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> There are two issues need further discussion, one is
> whether
> >>>>>>>>>>>>>>>> `TableEnvironment.executeSql(String statement):
> TableResult`
> >>>>>>>> needs
> >>>>>>>>> to
> >>>>>>>>>>>>>>>> support multiline statement (or whether `TableEnvironment`
> >>>>>>>>>>>>>>>> needs
> >>>>>>>> to
> >>>>>>>>>>>>>>> support
> >>>>>>>>>>>>>>>> multiline statement), and another one is whether
> >>>>>>>>>>>>> `StatementSet.execute()`
> >>>>>>>>>>>>>>>> needs to throw exception.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> please refer to the feedback document [2] for the details.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Any suggestions are warmly welcomed!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>
> >>>
> https://wiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>>>>>>>>>>>> [2]
> >>>>>>>>>>>>>>>>
> >>>
> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Godfrey
> >>>>>>>>>>>>>>>>
>
>

Re: [DISCUSS] FLIP-84 Feedback Summary

Reply via email to