One comment on `executeMultilineSql`: I'm afraid users might sometimes
forget to iterate the returned iterator, e.g. a user submits a bunch of
DDLs and expects the framework to execute them one by one. But it
doesn't.
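
For example (a purely illustrative sketch, assuming the lazy
Iterator<TableResult> semantics proposed earlier in this thread, where
`Iterator#next()` triggers the next submission, and `tEnv` is a
TableEnvironment):

Iterator<TableResult> results =
    tEnv.executeMultilineSql("CREATE TABLE t1 (...); CREATE TABLE t2 (...);");
// nothing has been submitted yet; the DDLs only run
// when the iterator is drained:
while (results.hasNext()) {
    results.next();
}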

Best,
Kurt


On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek <aljos...@apache.org> wrote:

> Agreed to what Dawid and Timo said.
>
> To answer your question about multi line SQL: no, we don't think we need
> this in Flink 1.11, we only wanted to make sure that the interfaces that
> we now put in place will potentially allow this in the future.
>
> Best,
> Aljoscha
>
> On 01.04.20 09:31, godfrey he wrote:
> > Hi, Timo & Dawid,
> >
> > Thanks so much for the effort of `multiline statements supporting`,
> > I have a few questions about this method:
> >
> > 1. Users can control the execution logic well through the proposed
> > method if they know what the statements are (whether a statement is a
> > DDL, a DML, or something else).
> > But if the statements come from a file, users do not know what the
> > statements are, and the execution behavior is unclear.
> > As a platform user, I think this method is hard to use unless the
> > platform defines a set of rules about statement order, such as: no
> > SELECT in the middle, DML only at the tail of the SQL file (which may
> > cover most cases in production environments).
> > Otherwise the platform must parse the SQL first to know what the
> > statements are.
> > If it does that, the platform can already handle all cases through
> > `executeSql` and `StatementSet`.
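> >
> > (For example, using the proposed StatementSet API, purely illustrative:)
> >
> > StatementSet stmtSet = tEnv.createStatementSet();
> > // after parsing, the platform knows these statements are DMLs
> > stmtSet.addInsertSql("INSERT INTO t1 SELECT * FROM s");
> > stmtSet.addInsertSql("INSERT INTO t2 SELECT * FROM s");
> > stmtSet.execute();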
> >
> > 2. The SQL client also can't use `executeMultilineSql` to support
> > multiline statements,
> > because there are some special commands introduced in the SQL client,
> > such as `quit`, `source`, and `load jar` (which does not exist now, but
> > we may need such a command to support dynamic table sources and UDFs).
> > Does TableEnvironment also support those commands?
> >
> > 3. Btw, must we have this feature in release-1.11? I find there are a
> > few use cases in the feedback document whose behavior is unclear now.
> >
> > Regarding "changing the return value from `Iterable<Row>` to
> > `Iterator<Row>`",
> > I couldn't agree more with this change. Just as Dawid mentioned,
> > "The contract of the Iterable#iterator is that it returns a new iterator
> > each time, which effectively means we can iterate the results multiple
> > times." We do not support iterating the results multiple times.
> > If we wanted to do that, the client would have to buffer all results,
> > but that's impossible for a streaming job.
> >
> > Best,
> > Godfrey
> >
> > On Wed, Apr 1, 2020 at 3:14 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:
> >
> >> Thank you Timo for the great summary! It covers (almost) all the topics.
> >> Even though in the end we are not suggesting many changes to the current
> >> state of the FLIP, I think it is important to lay out all possible use cases
> >> so that we do not change the execution model every release.
> >>
> >> There is one additional thing we discussed. Could we change the result
> >> type of TableResult#collect to Iterator<Row>? Even though those
> >> interfaces do not differ much, I think Iterator better describes that
> >> the results might not be materialized on the client side, but can be
> >> retrieved on a per-record basis. The contract of the Iterable#iterator
> >> is that it returns a new iterator each time, which effectively means we
> >> can iterate the results multiple times. Iterating the results multiple
> >> times is not possible when we don't retrieve all the results from the
> >> cluster at once.
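> >>
> >> A minimal usage sketch (hypothetical code, using the names proposed in
> >> this thread; the iterator can only be consumed once):
> >>
> >> TableResult result = tEnv.executeSql("SELECT * FROM s");
> >> Iterator<Row> it = result.collect();
> >> while (it.hasNext()) {
> >>     Row row = it.next(); // rows are fetched from the cluster lazily
> >> }
> >> // a second iteration would require buffering all results client-side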
> >>
> >> I think we should also use Iterator for
> >> TableEnvironment#executeMultilineSql(String statements):
> >> Iterator<TableResult>.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 31/03/2020 19:27, Timo Walther wrote:
> >>> Hi Godfrey,
> >>>
> >>> Aljoscha, Dawid, Klou, and I had another discussion around FLIP-84. In
> >>> particular, we discussed how the current status of the FLIP and the
> >>> future requirements around multiline statements, async/sync, collect()
> >>> fit together.
> >>>
> >>> We also updated the FLIP-84 Feedback Summary document [1] with some
> >>> use cases.
> >>>
> >>> We believe that we found a good solution that also fits what is in
> >>> the current FLIP. So no bigger changes are necessary, which is great!
> >>>
> >>> Our findings were:
> >>>
> >>> 1. Async vs sync submission of Flink jobs:
> >>>
> >>> Having a blocking `execute()` in DataStream API was rather a mistake.
> >>> Instead all submissions should be async because this allows supporting
> >>> both modes if necessary. Thus, submitting all queries async sounds
> >>> good to us. If users want to run a job sync, they can use the
> >>> JobClient and wait for completion (or collect() in case of batch jobs).
> >>>
> >>> 2. Multi-statement execution:
> >>>
> >>> For the multi-statement execution, we don't see a contradiction with
> >>> the async execution behavior. We imagine a method like:
> >>>
> >>> TableEnvironment#executeMultilineSql(String statements):
> >>> Iterable<TableResult>
> >>>
> >>> Where the `Iterator#next()` method would trigger the next statement
> >>> submission. This allows a caller to decide synchronously when to
> >>> submit statements async to the cluster. Thus, a service such as the
> >>> SQL Client can handle the result of each statement individually and
> >>> process statement by statement sequentially.
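> >>>
> >>> For illustration, a rough sketch of such a caller (hypothetical code,
> >>> assuming the semantics described above):
> >>>
> >>> Iterator<TableResult> results =
> >>>     tEnv.executeMultilineSql(statements).iterator();
> >>> while (results.hasNext()) {
> >>>     // next() submits the next statement asynchronously
> >>>     TableResult result = results.next();
> >>>     // handle this statement's result before submitting the next one
> >>>     result.print();
> >>> }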
> >>>
> >>> 3. The role of TableResult and result retrieval in general
> >>>
> >>> `TableResult` is similar to `JobClient`. Instead of returning a
> >>> `CompletableFuture` of something, it is a concrete util class where
> >>> some methods have the behavior of a completable future (e.g. collect(),
> >>> print()) and some are already completed (getTableSchema(),
> >>> getResultKind()).
> >>>
> >>> `StatementSet#execute()` returns a single `TableResult` because the
> >>> order is undefined in a set and all statements have the same schema.
> >>> Its `collect()` will return a row for each executed `INSERT INTO` in
> >>> the order of statement definition.
> >>>
> >>> For a simple `SELECT * FROM ...`, the query execution might block until
> >>> `collect()` is called to pull buffered rows from the job (via
> >>> socket/REST API, or whatever we will use in the future). We can say that
> >>> a statement finished successfully when `collect()`'s `Iterator#hasNext`
> >>> has returned false.
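> >>>
> >>> (A hedged sketch of that contract, with hypothetical usage code:)
> >>>
> >>> TableResult result = tEnv.executeSql("SELECT * FROM src");
> >>> Iterator<Row> rows = result.collect().iterator();
> >>> while (rows.hasNext()) { // may block while rows are pulled from the job
> >>>     System.out.println(rows.next());
> >>> }
> >>> // hasNext() returning false means the statement finished successfully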
> >>>
> >>> I hope this summarizes our discussion @Dawid/Aljoscha/Klou?
> >>>
> >>> It would be great if we can add these findings to the FLIP before we
> >>> start voting.
> >>>
> >>> One minor thing: some `execute()` methods still throw a checked
> >>> exception; can we remove that from the FLIP? Also, the above-mentioned
> >>> `Iterator#next()` would trigger an execution without throwing a
> >>> checked exception.
> >>>
> >>> Thanks,
> >>> Timo
> >>>
> >>> [1]
> >>> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
> >>>
> >>> On 31.03.20 06:28, godfrey he wrote:
> >>>> Hi, Timo & Jark
> >>>>
> >>>> Thanks for your explanation.
> >>>> Agree with you that execution should always be async,
> >>>> and the sync execution scenario can be covered by async execution.
> >>>> It helps provide a unified entry point for batch and streaming.
> >>>> I think we can also use sync execution for some testing.
> >>>> So, I agree with you that we provide the `executeSql` method and that
> >>>> it's an async method.
> >>>> If we want a sync method in the future, we can add a method named
> >>>> `executeSqlSync`.
> >>>>
> >>>> I think we've reached an agreement. I will update the document and
> >>>> start the voting process.
> >>>>
> >>>> Best,
> >>>> Godfrey
> >>>>
> >>>>
> >>>> On Tue, Mar 31, 2020 at 12:46 AM Jark Wu <imj...@gmail.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I didn't follow the full discussion,
> >>>>> but I share the same concern with Timo that streaming queries should
> >>>>> always be async.
> >>>>> Otherwise, I can imagine it will cause a lot of confusion and problems
> >>>>> if users don't deeply keep the "sync" in mind (e.g. the client hangs).
> >>>>> Besides, streaming is still the majority of use cases for Flink and
> >>>>> Flink SQL. We should put usability at a high priority.
> >>>>>
> >>>>> Best,
> >>>>> Jark
> >>>>>
> >>>>>
> >>>>> On Mon, 30 Mar 2020 at 23:27, Timo Walther <twal...@apache.org> wrote:
> >>>>>
> >>>>>> Hi Godfrey,
> >>>>>>
> >>>>>> maybe I didn't express my biggest concern clearly enough in my last
> >>>>>> mail. Even in a single-line and sync execution, I think that streaming
> >>>>>> queries should not block the execution. Otherwise it is not possible
> >>>>>> to call collect() or print() on them afterwards.
> >>>>>>
> >>>>>> "there are too many things need to discuss for multiline":
> >>>>>>
> >>>>>> True, I don't want to solve all of them right now. But what I know
> is
> >>>>>> that our newly introduced methods should fit into a multiline
> >>>>>> execution.
> >>>>>> There is no big difference of calling `executeSql(A),
> >>>>>> executeSql(B)` and
> >>>>>> processing a multiline file `A;\nB;`.
> >>>>>>
> >>>>>> I think the example that you mentioned can simply be undefined for
> >>>>>> now.
> >>>>>> Currently, no catalog is modifying data but just metadata. This is a
> >>>>>> separate discussion.
> >>>>>>
> >>>>>> "result of the second statement is indeterministic":
> >>>>>>
> >>>>>> Sure this is indeterministic. But this is the implementers fault
> >>>>>> and we
> >>>>>> cannot forbid such pipelines.
> >>>>>>
> >>>>>> How about we always execute streaming queries async? It would
> unblock
> >>>>>> executeSql() and multiline statements.
> >>>>>>
> >>>>>> Having an `executeSqlAsync()` is useful for batch. However, I don't
> >>>>>> want `sync/async` to be the new batch/stream flag. The execution
> >>>>>> behavior should come from the query itself.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Timo
> >>>>>>
> >>>>>>
> >>>>>> On 30.03.20 11:12, godfrey he wrote:
> >>>>>>> Hi Timo,
> >>>>>>>
> >>>>>>> Agree with you that streaming queries are our top priority,
> >>>>>>> but I think there are too many things to discuss for multiline
> >>>>>>> statements:
> >>>>>>> e.g.
> >>>>>>> 1. what's the behavior of mixing DDL and DML for async execution:
> >>>>>>> create table t1 xxx;
> >>>>>>> create table t2 xxx;
> >>>>>>> insert into t2 select * from t1 where xxx;
> >>>>>>> drop table t1; // t1 may be a MySQL table, and the data will also be
> >>>>>>> deleted.
> >>>>>>>
> >>>>>>> t1 is dropped while the "insert" job is running.
> >>>>>>>
> >>>>>>> 2. what's the behavior of the unified scenario for async execution
> >>>>>>> (as you mentioned):
> >>>>>>> INSERT INTO t1 SELECT * FROM s;
> >>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
> >>>>>>>
> >>>>>>> The result of the second statement is nondeterministic, because the
> >>>>>>> first statement may still be running.
> >>>>>>> I think we need to put a lot of effort into defining the behavior of
> >>>>>>> logically related queries.
> >>>>>>>
> >>>>>>> In this FLIP, I suggest we only handle single statements, and we
> >>>>>>> also introduce an async execute method,
> >>>>>>> which is more important and more often used by users.
> >>>>>>>
> >>>>>>> For the sync methods (like `TableEnvironment.executeSql` and
> >>>>>>> `StatementSet.execute`),
> >>>>>>> the result will be returned only after the job is finished. The
> >>>>>>> following methods will be introduced in this FLIP:
> >>>>>>>
> >>>>>>> /**
> >>>>>>>  * Asynchronously execute the given single statement
> >>>>>>>  */
> >>>>>>> TableEnvironment.executeSqlAsync(String statement): TableResult
> >>>>>>>
> >>>>>>> /**
> >>>>>>>  * Asynchronously execute the dml statements as a batch
> >>>>>>>  */
> >>>>>>> StatementSet.executeAsync(): TableResult
> >>>>>>>
> >>>>>>> public interface TableResult {
> >>>>>>>     /**
> >>>>>>>      * return JobClient for DQL and DML in async mode, else return
> >>>>>>>      * Optional.empty
> >>>>>>>      */
> >>>>>>>     Optional<JobClient> getJobClient();
> >>>>>>> }
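> >>>>>>>
> >>>>>>> (For illustration, a sketch of emulating sync behavior on top of the
> >>>>>>> async method; `getJobExecutionResult` is the JobClient API from
> >>>>>>> FLIP-74, and `userClassLoader` is assumed to be available:)
> >>>>>>>
> >>>>>>> TableResult result =
> >>>>>>>     tEnv.executeSqlAsync("INSERT INTO t2 SELECT * FROM t1");
> >>>>>>> // DQL/DML returns a JobClient; waiting on its execution result
> >>>>>>> // turns the async submission into a sync call
> >>>>>>> result.getJobClient().ifPresent(client ->
> >>>>>>>     client.getJobExecutionResult(userClassLoader).join());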
> >>>>>>>
> >>>>>>> what do you think?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Godfrey
> >>>>>>>
> >>>>>>> On Thu, Mar 26, 2020 at 9:15 PM Timo Walther <twal...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Hi Godfrey,
> >>>>>>>>
> >>>>>>>> executing streaming queries must be our top priority because this
> is
> >>>>>>>> what distinguishes Flink from competitors. If we change the
> >>>>>>>> execution
> >>>>>>>> behavior, we should think about the other cases as well to not
> break
> >>>>> the
> >>>>>>>> API a third time.
> >>>>>>>>
> >>>>>>>> I fear that just having an async execute method will not be enough
> >>>>>>>> because users should be able to mix streaming and batch queries
> in a
> >>>>>>>> unified scenario.
> >>>>>>>>
> >>>>>>>> If I remember it correctly, we had some discussions in the past
> >>>>>>>> about what decides the execution mode of a query. Currently, we
> >>>>>>>> would like to let the query decide, not derive it from the sources.
> >>>>>>>>
> >>>>>>>> So I could imagine a multiline pipeline as:
> >>>>>>>>
> >>>>>>>> USE CATALOG 'mycat';
> >>>>>>>> INSERT INTO t1 SELECT * FROM s;
> >>>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
> >>>>>>>>
> >>>>>>>> For executeMultilineSql():
> >>>>>>>>
> >>>>>>>> sync because regular SQL
> >>>>>>>> sync because regular Batch SQL
> >>>>>>>> async because Streaming SQL
> >>>>>>>>
> >>>>>>>> For executeAsyncMultilineSql():
> >>>>>>>>
> >>>>>>>> async because everything should be async
> >>>>>>>> async because everything should be async
> >>>>>>>> async because everything should be async
> >>>>>>>>
> >>>>>>>> What we should not do for executeAsyncMultilineSql():
> >>>>>>>>
> >>>>>>>> sync because DDL
> >>>>>>>> async because everything should be async
> >>>>>>>> async because everything should be async
> >>>>>>>>
> >>>>>>>> What are your thoughts here?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Timo
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 26.03.20 12:50, godfrey he wrote:
> >>>>>>>>> Hi Timo,
> >>>>>>>>>
> >>>>>>>>> I agree with you that streaming queries mostly need async
> >>>>>>>>> execution.
> >>>>>>>>> In fact, our original plan was to introduce only sync methods in
> >>>>>>>>> this FLIP; async methods (like "executeSqlAsync") would be
> >>>>>>>>> introduced in the future, as mentioned in the appendix.
> >>>>>>>>>
> >>>>>>>>> Maybe the async methods also need to be considered in this FLIP.
> >>>>>>>>>
> >>>>>>>>> I think sync methods are also useful for streaming, as they can be
> >>>>>>>>> used to run bounded sources.
> >>>>>>>>> Maybe we should check whether all sources are bounded in sync
> >>>>>>>>> execution mode.
> >>>>>>>>>
> >>>>>>>>>> Also, if we block for streaming queries, we could never support
> >>>>>>>>>> multiline files. Because the first INSERT INTO would block the
> >>>>> further
> >>>>>>>>>> execution.
> >>>>>>>>> agree with you, we need an async method to submit multiline files,
> >>>>>>>>> and such files should be restricted so that DQL and DML always
> >>>>>>>>> come at the end for streaming.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Godfrey
> >>>>>>>>>
> >>>>>>>>> On Thu, Mar 26, 2020 at 4:29 PM Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Hi Godfrey,
> >>>>>>>>>>
> >>>>>>>>>> having control over the job after submission is a requirement that
> >>>>>>>>>> was requested frequently (some examples are [1], [2]). Users would
> >>>>>>>>>> like to get insights about the running or completed job. The
> >>>>>>>>>> JobClient summarizes these properties, including the jobId,
> >>>>>>>>>> jobGraph, etc.
> >>>>>>>>>>
> >>>>>>>>>> It is good to have a discussion about synchronous/asynchronous
> >>>>>>>>>> submission now to have a complete execution picture.
> >>>>>>>>>>
> >>>>>>>>>> I thought we submit streaming queries mostly async and just
> >>>>>>>>>> wait for
> >>>>>> the
> >>>>>>>>>> successful submission. If we block for streaming queries, how
> >>>>>>>>>> can we
> >>>>>>>>>> collect() or print() results?
> >>>>>>>>>>
> >>>>>>>>>> Also, if we block for streaming queries, we could never support
> >>>>>>>>>> multiline files. Because the first INSERT INTO would block the
> >>>>> further
> >>>>>>>>>> execution.
> >>>>>>>>>>
> >>>>>>>>>> If we decide to block entirely on streaming queries, we need the
> >>>>>>>>>> async execution methods in the design already. However, I would
> >>>>>>>>>> rather go for non-blocking streaming queries, also with the
> >>>>>>>>>> `EMIT STREAM` keyword in mind that we might add to SQL statements
> >>>>>>>>>> soon.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Timo
> >>>>>>>>>>
> >>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-16761
> >>>>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-12214
> >>>>>>>>>>
> >>>>>>>>>> On 25.03.20 16:30, godfrey he wrote:
> >>>>>>>>>>> Hi Timo,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for the updating.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding to "multiline statement support", I'm also fine that
> >>>>>>>>>>> `TableEnvironment.executeSql()` only supports single line
> >>>>> statement,
> >>>>>>>> and
> >>>>>>>>>> we
> >>>>>>>>>>> can support multiline statement later (needs more discussion
> >>>>>>>>>>> about
> >>>>>>>> this).
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding to "StatementSet.explian()", I don't have strong
> >>>>>>>>>>> opinions
> >>>>>>>> about
> >>>>>>>>>>> that.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding to "TableResult.getJobClient()", I think it's
> >>>>> unnecessary.
> >>>>>>>> The
> >>>>>>>>>>> reason is: first, many statements (e.g. DDL, show xx, use xx)
> >>>>>>>>>>> will
> >>>>>> not
> >>>>>>>>>>> submit a Flink job. second, `TableEnvironment.executeSql()` and
> >>>>>>>>>>> `StatementSet.execute()` are synchronous method, `TableResult`
> >>>>>>>>>>> will
> >>>>>> be
> >>>>>>>>>>> returned only after the job is finished or failed.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding to "whether StatementSet.execute() needs to throw
> >>>>>>>> exception", I
> >>>>>>>>>>> think we should choose a unified way to tell whether the
> >>>>>>>>>>> execution
> >>>>> is
> >>>>>>>>>>> successful. If `TableResult` contains ERROR kind (non-runtime
> >>>>>>>> exception),
> >>>>>>>>>>> users need to not only check the result but also catch the
> >>>>>>>>>>> runtime
> >>>>>>>>>>> exception in their code. or `StatementSet.execute()` does not
> >>>>>>>>>>> throw
> >>>>>> any
> >>>>>>>>>>> exception (including runtime exception), all exception
> >>>>>>>>>>> messages are
> >>>>>> in
> >>>>>>>>>> the
> >>>>>>>>>>> result.  I prefer "StatementSet.execute() needs to throw
> >>>>> exception".
> >>>>>> cc
> >>>>>>>>>> @Jark
> >>>>>>>>>>> Wu <imj...@gmail.com>
> >>>>>>>>>>>
> >>>>>>>>>>> I will update the agreed parts to the document first.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Godfrey
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Mar 25, 2020 at 6:51 PM Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Godfrey,
> >>>>>>>>>>>>
> >>>>>>>>>>>> thanks for starting the discussion on the mailing list. And
> >>>>>>>>>>>> sorry
> >>>>>>>> again
> >>>>>>>>>>>> for the late reply to FLIP-84. I have updated the Google doc
> one
> >>>>>> more
> >>>>>>>>>>>> time to incorporate the offline discussions.
> >>>>>>>>>>>>
> >>>>>>>>>>>> From Dawid's and my view, it is fine to postpone the multiline
> >>>>>>>>>>>> support to a separate method. This can be future work even
> >>>>>>>>>>>> though we will need it rather soon.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If there are no objections, I suggest updating FLIP-84 again
> >>>>>>>>>>>> and having another voting process.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Timo
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 25.03.20 11:17, godfrey he wrote:
> >>>>>>>>>>>>> Hi community,
> >>>>>>>>>>>>> Timo, Fabian and Dawid have some feedback about FLIP-84 [1].
> >>>>>>>>>>>>> The feedback is all about the newly introduced methods. We had
> >>>>>>>>>>>>> a discussion yesterday, and most of the feedback has been
> >>>>>>>>>>>>> agreed upon. Here are the conclusions:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *1. about proposed methods in `TableEnvironment`:*
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> the original proposed methods:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> TableEnvironment.createDmlBatch(): DmlBatch
> >>>>>>>>>>>>> TableEnvironment.executeStatement(String statement):
> >>>>>>>>>>>>> ResultTable
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> the new proposed methods:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> // we should not use abbreviations in the API, and the term
> >>>>>>>>>>>>> // "Batch" is easily confused with batch/streaming processing
> >>>>>>>>>>>>> TableEnvironment.createStatementSet(): StatementSet
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> // every method that takes SQL should have `Sql` in its name
> >>>>>>>>>>>>> // supports multiline statement ???
> >>>>>>>>>>>>> TableEnvironment.executeSql(String statement): TableResult
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> // new methods. supports explaining DQL and DML
> >>>>>>>>>>>>> TableEnvironment.explainSql(String statement,
> >>>>>>>>>>>>>     ExplainDetail... details): String
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *2. about proposed related classes:*
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> the original proposed classes:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> interface DmlBatch {
> >>>>>>>>>>>>>     void addInsert(String insert);
> >>>>>>>>>>>>>     void addInsert(String targetPath, Table table);
> >>>>>>>>>>>>>     ResultTable execute() throws Exception;
> >>>>>>>>>>>>>     String explain(boolean extended);
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> public interface ResultTable {
> >>>>>>>>>>>>>     TableSchema getResultSchema();
> >>>>>>>>>>>>>     Iterable<Row> getResultRows();
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> the new proposed classes:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> interface StatementSet {
> >>>>>>>>>>>>>     // every method that takes SQL should have `Sql` in its name
> >>>>>>>>>>>>>     // return StatementSet instance for fluent programming
> >>>>>>>>>>>>>     addInsertSql(String statement): StatementSet
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     // return StatementSet instance for fluent programming
> >>>>>>>>>>>>>     addInsert(String tablePath, Table table): StatementSet
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     // new method. support overwrite mode
> >>>>>>>>>>>>>     addInsert(String tablePath, Table table, boolean overwrite): StatementSet
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     explain(): String
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     // new method. supports adding more details for the result
> >>>>>>>>>>>>>     explain(ExplainDetail... extraDetails): String
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     // throw exception ???
> >>>>>>>>>>>>>     execute(): TableResult
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> interface TableResult {
> >>>>>>>>>>>>>     getTableSchema(): TableSchema
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     // avoid custom parsing of an "OK" row in programming
> >>>>>>>>>>>>>     getResultKind(): ResultKind
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     // instead of `get`, make it explicit that this might be
> >>>>>>>>>>>>>     // triggering an expensive operation
> >>>>>>>>>>>>>     collect(): Iterable<Row>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     // for fluent programming
> >>>>>>>>>>>>>     print(): Unit
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> enum ResultKind {
> >>>>>>>>>>>>>     SUCCESS, // for DDL, DCL and statements with a simple "OK"
> >>>>>>>>>>>>>     SUCCESS_WITH_CONTENT, // rows with important content are
> >>>>>>>>>>>>>                           // available (DML, DQL)
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *3. new proposed methods in `Table`*
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> `Table.insertInto()` will be deprecated, and the following
> >>>>>>>>>>>>> methods are introduced:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Table.executeInsert(String tablePath): TableResult
> >>>>>>>>>>>>> Table.executeInsert(String tablePath, boolean overwrite): TableResult
> >>>>>>>>>>>>> Table.explain(ExplainDetail... details): String
> >>>>>>>>>>>>> Table.execute(): TableResult
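> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (Hypothetical usage, for illustration only:)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Table table = tEnv.sqlQuery("SELECT * FROM s");
> >>>>>>>>>>>>> // replaces the deprecated table.insertInto("t1")
> >>>>>>>>>>>>> TableResult result = table.executeInsert("t1");
> >>>>>>>>>>>>> result.print();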
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> There are two issues that need further discussion: one is
> >>>>>>>>>>>>> whether `TableEnvironment.executeSql(String statement):
> >>>>>>>>>>>>> TableResult` needs to support multiline statements (or whether
> >>>>>>>>>>>>> `TableEnvironment` needs to support multiline statements), and
> >>>>>>>>>>>>> the other is whether `StatementSet.execute()` needs to throw
> >>>>>>>>>>>>> exceptions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> please refer to the feedback document [2] for the details.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Any suggestions are warmly welcomed!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>> https://wiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>>>>>>>>> [2]
> >>>>>>>>>>>>> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Godfrey