When considering the multi-line support I think it is helpful to start with a use case in mind. In my opinion, consumers of this method will be:

1. sql-client
2. third-party SQL-based platforms

@Godfrey As for the quit/source/... commands: I think those belong to the responsibility of the aforementioned tools and should not be understandable by the TableEnvironment. What would quit do on a TableEnvironment? Moreover, I think such commands should be prefixed appropriately. It is a common practice to e.g. prefix them with ! or : to say they are meta commands of the tool rather than queries.
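To make that split concrete, here is a minimal sketch of how a client tool could dispatch its input. The "!" prefix and the handleMetaCommand helper are hypothetical client-side pieces; only the delegation to TableEnvironment#executeSql follows the proposal:

    // Meta commands are handled by the client itself and never reach the TableEnvironment.
    void processStatement(String statement, TableEnvironment tEnv) {
        if (statement.startsWith("!")) {
            // e.g. "!quit", "!source script.sql" -- hypothetical client-side commands
            handleMetaCommand(statement);
        } else {
            // everything else is plain SQL and is delegated to the TableEnvironment
            tEnv.executeSql(statement).print();
        }
    }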
I also don't necessarily understand why platform users need to know the kind of the query to use the proposed method. They can get the type from TableResult#getResultKind(). If the ResultKind is SUCCESS, it was a DCL/DDL; if it is SUCCESS_WITH_CONTENT, it was a DML/DQL. If that's not enough we can enrich the TableResult with a more explicit kind of query, but so far I don't see such a need.
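A small sketch of that contract, assuming the proposed ResultKind enum and a collect() that returns an Iterator<Row> as proposed later in this thread:

    TableResult result = tEnv.executeSql(statement);
    switch (result.getResultKind()) {
        case SUCCESS:
            // DDL/DCL: a simple "OK", nothing to iterate
            System.out.println("OK");
            break;
        case SUCCESS_WITH_CONTENT:
            // DML/DQL: rows are available and should be consumed
            result.collect().forEachRemaining(System.out::println);
            break;
    }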
@Kurt In those cases I would assume the developers want to present the results of the queries anyway. Moreover, I think it is safe to assume they can adhere to a contract that the results must be iterated. For direct users of TableEnvironment/Table API this method does not make much sense anyway, in my opinion; I think we can rather safely assume that in this scenario they do not want to submit multiple queries at a single time.

Best,
Dawid

On 01/04/2020 15:07, Kurt Young wrote:
> One comment to `executeMultilineSql`: I'm afraid sometimes users might forget to iterate the returned iterators, e.g. a user submits a bunch of DDLs and expects the framework to execute them one by one. But it doesn't.
>
> Best,
> Kurt
>
> On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek <aljos...@apache.org> wrote:
>
>> Agreed to what Dawid and Timo said.
>>
>> To answer your question about multi-line SQL: no, we don't think we need this in Flink 1.11, we only wanted to make sure that the interfaces that we now put in place will potentially allow this in the future.
>>
>> Best,
>> Aljoscha
>>
>> On 01.04.20 09:31, godfrey he wrote:
>>> Hi Timo & Dawid,
>>>
>>> Thanks so much for the effort on multiline statement support. I have a few questions about this method:
>>>
>>> 1. Users can control the execution logic well through the proposed method if they know what the statements are (whether a statement is a DDL, a DML or something else). But if a statement comes from a file, users do not know what the statements are, so the execution behavior is unclear. As a platform user, I think this method is hard to use unless the platform defines a set of rules about the statement order, such as: no SELECT in the middle, DML must be at the tail of the SQL file (which may be the most common case in production environments). Otherwise the platform must parse the SQL first to know what the statements are. If it does that, the platform can handle all cases through `executeSql` and `StatementSet`.
>>>
>>> 2. The SQL client also can't use `executeMultilineSql` to support multiline statements, because there are some special commands introduced in the SQL client, such as `quit`, `source`, `load jar` (which does not exist now, but maybe we need this command to support dynamic table sources and UDFs). Does TableEnvironment also support those commands?
>>>
>>> 3. Btw, must we have this feature in release-1.11? I find there are a few use cases in the feedback document whose behavior is unclear now.
>>>
>>> Regarding "change the return value from `Iterable<Row>` to `Iterator<Row>`": I couldn't agree more with this change. Just as Dawid mentioned, "The contract of the Iterable#iterator is that it returns a new iterator each time, which effectively means we can iterate the results multiple times." We do not provide a way to iterate the results multiple times. If we wanted that, the client would have to buffer all results, which is impossible for a streaming job.
>>>
>>> Best,
>>> Godfrey
>>>
>>> On Wed, Apr 1, 2020 at 3:14 AM, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>>>
>>>> Thank you Timo for the great summary! It covers (almost) all the topics. Even though in the end we are not suggesting many changes to the current state of the FLIP, I think it is important to lay out all possible use cases so that we do not change the execution model every release.
>>>>
>>>> There is one additional thing we discussed. Could we change the result type of TableResult#collect to Iterator<Row>? Even though those interfaces do not differ much, I think Iterator better describes that the results might not be materialized on the client side, but can be retrieved on a per-record basis. The contract of the Iterable#iterator is that it returns a new iterator each time, which effectively means we can iterate the results multiple times. Iterating the results multiple times is not possible when we don't retrieve all the results from the cluster at once.
>>>>
>>>> I think we should also use Iterator for TableEnvironment#executeMultilineSql(String statements): Iterator<TableResult>.
>>>>
>>>> Best,
>>>>
>>>> Dawid
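A minimal sketch of the consumption contract under discussion, assuming the Iterator-based signature proposed above. Each next() call is what triggers the submission of the next statement, which is also the pitfall Kurt points out:

    Iterator<TableResult> results = tEnv.executeMultilineSql(statements);
    while (results.hasNext()) {
        // next() triggers the submission of the next statement
        TableResult result = results.next();
        if (result.getResultKind() == ResultKind.SUCCESS_WITH_CONTENT) {
            // DML/DQL results must be consumed before moving on
            result.collect().forEachRemaining(System.out::println);
        }
    }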
>>>> On 31/03/2020 19:27, Timo Walther wrote:
>>>>> Hi Godfrey,
>>>>>
>>>>> Aljoscha, Dawid, Klou, and I had another discussion around FLIP-84. In particular, we discussed how the current status of the FLIP and the future requirements around multiline statements, async/sync, and collect() fit together.
>>>>>
>>>>> We also updated the FLIP-84 feedback summary document [1] with some use cases.
>>>>>
>>>>> We believe that we found a good solution that also fits what is in the current FLIP. So no bigger changes are necessary, which is great!
>>>>>
>>>>> Our findings were:
>>>>>
>>>>> 1. Async vs. sync submission of Flink jobs:
>>>>>
>>>>> Having a blocking `execute()` in the DataStream API was rather a mistake. Instead, all submissions should be async because this allows supporting both modes if necessary. Thus, submitting all queries async sounds good to us. If users want to run a job sync, they can use the JobClient and wait for completion (or collect() in case of batch jobs).
>>>>>
>>>>> 2. Multi-statement execution:
>>>>>
>>>>> For the multi-statement execution, we don't see a contradiction with the async execution behavior. We imagine a method like:
>>>>>
>>>>> TableEnvironment#executeMultilineSql(String statements): Iterable<TableResult>
>>>>>
>>>>> where the `Iterator#next()` method would trigger the next statement submission. This allows a caller to decide synchronously when to submit statements async to the cluster. Thus, a service such as the SQL Client can handle the result of each statement individually and process statement by statement sequentially.
>>>>>
>>>>> 3. The role of TableResult and result retrieval in general:
>>>>>
>>>>> `TableResult` is similar to `JobClient`. Instead of returning a `CompletableFuture` of something, it is a concrete util class where some methods have the behavior of a completable future (e.g. collect(), print()) and some are already completed (getTableSchema(), getResultKind()).
>>>>>
>>>>> `StatementSet#execute()` returns a single `TableResult` because the order is undefined in a set and all statements have the same schema. Its `collect()` will return a row for each executed `INSERT INTO` in the order of statement definition.
>>>>>
>>>>> For a simple `SELECT * FROM ...`, the query execution might block until `collect()` is called to pull buffered rows from the job (from socket/REST API or whatever we will use in the future). We can say that a statement finished successfully when the `collect()` iterator's `hasNext` has returned false.
>>>>>
>>>>> I hope this summarizes our discussion, @Dawid/Aljoscha/Klou?
>>>>>
>>>>> It would be great if we can add these findings to the FLIP before we start voting.
>>>>>
>>>>> One minor thing: some `execute()` methods still throw a checked exception; can we remove that from the FLIP? Also, the above mentioned `Iterator#next()` would trigger an execution without throwing a checked exception.
>>>>>
>>>>> Thanks,
>>>>> Timo
>>>>>
>>>>> [1] https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
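Point 1 above implies that a synchronous run is just an asynchronous submission plus an explicit wait. A sketch of that pattern, assuming the JobClient exposes the execution result as a future (the exact signature, including the userClassLoader parameter, is illustrative only):

    // submit asynchronously, then block until the job finishes
    TableResult result = tEnv.executeSql("INSERT INTO sink SELECT * FROM src");
    result.getJobClient()
          .map(client -> client.getJobExecutionResult(userClassLoader)) // CompletableFuture
          .ifPresent(CompletableFuture::join);                          // explicit wait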
>>>>> On 31.03.20 06:28, godfrey he wrote:
>>>>>> Hi Timo & Jark,
>>>>>>
>>>>>> Thanks for your explanation. Agree with you that execution should always be async, and the sync execution scenario can be covered by async execution. It helps provide a unified entry point for batch and streaming. I think we can also use sync execution for some testing.
>>>>>>
>>>>>> So, I agree with you that we provide the `executeSql` method and make it an async method. If we want a sync method in the future, we can add a method named `executeSqlSync`.
>>>>>>
>>>>>> I think we've reached an agreement. I will update the document and start the voting process.
>>>>>>
>>>>>> Best,
>>>>>> Godfrey
>>>>>>
>>>>>> On Tue, Mar 31, 2020 at 12:46 AM, Jark Wu <imj...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I didn't follow the full discussion, but I share the same concern with Timo that streaming queries should always be async. Otherwise, I can imagine it will cause a lot of confusion and problems if users don't deeply keep the "sync" in mind (e.g. the client hangs). Besides, the streaming mode is still the majority of use cases of Flink and Flink SQL. We should put usability at a high priority.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>> On Mon, 30 Mar 2020 at 23:27, Timo Walther <twal...@apache.org> wrote:
>>>>>>>> Hi Godfrey,
>>>>>>>>
>>>>>>>> maybe I wasn't expressing my biggest concern clearly enough in my last mail. Even in a singleline and sync execution, I think that streaming queries should not block the execution. Otherwise it is not possible to call collect() or print() on them afterwards.
>>>>>>>>
>>>>>>>> "there are too many things need to discuss for multiline":
>>>>>>>>
>>>>>>>> True, I don't want to solve all of them right now. But what I know is that our newly introduced methods should fit into a multiline execution. There is no big difference between calling `executeSql(A), executeSql(B)` and processing a multiline file `A;\nB;`.
>>>>>>>>
>>>>>>>> I think the example that you mentioned can simply be undefined for now. Currently, no catalog is modifying data but just metadata. This is a separate discussion.
>>>>>>>>
>>>>>>>> "result of the second statement is indeterministic":
>>>>>>>>
>>>>>>>> Sure, this is indeterministic. But this is the implementer's fault and we cannot forbid such pipelines.
>>>>>>>>
>>>>>>>> How about we always execute streaming queries async? It would unblock executeSql() and multiline statements.
>>>>>>>>
>>>>>>>> Having an `executeSqlAsync()` is useful for batch. However, I don't want `sync/async` to be the new batch/stream flag. The execution behavior should come from the query itself.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Timo
>>>>>>>>
>>>>>>>> On 30.03.20 11:12, godfrey he wrote:
>>>>>>>>> Hi Timo,
>>>>>>>>>
>>>>>>>>> Agree with you that streaming queries are our top priority, but I think there are too many things we need to discuss for multiline statements, e.g.:
>>>>>>>>>
>>>>>>>>> 1. What's the behavior of mixing DDL and DML for async execution:
>>>>>>>>>
>>>>>>>>> create table t1 xxx;
>>>>>>>>> create table t2 xxx;
>>>>>>>>> insert into t2 select * from t1 where xxx;
>>>>>>>>> drop table t1; // t1 may be a MySQL table, the data will also be deleted.
>>>>>>>>>
>>>>>>>>> Here t1 is dropped while the "insert" job is still running.
>>>>>>>>>
>>>>>>>>> 2. What's the behavior of the unified scenario for async execution (as you mentioned):
>>>>>>>>>
>>>>>>>>> INSERT INTO t1 SELECT * FROM s;
>>>>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
>>>>>>>>>
>>>>>>>>> The result of the second statement is indeterministic, because the first statement may still be running. I think we need to put a lot of effort into defining the behavior of logically related queries.
>>>>>>>>>
>>>>>>>>> In this FLIP, I suggest we only handle single statements, and we also introduce an async execute method, which is more important and more often used by users.
>>>>>>>>>
>>>>>>>>> For the sync methods (like `TableEnvironment.executeSql` and `StatementSet.execute`), the result will be returned once the job is finished. The following methods will be introduced in this FLIP:
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>>  * Asynchronously execute the given single statement
>>>>>>>>>  */
>>>>>>>>> TableEnvironment.executeSqlAsync(String statement): TableResult
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>>  * Asynchronously execute the dml statements as a batch
>>>>>>>>>  */
>>>>>>>>> StatementSet.executeAsync(): TableResult
>>>>>>>>>
>>>>>>>>> public interface TableResult {
>>>>>>>>>     /**
>>>>>>>>>      * return the JobClient for DQL and DML in async mode, else return Optional.empty
>>>>>>>>>      */
>>>>>>>>>     Optional<JobClient> getJobClient();
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Godfrey
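Against godfrey's proposed interface above, caller code would distinguish job-submitting statements by the presence of the JobClient. A small sketch (the table names and the getJobStatus() call are illustrative):

    // DDL submits no Flink job, so no JobClient is attached
    TableResult ddl = tEnv.executeSqlAsync("CREATE TABLE t2 (a INT)");
    assert !ddl.getJobClient().isPresent();

    // DML in async mode carries a JobClient that can be used to track the job
    TableResult dml = tEnv.executeSqlAsync("INSERT INTO t2 SELECT * FROM t1");
    dml.getJobClient().ifPresent(client ->
        client.getJobStatus().thenAccept(status -> System.out.println("status: " + status)));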
>>>>>>>>> On Thu, Mar 26, 2020 at 9:15 PM, Timo Walther <twal...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Godfrey,
>>>>>>>>>>
>>>>>>>>>> executing streaming queries must be our top priority because this is what distinguishes Flink from competitors. If we change the execution behavior, we should think about the other cases as well so as not to break the API a third time.
>>>>>>>>>>
>>>>>>>>>> I fear that just having an async execute method will not be enough, because users should be able to mix streaming and batch queries in a unified scenario.
>>>>>>>>>>
>>>>>>>>>> If I remember correctly, we had some discussions in the past about what decides the execution mode of a query. Currently, we would like to let the query decide, not derive it from the sources.
>>>>>>>>>>
>>>>>>>>>> So I could imagine a multiline pipeline as:
>>>>>>>>>>
>>>>>>>>>> USE CATALOG 'mycat';
>>>>>>>>>> INSERT INTO t1 SELECT * FROM s;
>>>>>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
>>>>>>>>>>
>>>>>>>>>> For executeMultilineSql():
>>>>>>>>>>
>>>>>>>>>> sync because regular SQL
>>>>>>>>>> sync because regular Batch SQL
>>>>>>>>>> async because Streaming SQL
>>>>>>>>>>
>>>>>>>>>> For executeAsyncMultilineSql():
>>>>>>>>>>
>>>>>>>>>> async because everything should be async
>>>>>>>>>> async because everything should be async
>>>>>>>>>> async because everything should be async
>>>>>>>>>>
>>>>>>>>>> What we should not do for executeAsyncMultilineSql():
>>>>>>>>>>
>>>>>>>>>> sync because DDL
>>>>>>>>>> async because everything should be async
>>>>>>>>>> async because everything should be async
>>>>>>>>>>
>>>>>>>>>> What are your thoughts here?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Timo
>>>>>>>>>>
>>>>>>>>>> On 26.03.20 12:50, godfrey he wrote:
>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>
>>>>>>>>>>> I agree with you that streaming queries mostly need async execution. In fact, our original plan was to introduce only sync methods in this FLIP; async methods (like "executeSqlAsync") would be introduced in the future, as mentioned in the appendix.
>>>>>>>>>>>
>>>>>>>>>>> Maybe the async methods also need to be considered in this FLIP.
>>>>>>>>>>>
>>>>>>>>>>> I think sync methods are also useful for streaming, as they can be used to run bounded sources. Maybe we should check whether all sources are bounded in sync execution mode.
>>>>>>>>>>>
>>>>>>>>>>>> Also, if we block for streaming queries, we could never support multiline files. Because the first INSERT INTO would block the further execution.
>>>>>>>>>>>
>>>>>>>>>>> Agree with you, we need an async method to submit multiline files, and such files should be limited so that DQL and DML always come at the end for streaming.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Godfrey
>>>>>>>>>>> On Thu, Mar 26, 2020 at 4:29 PM, Timo Walther <twal...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Godfrey,
>>>>>>>>>>>>
>>>>>>>>>>>> having control over the job after submission is a requirement that was requested frequently (some examples are [1], [2]). Users would like to get insights about the running or completed job. Including the jobId, jobGraph etc., the JobClient summarizes these properties.
>>>>>>>>>>>>
>>>>>>>>>>>> It is good to have a discussion about synchronous/asynchronous submission now to have a complete execution picture.
>>>>>>>>>>>>
>>>>>>>>>>>> I thought we submit streaming queries mostly async and just wait for the successful submission. If we block for streaming queries, how can we collect() or print() results?
>>>>>>>>>>>>
>>>>>>>>>>>> Also, if we block for streaming queries, we could never support multiline files, because the first INSERT INTO would block the further execution.
>>>>>>>>>>>>
>>>>>>>>>>>> If we decide to block entirely on streaming queries, we need the async execution methods in the design already. However, I would rather go for non-blocking streaming queries, also with the `EMIT STREAM` keyword in mind that we might add to SQL statements soon.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Timo
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-16761
>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-12214
>>>>>>>>>>>>
>>>>>>>>>>>> On 25.03.20 16:30, godfrey he wrote:
>>>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the update.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding "multiline statement support": I'm also fine that `TableEnvironment.executeSql()` only supports single statements, and we can support multiline statements later (this needs more discussion).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding "StatementSet.explain()": I don't have strong opinions about that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding "TableResult.getJobClient()": I think it's unnecessary. The reason is: first, many statements (e.g. DDL, show xx, use xx) will not submit a Flink job. Second, `TableEnvironment.executeSql()` and `StatementSet.execute()` are synchronous methods, so a `TableResult` will be returned only after the job is finished or failed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding "whether StatementSet.execute() needs to throw an exception": I think we should choose a unified way to tell whether the execution is successful. If `TableResult` contains an ERROR kind (non-runtime exception), users need to not only check the result but also catch the runtime exception in their code. Alternatively, `StatementSet.execute()` does not throw any exception (including runtime exceptions) and all exception messages are in the result. I prefer "StatementSet.execute() needs to throw an exception". cc @Jark Wu <imj...@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will update the agreed parts in the document first.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Godfrey
>>>>>>>>>>>>> On Wed, Mar 25, 2020 at 6:51 PM, Timo Walther <twal...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Godfrey,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks for starting the discussion on the mailing list. And sorry again for the late reply to FLIP-84. I have updated the Google doc one more time to incorporate the offline discussions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From Dawid's and my view, it is fine to postpone the multiline support to a separate method. This can be future work, even though we will need it rather soon.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If there are no objections, I suggest updating FLIP-84 again and having another voting process.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 25.03.20 11:17, godfrey he wrote:
>>>>>>>>>>>>>>> Hi community,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Timo, Fabian and Dawid have some feedback about FLIP-84 [1]. The feedback is all about the newly introduced methods. We had a discussion yesterday, and most of the feedback has been agreed upon. Here are the conclusions:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *1. About the proposed methods in `TableEnvironment`:*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The originally proposed methods:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> TableEnvironment.createDmlBatch(): DmlBatch
>>>>>>>>>>>>>>> TableEnvironment.executeStatement(String statement): ResultTable
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The newly proposed methods:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> // we should not use abbreviations in the API, and the term "Batch" is easily confused with batch/streaming processing
>>>>>>>>>>>>>>> TableEnvironment.createStatementSet(): StatementSet
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> // every method that takes SQL should have `Sql` in its name
>>>>>>>>>>>>>>> // supports multiline statement ???
>>>>>>>>>>>>>>> TableEnvironment.executeSql(String statement): TableResult
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> // new method. supports explaining DQL and DML
>>>>>>>>>>>>>>> TableEnvironment.explainSql(String statement, ExplainDetail... details): String
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *2. About the proposed related classes:*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The originally proposed classes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> interface DmlBatch {
>>>>>>>>>>>>>>>     void addInsert(String insert);
>>>>>>>>>>>>>>>     void addInsert(String targetPath, Table table);
>>>>>>>>>>>>>>>     ResultTable execute() throws Exception;
>>>>>>>>>>>>>>>     String explain(boolean extended);
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> public interface ResultTable {
>>>>>>>>>>>>>>>     TableSchema getResultSchema();
>>>>>>>>>>>>>>>     Iterable<Row> getResultRows();
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The newly proposed classes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> interface StatementSet {
>>>>>>>>>>>>>>>     // every method that takes SQL should have `Sql` in its name
>>>>>>>>>>>>>>>     // return the StatementSet instance for fluent programming
>>>>>>>>>>>>>>>     addInsertSql(String statement): StatementSet
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // return the StatementSet instance for fluent programming
>>>>>>>>>>>>>>>     addInsert(String tablePath, Table table): StatementSet
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // new method. supports overwrite mode
>>>>>>>>>>>>>>>     addInsert(String tablePath, Table table, boolean overwrite): StatementSet
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     explain(): String
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // new method. supports adding more details for the result
>>>>>>>>>>>>>>>     explain(ExplainDetail... extraDetails): String
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // throw exception ???
>>>>>>>>>>>>>>>     execute(): TableResult
>>>>>>>>>>>>>>> }
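A short sketch of how the fluent StatementSet above would be used. The table names and queries are made up; the createStatementSet, add and execute calls follow the proposal:

    StatementSet stmtSet = tEnv.createStatementSet()
        .addInsertSql("INSERT INTO sink1 SELECT * FROM src WHERE a > 10")
        .addInsert("sink2", tEnv.sqlQuery("SELECT b, c FROM src"));

    // explain all statements of the set as one plan
    System.out.println(stmtSet.explain());

    // one TableResult for the whole set; one row per executed INSERT INTO
    TableResult result = stmtSet.execute();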
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> interface TableResult {
>>>>>>>>>>>>>>>     getTableSchema(): TableSchema
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // avoid custom parsing of an "OK" row in programming
>>>>>>>>>>>>>>>     getResultKind(): ResultKind
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // instead of `get`, make it explicit that this might be triggering an expensive operation
>>>>>>>>>>>>>>>     collect(): Iterable<Row>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // for fluent programming
>>>>>>>>>>>>>>>     print(): Unit
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> enum ResultKind {
>>>>>>>>>>>>>>>     SUCCESS, // for DDL, DCL and statements with a simple "OK"
>>>>>>>>>>>>>>>     SUCCESS_WITH_CONTENT, // rows with important content are available (DML, DQL)
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *3. New proposed methods in `Table`:*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> `Table.insertInto()` will be deprecated, and the following methods are introduced:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Table.executeInsert(String tablePath): TableResult
>>>>>>>>>>>>>>> Table.executeInsert(String tablePath, boolean overwrite): TableResult
>>>>>>>>>>>>>>> Table.explain(ExplainDetail... details): String
>>>>>>>>>>>>>>> Table.execute(): TableResult
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There are two issues that need further discussion: one is whether `TableEnvironment.executeSql(String statement): TableResult` needs to support multiline statements (or whether `TableEnvironment` needs to support multiline statements), and the other is whether `StatementSet.execute()` needs to throw an exception.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please refer to the feedback document [2] for the details.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any suggestions are warmly welcomed!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://wiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>>>>>>>>>>>>>> [2] https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Godfrey
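Taken together, the new `Table` methods above would be used roughly like this. A sketch only: the source and sink names are made up, and collect() is shown returning an Iterator as proposed earlier in the thread:

    Table table = tEnv.sqlQuery("SELECT a, b FROM src");

    // submit an INSERT job for the table's result
    TableResult insertResult = table.executeInsert("sink");

    // execute the query itself and pull rows from the cluster
    table.execute().collect().forEachRemaining(System.out::println);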