When considering the multi-line support I think it is helpful to start with a use case in mind. In my opinion, consumers of this method will be:

1. sql-client
2. third-party SQL-based platforms

@Godfrey As for the quit/source/... commands: I think those belong to the responsibility of the aforementioned tools and should not be understandable by the TableEnvironment. What would quit do on a TableEnvironment? Moreover, I think such commands should be prefixed appropriately. It is a common practice to e.g. prefix them with ! or : to say they are meta commands of the tool rather than queries.
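To make that split concrete, here is a minimal sketch of how a client tool could dispatch its input. The "!" prefix and the handleMetaCommand helper are hypothetical client-side pieces; only the delegation to TableEnvironment#executeSql follows the proposal:

    // Meta commands are handled by the client itself and never reach the TableEnvironment.
    void processStatement(String statement, TableEnvironment tEnv) {
        if (statement.startsWith("!")) {
            // e.g. "!quit", "!source script.sql" -- hypothetical client-side commands
            handleMetaCommand(statement);
        } else {
            // everything else is plain SQL and is delegated to the TableEnvironment
            tEnv.executeSql(statement).print();
        }
    }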
I also don't necessarily understand why platform users need to know the kind of the query to use the proposed method. They can get the type from TableResult#getResultKind(). If the ResultKind is SUCCESS, it was a DCL/DDL; if it is SUCCESS_WITH_CONTENT, it was a DML/DQL. If that's not enough we can enrich the TableResult with a more explicit kind of query, but so far I don't see such a need.
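A small sketch of that contract, assuming the proposed ResultKind enum and a collect() that returns an Iterator<Row> as proposed later in this thread:

    TableResult result = tEnv.executeSql(statement);
    switch (result.getResultKind()) {
        case SUCCESS:
            // DDL/DCL: a simple "OK", nothing to iterate
            System.out.println("OK");
            break;
        case SUCCESS_WITH_CONTENT:
            // DML/DQL: rows are available and should be consumed
            result.collect().forEachRemaining(System.out::println);
            break;
    }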
@Kurt In those cases I would assume the developers want to present the results of the queries anyway. Moreover, I think it is safe to assume they can adhere to a contract that the results must be iterated. For direct users of TableEnvironment/Table API this method does not make much sense anyway, in my opinion; I think we can rather safely assume that in this scenario they do not want to submit multiple queries at a single time.

Best,
Dawid

On 01/04/2020 15:07, Kurt Young wrote:
> One comment to `executeMultilineSql`: I'm afraid sometimes users might forget to iterate the returned iterators, e.g. a user submits a bunch of DDLs and expects the framework to execute them one by one. But it doesn't.
>
> Best,
> Kurt
>
> On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek <aljos...@apache.org> wrote:
>
>> Agreed to what Dawid and Timo said.
>>
>> To answer your question about multi-line SQL: no, we don't think we need this in Flink 1.11, we only wanted to make sure that the interfaces that we now put in place will potentially allow this in the future.
>>
>> Best,
>> Aljoscha
>>
>> On 01.04.20 09:31, godfrey he wrote:
>>> Hi Timo & Dawid,
>>>
>>> Thanks so much for the effort on multiline statement support. I have a few questions about this method:
>>>
>>> 1. Users can control the execution logic well through the proposed method if they know what the statements are (whether a statement is a DDL, a DML or something else). But if a statement comes from a file, users do not know what the statements are, so the execution behavior is unclear. As a platform user, I think this method is hard to use unless the platform defines a set of rules about the statement order, such as: no SELECT in the middle, DML must be at the tail of the SQL file (which may be the most common case in production environments). Otherwise the platform must parse the SQL first to know what the statements are. If it does that, the platform can handle all cases through `executeSql` and `StatementSet`.
>>>
>>> 2. The SQL client also can't use `executeMultilineSql` to support multiline statements, because there are some special commands introduced in the SQL client, such as `quit`, `source`, `load jar` (which does not exist now, but maybe we need this command to support dynamic table sources and UDFs). Does TableEnvironment also support those commands?
>>>
>>> 3. Btw, must we have this feature in release-1.11? I find there are a few use cases in the feedback document whose behavior is unclear now.
>>>
>>> Regarding "change the return value from `Iterable<Row>` to `Iterator<Row>`": I couldn't agree more with this change. Just as Dawid mentioned, "The contract of the Iterable#iterator is that it returns a new iterator each time, which effectively means we can iterate the results multiple times." We do not provide a way to iterate the results multiple times. If we wanted that, the client would have to buffer all results, which is impossible for a streaming job.
>>>
>>> Best,
>>> Godfrey
>>>
>>> On Wed, Apr 1, 2020 at 3:14 AM, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>>>
>>>> Thank you Timo for the great summary! It covers (almost) all the topics. Even though in the end we are not suggesting many changes to the current state of the FLIP, I think it is important to lay out all possible use cases so that we do not change the execution model every release.
>>>>
>>>> There is one additional thing we discussed. Could we change the result type of TableResult#collect to Iterator<Row>? Even though those interfaces do not differ much, I think Iterator better describes that the results might not be materialized on the client side, but can be retrieved on a per-record basis. The contract of the Iterable#iterator is that it returns a new iterator each time, which effectively means we can iterate the results multiple times. Iterating the results multiple times is not possible when we don't retrieve all the results from the cluster at once.
>>>>
>>>> I think we should also use Iterator for TableEnvironment#executeMultilineSql(String statements): Iterator<TableResult>.
>>>>
>>>> Best,
>>>>
>>>> Dawid
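A minimal sketch of the consumption contract under discussion, assuming the Iterator-based signature proposed above. Each next() call is what triggers the submission of the next statement, which is also the pitfall Kurt points out:

    Iterator<TableResult> results = tEnv.executeMultilineSql(statements);
    while (results.hasNext()) {
        // next() triggers the submission of the next statement
        TableResult result = results.next();
        if (result.getResultKind() == ResultKind.SUCCESS_WITH_CONTENT) {
            // DML/DQL results must be consumed before moving on
            result.collect().forEachRemaining(System.out::println);
        }
    }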
>>>> On 31/03/2020 19:27, Timo Walther wrote:
>>>>> Hi Godfrey,
>>>>>
>>>>> Aljoscha, Dawid, Klou, and I had another discussion around FLIP-84. In particular, we discussed how the current status of the FLIP and the future requirements around multiline statements, async/sync, and collect() fit together.
>>>>>
>>>>> We also updated the FLIP-84 feedback summary document [1] with some use cases.
>>>>>
>>>>> We believe that we found a good solution that also fits what is in the current FLIP. So no bigger changes are necessary, which is great!
>>>>>
>>>>> Our findings were:
>>>>>
>>>>> 1. Async vs. sync submission of Flink jobs:
>>>>>
>>>>> Having a blocking `execute()` in the DataStream API was rather a mistake. Instead, all submissions should be async because this allows supporting both modes if necessary. Thus, submitting all queries async sounds good to us. If users want to run a job sync, they can use the JobClient and wait for completion (or collect() in case of batch jobs).
>>>>>
>>>>> 2. Multi-statement execution:
>>>>>
>>>>> For the multi-statement execution, we don't see a contradiction with the async execution behavior. We imagine a method like:
>>>>>
>>>>> TableEnvironment#executeMultilineSql(String statements): Iterable<TableResult>
>>>>>
>>>>> where the `Iterator#next()` method would trigger the next statement submission. This allows a caller to decide synchronously when to submit statements async to the cluster. Thus, a service such as the SQL Client can handle the result of each statement individually and process statement by statement sequentially.
>>>>>
>>>>> 3. The role of TableResult and result retrieval in general:
>>>>>
>>>>> `TableResult` is similar to `JobClient`. Instead of returning a `CompletableFuture` of something, it is a concrete util class where some methods have the behavior of a completable future (e.g. collect(), print()) and some are already completed (getTableSchema(), getResultKind()).
>>>>>
>>>>> `StatementSet#execute()` returns a single `TableResult` because the order is undefined in a set and all statements have the same schema. Its `collect()` will return a row for each executed `INSERT INTO` in the order of statement definition.
>>>>>
>>>>> For a simple `SELECT * FROM ...`, the query execution might block until `collect()` is called to pull buffered rows from the job (from socket/REST API or whatever we will use in the future). We can say that a statement finished successfully when the `collect()` iterator's `hasNext` has returned false.
>>>>>
>>>>> I hope this summarizes our discussion, @Dawid/Aljoscha/Klou?
>>>>>
>>>>> It would be great if we can add these findings to the FLIP before we start voting.
>>>>>
>>>>> One minor thing: some `execute()` methods still throw a checked exception; can we remove that from the FLIP? Also, the above mentioned `Iterator#next()` would trigger an execution without throwing a checked exception.
>>>>>
>>>>> Thanks,
>>>>> Timo
>>>>>
>>>>> [1] https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
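Point 1 above implies that a synchronous run is just an asynchronous submission plus an explicit wait. A sketch of that pattern, assuming the JobClient exposes the execution result as a future (the exact signature, including the userClassLoader parameter, is illustrative only):

    // submit asynchronously, then block until the job finishes
    TableResult result = tEnv.executeSql("INSERT INTO sink SELECT * FROM src");
    result.getJobClient()
          .map(client -> client.getJobExecutionResult(userClassLoader)) // CompletableFuture
          .ifPresent(CompletableFuture::join);                          // explicit wait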
>>>>> On 31.03.20 06:28, godfrey he wrote:
>>>>>> Hi Timo & Jark,
>>>>>>
>>>>>> Thanks for your explanation. Agree with you that execution should always be async, and the sync execution scenario can be covered by async execution. It helps provide a unified entry point for batch and streaming. I think we can also use sync execution for some testing.
>>>>>>
>>>>>> So, I agree with you that we provide the `executeSql` method and make it an async method. If we want a sync method in the future, we can add a method named `executeSqlSync`.
>>>>>>
>>>>>> I think we've reached an agreement. I will update the document and start the voting process.
>>>>>>
>>>>>> Best,
>>>>>> Godfrey
>>>>>>
>>>>>> On Tue, Mar 31, 2020 at 12:46 AM, Jark Wu <imj...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I didn't follow the full discussion, but I share the same concern with Timo that streaming queries should always be async. Otherwise, I can imagine it will cause a lot of confusion and problems if users don't deeply keep the "sync" in mind (e.g. the client hangs). Besides, the streaming mode is still the majority of use cases of Flink and Flink SQL. We should put usability at a high priority.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>> On Mon, 30 Mar 2020 at 23:27, Timo Walther <twal...@apache.org> wrote:
>>>>>>>> Hi Godfrey,
>>>>>>>>
>>>>>>>> maybe I wasn't expressing my biggest concern clearly enough in my last mail. Even in a singleline and sync execution, I think that streaming queries should not block the execution. Otherwise it is not possible to call collect() or print() on them afterwards.
>>>>>>>>
>>>>>>>> "there are too many things need to discuss for multiline":
>>>>>>>>
>>>>>>>> True, I don't want to solve all of them right now. But what I know is that our newly introduced methods should fit into a multiline execution. There is no big difference between calling `executeSql(A), executeSql(B)` and processing a multiline file `A;\nB;`.
>>>>>>>>
>>>>>>>> I think the example that you mentioned can simply be undefined for now. Currently, no catalog is modifying data but just metadata. This is a separate discussion.
>>>>>>>>
>>>>>>>> "result of the second statement is indeterministic":
>>>>>>>>
>>>>>>>> Sure, this is indeterministic. But this is the implementer's fault and we cannot forbid such pipelines.
>>>>>>>>
>>>>>>>> How about we always execute streaming queries async? It would unblock executeSql() and multiline statements.
>>>>>>>>
>>>>>>>> Having an `executeSqlAsync()` is useful for batch. However, I don't want `sync/async` to be the new batch/stream flag. The execution behavior should come from the query itself.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Timo
>>>>>>>>
>>>>>>>> On 30.03.20 11:12, godfrey he wrote:
>>>>>>>>> Hi Timo,
>>>>>>>>>
>>>>>>>>> Agree with you that streaming queries are our top priority, but I think there are too many things we need to discuss for multiline statements, e.g.:
>>>>>>>>>
>>>>>>>>> 1. What's the behavior of mixing DDL and DML for async execution:
>>>>>>>>>
>>>>>>>>> create table t1 xxx;
>>>>>>>>> create table t2 xxx;
>>>>>>>>> insert into t2 select * from t1 where xxx;
>>>>>>>>> drop table t1; // t1 may be a MySQL table, the data will also be deleted.
>>>>>>>>>
>>>>>>>>> Here t1 is dropped while the "insert" job is still running.
>>>>>>>>>
>>>>>>>>> 2. What's the behavior of the unified scenario for async execution (as you mentioned):
>>>>>>>>>
>>>>>>>>> INSERT INTO t1 SELECT * FROM s;
>>>>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
>>>>>>>>>
>>>>>>>>> The result of the second statement is indeterministic, because the first statement may still be running. I think we need to put a lot of effort into defining the behavior of logically related queries.
>>>>>>>>>
>>>>>>>>> In this FLIP, I suggest we only handle single statements, and we also introduce an async execute method, which is more important and more often used by users.
>>>>>>>>>
>>>>>>>>> For the sync methods (like `TableEnvironment.executeSql` and `StatementSet.execute`), the result will be returned once the job is finished. The following methods will be introduced in this FLIP:
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>>  * Asynchronously execute the given single statement
>>>>>>>>>  */
>>>>>>>>> TableEnvironment.executeSqlAsync(String statement): TableResult
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>>  * Asynchronously execute the dml statements as a batch
>>>>>>>>>  */
>>>>>>>>> StatementSet.executeAsync(): TableResult
>>>>>>>>>
>>>>>>>>> public interface TableResult {
>>>>>>>>>     /**
>>>>>>>>>      * return the JobClient for DQL and DML in async mode, else return Optional.empty
>>>>>>>>>      */
>>>>>>>>>     Optional<JobClient> getJobClient();
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Godfrey
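Against godfrey's proposed interface above, caller code would distinguish job-submitting statements by the presence of the JobClient. A small sketch (the table names and the getJobStatus() call are illustrative):

    // DDL submits no Flink job, so no JobClient is attached
    TableResult ddl = tEnv.executeSqlAsync("CREATE TABLE t2 (a INT)");
    assert !ddl.getJobClient().isPresent();

    // DML in async mode carries a JobClient that can be used to track the job
    TableResult dml = tEnv.executeSqlAsync("INSERT INTO t2 SELECT * FROM t1");
    dml.getJobClient().ifPresent(client ->
        client.getJobStatus().thenAccept(status -> System.out.println("status: " + status)));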
>>>>>>>>> On Thu, Mar 26, 2020 at 9:15 PM, Timo Walther <twal...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Godfrey,
>>>>>>>>>>
>>>>>>>>>> executing streaming queries must be our top priority because this is what distinguishes Flink from competitors. If we change the execution behavior, we should think about the other cases as well so as not to break the API a third time.
>>>>>>>>>>
>>>>>>>>>> I fear that just having an async execute method will not be enough, because users should be able to mix streaming and batch queries in a unified scenario.
>>>>>>>>>>
>>>>>>>>>> If I remember correctly, we had some discussions in the past about what decides the execution mode of a query. Currently, we would like to let the query decide, not derive it from the sources.
>>>>>>>>>>
>>>>>>>>>> So I could imagine a multiline pipeline as:
>>>>>>>>>>
>>>>>>>>>> USE CATALOG 'mycat';
>>>>>>>>>> INSERT INTO t1 SELECT * FROM s;
>>>>>>>>>> INSERT INTO t2 SELECT * FROM s JOIN t1 EMIT STREAM;
>>>>>>>>>>
>>>>>>>>>> For executeMultilineSql():
>>>>>>>>>>
>>>>>>>>>> sync because regular SQL
>>>>>>>>>> sync because regular Batch SQL
>>>>>>>>>> async because Streaming SQL
>>>>>>>>>>
>>>>>>>>>> For executeAsyncMultilineSql():
>>>>>>>>>>
>>>>>>>>>> async because everything should be async
>>>>>>>>>> async because everything should be async
>>>>>>>>>> async because everything should be async
>>>>>>>>>>
>>>>>>>>>> What we should not do for executeAsyncMultilineSql():
>>>>>>>>>>
>>>>>>>>>> sync because DDL
>>>>>>>>>> async because everything should be async
>>>>>>>>>> async because everything should be async
>>>>>>>>>>
>>>>>>>>>> What are your thoughts here?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Timo
>>>>>>>>>>
>>>>>>>>>> On 26.03.20 12:50, godfrey he wrote:
>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>
>>>>>>>>>>> I agree with you that streaming queries mostly need async execution. In fact, our original plan was to introduce only sync methods in this FLIP; async methods (like "executeSqlAsync") would be introduced in the future, as mentioned in the appendix.
>>>>>>>>>>>
>>>>>>>>>>> Maybe the async methods also need to be considered in this FLIP.
>>>>>>>>>>>
>>>>>>>>>>> I think sync methods are also useful for streaming, as they can be used to run bounded sources. Maybe we should check whether all sources are bounded in sync execution mode.
>>>>>>>>>>>
>>>>>>>>>>>> Also, if we block for streaming queries, we could never support multiline files. Because the first INSERT INTO would block the further execution.
>>>>>>>>>>>
>>>>>>>>>>> Agree with you, we need an async method to submit multiline files, and such files should be limited so that DQL and DML always come at the end for streaming.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Godfrey
>>>>>>>>>>> On Thu, Mar 26, 2020 at 4:29 PM, Timo Walther <twal...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Godfrey,
>>>>>>>>>>>>
>>>>>>>>>>>> having control over the job after submission is a requirement that was requested frequently (some examples are [1], [2]). Users would like to get insights about the running or completed job. Including the jobId, jobGraph etc., the JobClient summarizes these properties.
>>>>>>>>>>>>
>>>>>>>>>>>> It is good to have a discussion about synchronous/asynchronous submission now to have a complete execution picture.
>>>>>>>>>>>>
>>>>>>>>>>>> I thought we submit streaming queries mostly async and just wait for the successful submission. If we block for streaming queries, how can we collect() or print() results?
>>>>>>>>>>>>
>>>>>>>>>>>> Also, if we block for streaming queries, we could never support multiline files, because the first INSERT INTO would block the further execution.
>>>>>>>>>>>>
>>>>>>>>>>>> If we decide to block entirely on streaming queries, we need the async execution methods in the design already. However, I would rather go for non-blocking streaming queries, also with the `EMIT STREAM` keyword in mind that we might add to SQL statements soon.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Timo
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-16761
>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-12214
>>>>>>>>>>>>
>>>>>>>>>>>> On 25.03.20 16:30, godfrey he wrote:
>>>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the update.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding "multiline statement support": I'm also fine that `TableEnvironment.executeSql()` only supports single statements, and we can support multiline statements later (this needs more discussion).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding "StatementSet.explain()": I don't have strong opinions about that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding "TableResult.getJobClient()": I think it's unnecessary. The reason is: first, many statements (e.g. DDL, show xx, use xx) will not submit a Flink job. Second, `TableEnvironment.executeSql()` and `StatementSet.execute()` are synchronous methods, so a `TableResult` will be returned only after the job is finished or failed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding "whether StatementSet.execute() needs to throw an exception": I think we should choose a unified way to tell whether the execution is successful. If `TableResult` contains an ERROR kind (non-runtime exception), users need to not only check the result but also catch the runtime exception in their code. Alternatively, `StatementSet.execute()` does not throw any exception (including runtime exceptions) and all exception messages are in the result. I prefer "StatementSet.execute() needs to throw an exception". cc @Jark Wu <imj...@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will update the agreed parts in the document first.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Godfrey
>>>>>>>>>>>>> On Wed, Mar 25, 2020 at 6:51 PM, Timo Walther <twal...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Godfrey,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks for starting the discussion on the mailing list. And sorry again for the late reply to FLIP-84. I have updated the Google doc one more time to incorporate the offline discussions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From Dawid's and my view, it is fine to postpone the multiline support to a separate method. This can be future work, even though we will need it rather soon.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If there are no objections, I suggest updating FLIP-84 again and having another voting process.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 25.03.20 11:17, godfrey he wrote:
>>>>>>>>>>>>>>> Hi community,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Timo, Fabian and Dawid have some feedback about FLIP-84 [1]. The feedback is all about the newly introduced methods. We had a discussion yesterday, and most of the feedback has been agreed upon. Here are the conclusions:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *1. About the proposed methods in `TableEnvironment`:*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The originally proposed methods:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> TableEnvironment.createDmlBatch(): DmlBatch
>>>>>>>>>>>>>>> TableEnvironment.executeStatement(String statement): ResultTable
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The newly proposed methods:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> // we should not use abbreviations in the API, and the term "Batch" is easily confused with batch/streaming processing
>>>>>>>>>>>>>>> TableEnvironment.createStatementSet(): StatementSet
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> // every method that takes SQL should have `Sql` in its name
>>>>>>>>>>>>>>> // supports multiline statement ???
>>>>>>>>>>>>>>> TableEnvironment.executeSql(String statement): TableResult
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> // new method. supports explaining DQL and DML
>>>>>>>>>>>>>>> TableEnvironment.explainSql(String statement, ExplainDetail... details): String
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *2. About the proposed related classes:*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The originally proposed classes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> interface DmlBatch {
>>>>>>>>>>>>>>>     void addInsert(String insert);
>>>>>>>>>>>>>>>     void addInsert(String targetPath, Table table);
>>>>>>>>>>>>>>>     ResultTable execute() throws Exception;
>>>>>>>>>>>>>>>     String explain(boolean extended);
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> public interface ResultTable {
>>>>>>>>>>>>>>>     TableSchema getResultSchema();
>>>>>>>>>>>>>>>     Iterable<Row> getResultRows();
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The newly proposed classes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> interface StatementSet {
>>>>>>>>>>>>>>>     // every method that takes SQL should have `Sql` in its name
>>>>>>>>>>>>>>>     // return the StatementSet instance for fluent programming
>>>>>>>>>>>>>>>     addInsertSql(String statement): StatementSet
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // return the StatementSet instance for fluent programming
>>>>>>>>>>>>>>>     addInsert(String tablePath, Table table): StatementSet
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // new method. supports overwrite mode
>>>>>>>>>>>>>>>     addInsert(String tablePath, Table table, boolean overwrite): StatementSet
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     explain(): String
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // new method. supports adding more details for the result
>>>>>>>>>>>>>>>     explain(ExplainDetail... extraDetails): String
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // throw exception ???
>>>>>>>>>>>>>>>     execute(): TableResult
>>>>>>>>>>>>>>> }
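A short sketch of how the fluent StatementSet above would be used. The table names and queries are made up; the createStatementSet, add and execute calls follow the proposal:

    StatementSet stmtSet = tEnv.createStatementSet()
        .addInsertSql("INSERT INTO sink1 SELECT * FROM src WHERE a > 10")
        .addInsert("sink2", tEnv.sqlQuery("SELECT b, c FROM src"));

    // explain all statements of the set as one plan
    System.out.println(stmtSet.explain());

    // one TableResult for the whole set; one row per executed INSERT INTO
    TableResult result = stmtSet.execute();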
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> interface TableResult {
>>>>>>>>>>>>>>>     getTableSchema(): TableSchema
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // avoid custom parsing of an "OK" row in programming
>>>>>>>>>>>>>>>     getResultKind(): ResultKind
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // instead of `get`, make it explicit that this might be triggering an expensive operation
>>>>>>>>>>>>>>>     collect(): Iterable<Row>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     // for fluent programming
>>>>>>>>>>>>>>>     print(): Unit
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> enum ResultKind {
>>>>>>>>>>>>>>>     SUCCESS, // for DDL, DCL and statements with a simple "OK"
>>>>>>>>>>>>>>>     SUCCESS_WITH_CONTENT, // rows with important content are available (DML, DQL)
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *3. New proposed methods in `Table`:*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> `Table.insertInto()` will be deprecated, and the following methods are introduced:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Table.executeInsert(String tablePath): TableResult
>>>>>>>>>>>>>>> Table.executeInsert(String tablePath, boolean overwrite): TableResult
>>>>>>>>>>>>>>> Table.explain(ExplainDetail... details): String
>>>>>>>>>>>>>>> Table.execute(): TableResult
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There are two issues that need further discussion: one is whether `TableEnvironment.executeSql(String statement): TableResult` needs to support multiline statements (or whether `TableEnvironment` needs to support multiline statements), and the other is whether `StatementSet.execute()` needs to throw an exception.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please refer to the feedback document [2] for the details.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any suggestions are warmly welcomed!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://wiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>>>>>>>>>>>>>> [2] https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Godfrey
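Taken together, the new `Table` methods above would be used roughly like this. A sketch only: the source and sink names are made up, and collect() is shown returning an Iterator as proposed earlier in the thread:

    Table table = tEnv.sqlQuery("SELECT a, b FROM src");

    // submit an INSERT job for the table's result
    TableResult insertResult = table.executeInsert("sink");

    // execute the query itself and pull rows from the cluster
    table.execute().collect().forEachRemaining(System.out::println);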