Hi Jincheng,

Thank you for the proposal! I think being able to define a process / co-process function in the Table API definitely opens up a whole new level of applications using a unified API.
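To make this concrete: today, per-record logic that needs keyed state and timers has to be written against the DataStream API, roughly as in the minimal sketch below (standard DataStream API of the current releases; the class, key, and field names are made up for illustration). The commented-out Table API call at the end is purely hypothetical and only illustrates the kind of unified hook being discussed here, not an existing method.

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ProcessFunctionSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Toy input: (key, event timestamp in ms).
        env.fromElements(Tuple2.of("user-1", 1_000L), Tuple2.of("user-2", 2_000L))
           // Today: timer-aware, per-key logic means dropping down to the DataStream API.
           .keyBy(new KeySelector<Tuple2<String, Long>, String>() {
               @Override
               public String getKey(Tuple2<String, Long> value) {
                   return value.f0;
               }
           })
           .process(new KeyedProcessFunction<String, Tuple2<String, Long>, String>() {
               @Override
               public void processElement(Tuple2<String, Long> value, Context ctx, Collector<String> out) {
                   // Per-record logic with access to keyed state and timers.
                   ctx.timerService().registerEventTimeTimer(value.f1 + 60_000);
               }

               @Override
               public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
                   out.collect("timer fired for key " + ctx.getCurrentKey());
               }
           })
           .print();

        // Hypothetical Table API counterpart (illustration only; no such method
        // exists at the time of this thread, it only sketches the kind of
        // unified, optimizable hook being proposed):
        //
        //   table.groupBy("user").process(new MyTableProcessFunction());

        env.execute("process-function sketch");
    }
}

A process-style primitive at the Table level would let the same logic run on both batch and streaming inputs and still go through the Table API optimizer, which is where the unified-API appeal comes from.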
In addition, as Tzu-Li and Hequn have mentioned, the optimization layer of
the Table API already brings an additional benefit over programming directly
on top of the DataStream/DataSet API. I am very interested in this and am
looking forward to seeing support for more complex use cases, especially
iterations. It will enable the Table API to cover much broader, event-driven
use cases such as real-time ML prediction/training.

As Timo mentioned, this will make the Table API diverge from the SQL API.
But in my experience the Table API has always given me the impression of
being a more sophisticated, syntax-aware way to express relational
operations.

Looking forward to further discussion and collaboration on the FLIP doc.

--
Rong

On Sun, Nov 4, 2018 at 5:22 PM jincheng sun <sunjincheng...@gmail.com> wrote:

> Hi tison,
>
> Thanks a lot for your feedback! I am very happy to see that community
> contributors agree to enhance the Table API. This is long-term, continuous
> work, and we will push it forward in stages. We will soon complete the
> enhancement list for the first phase, and we can then go into deeper
> discussion in the Google doc. Thanks again for joining this very important
> discussion of the Flink Table API.
>
> Thanks,
> Jincheng
>
> Tzu-Li Chen <wander4...@gmail.com> wrote on Fri, Nov 2, 2018 at 1:49 PM:
>
> > Hi jincheng,
> >
> > Thanks a lot for your proposal! I find it a good starting point for
> > internal optimization work, and it helps Flink become more
> > user-friendly.
> >
> > AFAIK, DataStream is currently the most popular API, and Flink users
> > have to describe their logic with it in great detail. From a more
> > internal view, the conversion from DataStream to JobGraph is quite
> > mechanical and hard to optimize. So when users program with DataStream,
> > they have to learn more internals and spend a lot of time tuning for
> > performance.
> >
> > With your proposal, we provide enhanced functionality in the Table API,
> > so that users can easily describe their job in terms of Tables. This
> > gives Flink developers an opportunity to introduce an optimization phase
> > while transforming the user program (described with the Table API) into
> > the internal representation.
> >
> > A user who wants to start using Flink for simple ETL, pipelining, or
> > analytics would find such jobs most naturally described by the SQL/Table
> > API. Further, as mentioned by @hequn,
> >
> > > SQL is a widely used language. It follows standards, is a
> > > descriptive language, and is easy to use
> >
> > thus we can expect that, with the enhancement of the SQL/Table API,
> > Flink will become more friendly to users.
> >
> > Looking forward to the design doc/FLIP!
> >
> > Best,
> > tison.
> >
> > jincheng sun <sunjincheng...@gmail.com> wrote on Fri, Nov 2, 2018 at
> > 11:46 AM:
> >
> > > Hi Hequn,
> > >
> > > Thanks for your feedback! And also thanks for our offline discussion!
> > > You are right, unification of batch and streaming is very important
> > > for the Flink API. We will provide a more detailed design later.
> > > Please let me know if you have further thoughts or feedback.
> > >
> > > Thanks,
> > > Jincheng
> > >
> > > Hequn Cheng <chenghe...@gmail.com> wrote on Fri, Nov 2, 2018 at
> > > 10:02 AM:
> > >
> > > > Hi Jincheng,
> > > >
> > > > Thanks a lot for your proposal. It is very encouraging!
> > > >
> > > > As we all know, SQL is a widely used language. It follows standards,
> > > > is a descriptive language, and is easy to use. A powerful feature of
> > > > SQL is that it supports optimization. Users only need to care about
> > > > the logic of the program.
> > > > The underlying optimizer will help users optimize the performance
> > > > of the program. However, in terms of functionality and ease of use,
> > > > SQL is limited in some scenarios, as described in Jincheng's
> > > > proposal.
> > > >
> > > > Correspondingly, the DataStream/DataSet API provides powerful
> > > > functionality. Users can write a ProcessFunction/CoProcessFunction
> > > > and get access to timers. Compared with SQL, it provides more
> > > > functionality and flexibility. However, it does not support
> > > > optimization like SQL does. Meanwhile, the DataStream and DataSet
> > > > APIs have not been unified, which means that, for the same logic,
> > > > users need to write one job for streaming and another for batch.
> > > >
> > > > With the Table API, I think we can combine the advantages of both.
> > > > Users can easily write relational operations and enjoy optimization.
> > > > At the same time, it supports more functionality and ease of use.
> > > > Looking forward to the detailed design/FLIP.
> > > >
> > > > Best,
> > > > Hequn
> > > >
> > > > On Fri, Nov 2, 2018 at 9:48 AM Shaoxuan Wang <wshaox...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Aljoscha,
> > > > >
> > > > > Glad that you like the proposal. We have completed the prototype
> > > > > of most of the newly proposed functionalities. Once we have
> > > > > collected the feedback from the community, we will come up with a
> > > > > concrete FLIP/design doc.
> > > > >
> > > > > Regards,
> > > > > Shaoxuan
> > > > >
> > > > > On Thu, Nov 1, 2018 at 8:12 PM Aljoscha Krettek
> > > > > <aljos...@apache.org> wrote:
> > > > >
> > > > > > Hi Jincheng,
> > > > > >
> > > > > > these points sound very good! Are there any concrete proposals
> > > > > > for changes? For example a FLIP/design document?
> > > > > >
> > > > > > See here for FLIPs:
> > > > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > > > > >
> > > > > > Best,
> > > > > > Aljoscha
> > > > > >
> > > > > > On 1. Nov 2018, at 12:51, jincheng sun <sunjincheng...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > -------- I am sorry for the formatting of the email content.
> > > > > > > I reformat the content as follows --------
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > With the continuous efforts from the community, the Flink
> > > > > > > system has been continuously improved, which has attracted
> > > > > > > more and more users. Flink SQL is a canonical, widely used
> > > > > > > relational query language. However, there are still some
> > > > > > > scenarios where Flink SQL fails to meet user needs in terms
> > > > > > > of functionality and ease of use, such as:
> > > > > > >
> > > > > > > 1. In terms of functionality
> > > > > > >
> > > > > > > Iteration, user-defined window, user-defined join, user-defined
> > > > > > > GroupReduce, etc. Users cannot express them with SQL.
> > > > > > >
> > > > > > > 2. In terms of ease of use
> > > > > > >
> > > > > > > - Map - e.g. “dataStream.map(mapFun)”. Although
> > > > > > >   “table.select(udf1(), udf2(), udf3()....)” can be used to
> > > > > > >   accomplish the same function, with a map() function
> > > > > > >   returning 100 columns, one has to define or call 100 UDFs
> > > > > > >   when using SQL, which is quite involved.
> > > > > > > - FlatMap - e.g. “dataStream.flatMap(flatMapFun)”.
> > > > > > >   Similarly, it can be implemented with
> > > > > > >   “table.join(udtf).select()”. However, it is obvious that
> > > > > > >   dataStream is easier to use than SQL. [A concrete sketch
> > > > > > >   follows below.]
> > > > > > >
> > > > > > > Due to the above two reasons, some users have to use the
> > > > > > > DataStream API or the DataSet API. But when they do that,
> > > > > > > they lose the unification of batch and streaming. They will
> > > > > > > also lose the sophisticated optimizations from Flink SQL,
> > > > > > > such as codegen, aggregate join transpose, and multi-stage
> > > > > > > agg.
> > > > > > >
> > > > > > > We believe that enhancing the functionality and productivity
> > > > > > > is vital for the successful adoption of the Table API. To
> > > > > > > this end, the Table API still requires more efforts from
> > > > > > > every contributor in the community. We see great opportunity
> > > > > > > in improving our users' experience through this work. Any
> > > > > > > feedback is welcome.
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Jincheng
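To make the map()/flatMap() comparison in the proposal above concrete, here is a minimal Java sketch against the current Table API. The ParseField function, the projectWithSelect helper, the column names, and the input schema are made up for illustration; the commented-out table.map(...) call at the end is hypothetical and only shows the shape of the proposed enhancement, not an existing method at the time of this thread.

import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.functions.ScalarFunction;

public class SelectVsMapSketch {

    // One scalar UDF that derives a single output column from a raw input line.
    public static class ParseField extends ScalarFunction {
        public String eval(String line, int index) {
            String[] parts = line.split(",");
            return index < parts.length ? parts[index] : null;
        }
    }

    // Today: every output column is one more expression in select(). With a
    // mapping that produces 100 columns, this becomes 100 expressions (or 100
    // dedicated UDFs), which is the ease-of-use problem described above.
    // Assumes the input table has a single String column named "line".
    public static Table projectWithSelect(TableEnvironment tEnv, Table input) {
        tEnv.registerFunction("parseField", new ParseField());
        return input.select(
                "parseField(line, 0) as userId, "
              + "parseField(line, 1) as pageId, "
              + "parseField(line, 2) as eventTime");
    }

    // Hypothetical enhancement discussed in this thread (illustration only,
    // not an existing method at the time of writing): a single map() call
    // that emits all columns at once while still going through the Table API
    // optimizer, mirroring dataStream.map(mapFun).
    //
    //   Table result = input.map(new ParseLine());
}

The point is not the parsing itself but the shape of the API: with select(), every additional output column is one more expression to write, while a map()-style primitive would express the whole projection in a single user function and still benefit from the optimizer.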