Hi Fabian,
Yes, timers are not only a difference between the Table API and DataStream, but
also a difference between DataStream and DataSet. We need to unify
batch and stream processing in the Table API, so the differences around timers
need to be considered in depth. :)
Thanks, Jincheng
Fabian Hueske wrote on Nov. 15, 2018:
Thanks Jincheng,
That makes sense to me.
Another differentiator between the Table API and DataStream API would be access
to the timer service.
The DataStream API can register and act on timers, while the Table API would
not have this feature.
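For illustration, a minimal sketch of that timer access on the DataStream side (assuming event time with assigned timestamps and a keyed stream; the class name and constants are just examples):

```java
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Example only: emit a message 10 seconds (event time) after each element.
public class TimeoutFunction extends KeyedProcessFunction<String, String, String> {

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) {
        // Register a timer relative to this element's timestamp
        // (assumes timestamps have been assigned upstream).
        ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 10_000L);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        // Called when the watermark passes the registered timestamp.
        out.collect("timer fired at " + timestamp);
    }
}
```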
Best, Fabian
On Wed., Nov. 14, 2018 at 02:02, ji
Hi Piotrek, Fabian:
I am very glad to see your reply. Thank you very much Piotrek for asking
very good questions. I will share my opinion:
- The Table API enhancement that I proposed is aimed at user
friendliness. After the enhancement, it will maintain the characteristics of
Table API & SQL,
Yes, that is my understanding as well.
Manual time management would be another difference.
Something still to be discussed would be whether (or to what extent) it
would be possible to define the physical execution plan with hints or
methods like partitionByHash and sortPartition.
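To make the question concrete, a purely hypothetical strawman (none of these Table API methods exist today; the names and columns are borrowed from the DataSet API and made up for illustration):

```java
// Hypothetical only: what physical execution hints on a Table might look like.
Table result = table
    .partitionByHash("userId")           // hypothetical hash-partitioning hint
    .sortPartition("eventTime", "asc")   // hypothetical per-partition sort hint
    .select("userId, eventTime, amount");
```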
Best, Fabian
On
Hi,
> This thread is meant to enhance the functionalities of the Table API. I don't
> think that anyone is suggesting either reducing the effort in SQL or
> DataStream. So let's focus on how we can enhance the Table API.
I wasn't thinking about that. As I said before, I was raising a question: what
Table
Hi Piotr:
I want to clarify one thing first: I think that we will keep the
interoperability between the Table API and DataStream in any case, so users can
switch between the two whenever needed. Given that, it would still be very
helpful if users could use one API to achieve most of what they do.
Curren
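For reference, a minimal sketch of that interoperability as it works today (the class name and data are made up; the conversion methods are the current StreamTableEnvironment ones):

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class TableDataStreamInterop {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        DataStream<Tuple2<String, Long>> clicks = env.fromElements(Tuple2.of("alice", 1L));

        // DataStream -> Table: switch to the relational API where convenient ...
        Table table = tEnv.fromDataStream(clicks, "name, cnt");

        // ... Table -> DataStream: switch back when low-level control is needed.
        DataStream<Tuple2<String, Long>> back = tEnv.toAppendStream(table, clicks.getType());
        back.print();

        env.execute("table-datastream-interop");
    }
}
```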
Hi all,
Thanks for the feedback. I enjoyed the discussions, especially the ones
between Fabian and Xiaowei. I think they clearly revealed the motivations and
design pros/cons behind this proposal. Enhancing the Table API will not affect
or limit the improvements to Flink SQL (or to DataStream). Actual
Hi all,
Thank you for your replies and comments.
I have similar considerations to Piotrek's. My opinion is that two APIs are
enough for Flink: a declarative one (SQL) and an imperative one
(DataStream). From my perspective, most users prefer SQL most of the time
and turn to DataStream when the l
Hi,
What is our intended division/border between the Table API and DataSet or
DataStream? If we want the Table API to drift away from SQL, that would be a
valid question.
> Another distinguishing feature of the DataStream API is that users get direct
> access to state/statebackend which we intentionally avoid
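As a concrete reference, a minimal sketch of that direct keyed-state access in the DataStream API (the function and state names are just examples):

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Example only: a per-key running count with user-managed state.
public class RunningCount extends RichFlatMapFunction<String, Long> {
    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
            new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void flatMap(String value, Collector<Long> out) throws Exception {
        Long current = count.value();
        long next = (current == null ? 0L : current) + 1L;
        count.update(next);  // the user reads and writes state explicitly
        out.collect(next);
    }
}
```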
Hi,
An analysis of orthogonal functions would be great!
There is certainly some overlap in the functions provided by the DataSet
API.
In the past, I found that having low-level functions helped a lot to
efficiently implement complex logic.
Without partitionByHash, sortPartition, sort, mapPartition
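To ground this, a small self-contained sketch of those DataSet primitives in use (the data and the per-partition logic are placeholders; the identity pass-through just marks where complex logic would go):

```java
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class LowLevelDataSet {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple2<String, Integer>> data =
            env.fromElements(Tuple2.of("a", 3), Tuple2.of("b", 1), Tuple2.of("a", 2));

        data.partitionByHash(0)                  // hash-partition by the key field
            .sortPartition(1, Order.ASCENDING)   // sort within each partition
            .mapPartition((Iterable<Tuple2<String, Integer>> it,
                           Collector<Tuple2<String, Integer>> out) -> {
                // complex per-partition logic goes here; identity for the sketch
                for (Tuple2<String, Integer> t : it) {
                    out.collect(t);
                }
            })
            .returns(data.getType())
            .print();
    }
}
```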
Hi Fabian,
Thank you for your deep thoughts in this regard. I think most of the questions
you mentioned are well worth in-depth discussion! I want to share my
thoughts on the following questions:
1. Do we need to move all DataSet API functionality into the Table API?
I think most of the DataSet functionalit
Hi Fabian,
I totally agree with you that we should incrementally improve the Table API.
We don't suggest doing anything drastic, such as replacing the DataSet API,
yet. We should see how much we can achieve by extending the Table API cleanly.
By then, we should see if there are any natural boundaries on how
Thanks for the replies, Xiaowei and others!
You are right, I did not consider the batch optimizations that would be
missing if the DataSet API were ported to extend the DataStream API.
By extending the scope of the Table API, we can gain a holistic logical &
physical optimization, which would be
Hi Jark,
Glad to see your feedback!
That's correct, the proposal aims to extend the functionality of the
Table API! I like adding "drop" to fit the use case you mentioned. Not only
that: given a 100-column Table whose UDF needs all 100 columns, we don't
want to define the eval as eval(column0...c
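To sketch the pain point (the class name is hypothetical; only Row and its accessors are real Flink API), compare a per-column eval with a Row-based one:

```java
import org.apache.flink.types.Row;

// Hypothetical example class contrasting the two eval styles.
public class WideRowFunction {
    // Painful today: one parameter per column (trimmed to three here).
    public String eval(String c0, String c1, String c2 /* ... up to column99 */) {
        return c0 + c1 + c2;
    }

    // What the proposal hints at: take the whole Row instead.
    public String eval(Row row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.getArity(); i++) {
            sb.append(row.getField(i));
        }
        return sb.toString();
    }
}
```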
Hi jincheng,
Thanks for your proposal. I think it is a helpful enhancement and a solid
step forward for the Table API.
It doesn't weaken SQL or DataStream, because the conversion between
DataStream and Table still works.
People with advanced cases (e.g. complex and fine-grained state
Hi Rong Rong,
Sorry for the late reply, and thanks for your feedback! We will continue
to add more convenience features to the Table API, such as map, flatMap,
agg, flatAgg, iteration, etc. (sketched below). I am very happy that you are
interested in this proposal. Since this is long-term, continuous work, we
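As a purely hypothetical strawman of where this could go (none of these Table API methods or function classes exist yet; the names simply follow the list above):

```java
// Hypothetical only: the proposed convenience methods, sketched on a Table.
Table result = table
    .map(new MyScalarMapper())      // hypothetical: one row in, one row out
    .flatMap(new MyRowExploder())   // hypothetical: one row in, many rows out
    .groupBy("user")
    .agg(new MyAggregator());       // hypothetical: user-defined aggregation
```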
Hi Xiaogang,
Thanks for the comments. Please see the responses inline below.
On Tue, Nov 6, 2018 at 11:28 AM SHI Xiaogang wrote:
> Hi all,
>
> I think it's good to enhance the functionality and productivity of the Table
> API, but still I think SQL + DataStream is a better choice from a
> user-experience perspective
Hi Xiaogang,
Thanks for your feedback. I will share my thoughts here:
First, enhancing the Table API does not mean weakening SQL. We also need to
enhance the functionality of SQL, such as @Xuefu's ongoing integration of
the Hive SQL ecosystem.
In addition, SQL and the Table API are two different API forms
Hi all,
I think it's good to enhance the functionality and productivity of the Table
API, but still I think SQL + DataStream is a better choice from a
user-experience perspective.
1. The unification of batch and stream processing is very attractive, and
many of our users are moving their batch-processing applications t
Hi Fabian, these are great questions! I have some quick thoughts on some of
these.
Optimization opportunities: I think you are right that UDFs are more like
black boxes today. However, this can change if we let users develop UDFs
symbolically in the future (i.e., Flink will look inside the UDF code,
Hi Jincheng,
Thanks for this interesting proposal.
I like that we can push this effort forward in a very fine-grained manner,
i.e., incrementally adding more APIs to the Table API.
However, I also have a few questions / concerns.
Today, the Table API is tightly integrated with the DataSet and DataStream
Hi Jincheng,
Thank you for the proposal! I think being able to define a process /
co-process function in the Table API definitely opens up a whole new level of
applications using a unified API.
In addition, as Tzu-Li and Hequn have mentioned, the benefit of the
optimization layer of the Table API will alread
Hi tison,
Thanks a lot for your feedback!
I am very happy to see that community contributors agree to enhancing the
Table API. This is long-term, continuous work; we will push it forward in
stages. We will soon complete the enhancement list for the first phase, and we
can go into deep discussion in the Google doc. t
Hi jincheng,
Thanks a lot for your proposal! I find it a good starting point for
internal optimization work and for making Flink more
user-friendly.
AFAIK, DataStream is currently the most popular API, with which Flink
users describe their logic in detail.
From a more internal view t
Hi Hequn,
Thanks for your feedback! And also thanks for our offline discussion!
You are right, the unification of batch and streaming is very important for
the Flink API.
We will provide a more detailed design later. Please let me know if you have
further thoughts or feedback.
Thanks,
Jincheng
Hequn Cheng
Hi, Jiangjie,
Thanks a lot for your feedback. And also thanks for our offline discussion!
Yes, you're right! The Row-based APIs which you mentioned are very friendly
to Flink users!
In order to follow the concepts of traditional databases, perhaps we should
name the corresponding functions RowValued/TableVa
Hi Jincheng,
Thanks a lot for your proposal. It is very encouraging!
As we all know, SQL is a widely used language. It follows standards, is a
declarative language, and is easy to use. A powerful feature of SQL is that
it supports optimization. Users only need to care about the logic of the
progr
Hi Aljoscha,
Glad that you like the proposal. We have completed prototypes of most of the
newly proposed functionalities. Once we collect feedback from the community,
we will come up with a concrete FLIP/design doc.
Regards,
Shaoxuan
On Thu, Nov 1, 2018 at 8:12 PM Aljoscha Krettek wrote:
> Hi Jincheng,
Thanks for the proposal, Jincheng.
This makes a lot of sense. As a programming interface, the Table API is
especially attractive because it supports both batch and streaming. However,
the relational-only API often forces users to shoehorn their logic into a
bunch of user-defined functions. Introducing so
Hi, Timo,
I am very grateful for your feedback, and I was very excited to hear
that you have also considered adding a process function to the Table API.
I agree with adding support for the process function in the Table API; this
is actually part of my proposal, Enhancing the Functionality of Table API.
In
Yes, that makes sense!
> On 1. Nov 2018, at 15:51, jincheng sun wrote:
>
> Hi, Aljoscha,
>
> Thanks for your feedback and suggestions. I think you are right: a
> detailed design/FLIP is very necessary. Before the detailed design or opening
> a FLIP, I would like to hear the community's views on
Hi, Aljoscha,
Thanks for your feedback and suggestions. I think you are right: a
detailed design/FLIP is very necessary. Before the detailed design or opening
a FLIP, I would like to hear the community's views on enhancing the
functionality and productivity of the Table API, to ensure that it is worth t
Hi Jincheng,
I have also thought several times about introducing a process function for
the Table API. This would allow defining more complex logic (custom
windows, timers, etc.) embedded into a relational API with schema
awareness and optimization around the black box. Of course, this would
Hi Jincheng,
these points sound very good! Are there any concrete proposals for changes? For
example, a FLIP/design document?
See here for FLIPs:
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
Best,
Aljoscha
> On 1. Nov 2018, at 12:51, jincheng sun wrote:
>
>
I am sorry for the formatting of the email content. I have reformatted
the content as follows:
Hi all,
With continuous efforts from the community, the Flink system has been
steadily improved, which has attracted more and more users. Flink SQL
is a canonical, widely used
Hi all,
With continuous efforts from the community, the Flink system has been
steadily improved, which has attracted more and more users. Flink SQL
is a canonical, widely used relational query language. However, there are
still some scenarios where Flink SQL fails to meet user needs in te