Yes, that makes sense!

> On 1. Nov 2018, at 15:51, jincheng sun <sunjincheng...@gmail.com> wrote:
> 
> Hi, Aljoscha,
> 
> Thanks for your feedback and suggestions. I think you are right, the
> detailed design/FLIP is very necessary. Before writing the detailed design
> or opening a FLIP, I would like to hear the community's views on enhancing
> the functionality and productivity of the Table API, to ensure that it is
> worth the effort. If most community members agree with my proposal, I will
> list the changes and discuss them with all community members. Does that
> make sense to you?
> 
> Thanks,
> Jincheng
> 
> Aljoscha Krettek <aljos...@apache.org> wrote on Thu, Nov 1, 2018 at 8:12 PM:
> 
>> Hi Jincheng,
>> 
>> these points sound very good! Are there any concrete proposals for
>> changes? For example a FLIP/design document?
>> 
>> See here for FLIPs:
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>> 
>> Best,
>> Aljoscha
>> 
>>> On 1. Nov 2018, at 12:51, jincheng sun <sunjincheng...@gmail.com> wrote:
>>> 
>>> *-------- I am sorry for the formatting of the email content. I have
>>> reformatted the content as follows --------*
>>> 
>>> *Hi ALL,*
>>> 
>>> With the continuous efforts from the community, the Flink system has been
>>> continuously improved, which has attracted more and more users. Flink SQL
>>> is a canonical, widely used relational query language. However, there are
>>> still some scenarios where Flink SQL fails to meet user needs in terms of
>>> functionality and ease of use, such as:
>>> 
>>> *1. In terms of functionality*
>>>   Users cannot express operations such as iteration, user-defined
>>>   windows, user-defined joins, and user-defined GroupReduce with SQL;
>>> 
>>> *2. In terms of ease of use*
>>> 
>>>  - Map - e.g. “dataStream.map(mapFun)”. Although “table.select(udf1(),
>>>    udf2(), udf3()....)” can be used to accomplish the same function, with
>>>    a map() function returning 100 columns one has to define or call 100
>>>    UDFs when using SQL, which is quite involved.
>>>  - FlatMap - e.g. “dataStream.flatMap(flatMapFun)”. Similarly, it can be
>>>    implemented with “table.join(udtf).select()”. However, it is obvious
>>>    that the DataStream API is easier to use than SQL.
>>> 
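>>> To make the contrast concrete, here is a rough sketch (pseudocode only,
>>> not any released Flink API; a map() on Table is exactly the kind of
>>> addition this proposal is about, and mapFun here is a hypothetical
>>> user-defined function):
>>> 
>>>     // today: one scalar UDF call per output column
>>>     table.select(udf1('a), udf2('a, 'b), ..., udf100('c))
>>> 
>>>     // proposed: one user-defined function emitting the whole row
>>>     table.map(mapFun)  // mapFun returns all 100 columns at once
>>> 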
>>> Due to the above two reasons, some users have to use the DataStream API or
>>> the DataSet API. But when they do that, they lose the unification of batch
>>> and streaming. They will also lose the sophisticated optimizations from
>>> Flink SQL such as codegen, aggregate join transpose, and multi-stage agg.
>>> 
>>> We believe that enhancing the functionality and productivity is vital for
>>> the successful adoption of the Table API. To this end, the Table API still
>>> requires more effort from every contributor in the community. We see great
>>> opportunity in improving our users' experience through this work. Any
>>> feedback is welcome.
>>> 
>>> Regards,
>>> 
>>> Jincheng
>>> 
