Re: [DISCUSS] Planning Flink 1.12

jincheng sun Wed, 22 Jul 2020 20:20:34 -0700

Hi All,

Thanks for bringing up this discussion, Robert!
Congratulations on becoming the release managers of 1.12, Dian and Robert!


----------
Here are my thoughts on the features for PyFlink in Flink 1.12:

1. Improve the usability of PyFlink
    Description:
    Improve the usability of PyFlink, e.g. help users check the client and
cluster environment, improve error messages, improve the current API type
hints, etc.
    Benefits:
    Better user experience.

2. PyFlink Table API DSL
    Description:
    Support the Python Table API Expression DSL. The Expression DSL is
already supported on the Java side (FLIP-55). This task aligns the Python
Table API with the Java Table API.
    Benefits:
    The Expression DSL is more user-friendly than string expressions: users
can rely on IDE auto-completion when writing expressions, which increases
development efficiency.


3. Python DataStream API
    Description:
    Support DataStream applications written in Python, including stateless
operations (keyBy, connect, union, map, flatMap, filter, etc.) and stateful
operations (RichFunctions, ProcessFunctions, window, join).
    Benefits:
   1) Adding the DataStream API to PyFlink would give users a more
fine-grained configuration API for tasks (such as parallelism and resource
spec) and more complex data processing operations. Users have a strong
demand for these, and SQL and the Table API do not support them at the
moment.
   2) In areas that rely little on relational operations, such as AI, users
prefer transformations like map and flatMap over the Table API.

4. Support Pandas UDAF in batch GroupBy aggregation
    Description:
    Support Pandas UDAF in batch GroupBy aggregation of the Python Table API &
SQL. Both the input and the output of the UDF are pandas.DataFrame.
    Benefits:
   1) Pandas UDAF can outperform row-at-a-time UDAF by more than 10x in
certain scenarios.
   2) Users can use the Pandas/NumPy API in the Python UDAF implementation
if the input/output data type is pandas.DataFrame.

5. PyFlink Table API UDAF
   Description:
   Support UDAF in the Python Table API.
   Benefits:
   Aggregations (stateful operations) can then also be supported in PyFlink.

6. Support running PyFlink jobs on Kubernetes
    Description:
    Support running PyFlink jobs on Kubernetes, including dependency
management and so on, just like on YARN and standalone clusters.
    Benefits:
    Kubernetes is a widely used container orchestration framework that
offers more flexibility in application development and deployment.
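
To illustrate the benefit claimed in item 2, here is a minimal plain-Python
sketch of the difference between string expressions and an expression DSL.
It does not use PyFlink; `Expr`, `col` and `alias` are hypothetical names
chosen only for this illustration and say nothing about the final API.

```python
# Hypothetical mini expression DSL in plain Python; not the PyFlink API.
class Expr:
    """An expression node that renders itself as text."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Expr(f"({self.text} + {_render(other)})")

    def __gt__(self, other):
        return Expr(f"({self.text} > {_render(other)})")

    def alias(self, name):
        return Expr(f"{self.text} AS {name}")

    def __repr__(self):
        return self.text


def _render(value):
    # Nested expressions render as their text, plain Python values via repr().
    return value.text if isinstance(value, Expr) else repr(value)


def col(name):
    """Reference a column by name (hypothetical helper)."""
    return Expr(name)


# String style: the whole expression is an opaque string, so a typo is only
# detected when the string is parsed.
string_style = "(a + 1) AS b"

# DSL style: every step is a Python call or operator, so an IDE can complete
# method names and a misspelled method fails immediately in Python.
dsl_style = (col("a") + 1).alias("b")

print(dsl_style)            # (a + 1) AS b
print(col("a") > col("b"))  # (a > b)
```

Both styles describe the same expression; only the DSL form is visible to
IDE tooling while it is being written.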

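For item 3, the split between stateless and stateful operations can be
sketched in plain Python on an in-memory list. This is not PyFlink code:
`flat_map` and `running_count` are hypothetical helpers, and a dict stands
in for the keyed state that a real runtime would manage.

```python
from collections import defaultdict


def flat_map(func, records):
    """Stateless: each input record yields zero or more output records."""
    for record in records:
        yield from func(record)


def running_count(pairs):
    """Stateful: keeps one counter per key across records (keyed state)."""
    state = defaultdict(int)
    for key, value in pairs:
        state[key] += value
        yield key, state[key]


lines = ["to be", "or not", "to be"]

# Stateless steps: each record is processed independently of all others.
words = flat_map(str.split, lines)
words = filter(lambda word: word != "or", words)
pairs = map(lambda word: (word, 1), words)

# Stateful step: the output depends on earlier records with the same key.
result = list(running_count(pairs))
print(result)
```

The stateless steps could run on any record in isolation, while
`running_count` only works if all records of one key reach the same state,
which is what keyBy would guarantee in a distributed setting.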

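For items 4 and 5, the two aggregation styles can be contrasted with a
plain-Python sketch. Everything here is an assumption made for
illustration: the method names `create_accumulator`, `accumulate` and
`get_value` are hypothetical, and a plain list stands in for the
pandas.DataFrame that a Pandas UDAF would receive.

```python
from statistics import fmean


class RowAtATimeMean:
    """Row-at-a-time style: the aggregate sees one value per call."""

    def create_accumulator(self):
        return [0.0, 0]  # [running sum, row count]

    def accumulate(self, acc, value):
        acc[0] += value
        acc[1] += 1

    def get_value(self, acc):
        return acc[0] / acc[1] if acc[1] else None


def batch_mean(column):
    """Batch style: the aggregate sees all values of a group at once."""
    return fmean(column) if column else None


def aggregate_row_at_a_time(groups):
    udaf = RowAtATimeMean()
    out = {}
    for key, values in groups.items():
        acc = udaf.create_accumulator()
        for value in values:  # one function call per row
            udaf.accumulate(acc, value)
        out[key] = udaf.get_value(acc)
    return out


def aggregate_batch(groups):
    # One function call per group instead of one per row.
    return {key: batch_mean(values) for key, values in groups.items()}


groups = {"a": [1.0, 2.0, 3.0], "b": [10.0]}
print(aggregate_row_at_a_time(groups))
print(aggregate_batch(groups))
```

Both functions return the same per-group means; the batch style needs far
fewer calls into user code, which is where a vectorized implementation can
save time.
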
Any comments and suggestions are welcome!

Best,
Jincheng


Dian Fu <dian0511...@gmail.com> wrote on Thu, Jul 23, 2020 at 11:10 AM:

> Thanks Robert for bringing up this discussion. This is very important to
> ensure that we have a smooth release process as there are only two months
> left before feature freeze.
>
> It would be good to have a list of the features for 1.12 as soon as
> possible. Everyone is welcome to post the features they consider
> important and want in 1.12.
>
> Regards,
> Dian
>
> > On Jul 23, 2020, at 12:10 AM, Prasanna kumar <prasannakumarram...@gmail.com> wrote:
> >
> > Hi Flink Dev Team,
> >
> > Dynamic autoscaling based on the incoming data load would be a great
> > feature.
> >
> > We should be able to define rules, e.g. if the load increases by 20%,
> > extra resources should be added. Or time-based rules, e.g. during peak
> > hours the pipeline should scale automatically by 50%.
> >
> > This will help a lot in cost reduction.
> >
> > EMR clusters provide a similar feature for Spark-based applications.
> >
> > Thanks,
> > Prasanna.
> >
> > On Wed, Jul 22, 2020 at 5:40 PM Robert Metzger <rmetz...@apache.org> wrote:
> > Hi all,
> >
> > Now that the 1.11 release is out, it is time to plan for the next major
> > Flink release.
> >
> > Some items:
> >
> >    1. Dian Fu and I volunteer to be the release managers for Flink 1.12.
> >
> >    2. Timeline: We propose to stick to our approximate 4 month release
> >    cycle, thus the release should be done by late October. Given that
> >    there’s a holiday week in China at the beginning of October, I propose
> >    to do the feature freeze on master by late September.
> >
> >    3. Collecting features: It would be good to have a rough overview of
> >    the features that will likely be ready to be merged by late September,
> >    and that we want in the release.
> >    Based on the discussion, we will update the Roadmap on the Flink
> >    website again!
> >
> >    4. Test instabilities and blockers: I would like to avoid a situation
> >    where we have many blocking issues or build instabilities at the time
> >    of the feature freeze. To achieve that, we will try to check every
> >    build instability within a week, to decide if it is a blocker (make
> >    sure to use the “test-stability” label for those tickets!)
> >    Blocker issues will need to have somebody assigned (responsible)
> >    within a week, and we want to see progress on all blocker issues
> >    (downgrade, resolution, a good plan how to proceed if it is more
> >    complicated).
> >
> >    5. Quality and stability of new features: In order to have a short
> >    feature freeze phase, we encourage developers to only merge well-tested
> >    and documented features. In our experience, the feature freeze works
> >    best if new features are complete, and the community can focus fully
> >    on addressing newly found bugs and voting on the release.
> >    By having a smooth release process, the next merge-window for the next
> >    release will come sooner.
> >
> >
> > Let me know what you think about our items, and share which features you
> > want in Flink 1.12.
> >
> > Best,
> >
> > Robert & Dian
>
>
