Re: [DISCUSS] Planning Flink 1.12

Robert Metzger Mon, 27 Jul 2020 11:42:04 -0700

Hi all,

Thanks a lot for the responses so far. I've put them into this Wiki page:
https://cwiki.apache.org/confluence/display/FLINK/1.12+Release to keep
track of them. Ideally, create JIRA tickets for your features; the status
will then update automatically in the wiki :)


Please keep posting features here, or add them to the Wiki yourself 🙏

@Prasanna kumar <[email protected]>: Dynamic Auto Scaling is a
feature request the community is well-aware of. Till has posted
"Reactive-scaling mode" as a feature he's working on for the 1.12 release.
This work will introduce the basic building blocks and partial support for
the feature you are requesting.
Proper support for dynamic scaling, while maintaining Flink's high
performance (throughput, low latency) and correctness, is a difficult task
that needs a lot of work. It will probably take a little bit of time till
this is fully available.
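
To make the two ideas concrete, here is a minimal Python sketch, for
illustration only. It combines the kind of load-based rule Prasanna
describes with the first step Till mentions (running with a parallelism
lower than the specified one when resources are missing). All names
(ScalingRule, desired_parallelism, effective_parallelism) and the default
values are hypothetical; none of this is Flink API or the planned
implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical rule: scale up once load grows past a threshold."""
    load_increase_threshold: float = 0.20  # "load increased by 20%"
    scale_up_factor: float = 1.5           # "scale by 50%"


def desired_parallelism(current: int, baseline_load: float,
                        current_load: float, rule: ScalingRule) -> int:
    """Parallelism an external autoscaler might request under the rule."""
    if baseline_load <= 0:
        # Without a meaningful baseline, keep the current parallelism.
        return current
    increase = (current_load - baseline_load) / baseline_load
    if increase >= rule.load_increase_threshold:
        return math.ceil(current * rule.scale_up_factor)
    return current


def effective_parallelism(requested: int, available_slots: int) -> int:
    """Run with fewer subtasks than requested if slots are short."""
    if available_slots < 1:
        raise ValueError("no slots available")
    return min(requested, available_slots)
```

In this sketch the scaling decision stays outside the job (an external
tool computes the desired parallelism), and the job merely adapts to
whatever resources it actually gets.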

Cheers,
Robert



On Thu, Jul 23, 2020 at 2:27 PM Till Rohrmann <[email protected]> wrote:

> Thanks for being our release managers for the 1.12 release Dian & Robert!
>
> Here are some features I would like to work on for this release:
>
> # Features
>
> ## Finishing pipelined region scheduling (
> https://issues.apache.org/jira/browse/FLINK-16430)
> With the pipelined region scheduler we want to implement a scheduler that
> can serve streaming and batch workloads alike while being able to
> run jobs under constrained resources. The latter is particularly important
> for bounded streaming jobs which, currently, are not well supported.
>
> ## Reactive-scaling mode
> Being able to react to newly available resources and to rescale a running
> job accordingly will make Flink's operation much easier because resources
> can then be controlled by an external tool (e.g. GCP autoscaling, K8s
> horizontal pod autoscaler, etc.). In this release we want to take a big step
> in this direction. As a first step we want to support the execution of
> jobs with a parallelism which is lower than the specified parallelism in
> case Flink lost a TaskManager or could not acquire enough resources.
>
> # Maintenance/Stability
>
> ## JM / TM finished task reconciliation (
> https://issues.apache.org/jira/browse/FLINK-17075)
> This prevents the system from going out of sync if a task state change from
> the TM to the JM is lost.
>
> ## Make metrics services work with Kubernetes deployments (
> https://issues.apache.org/jira/browse/FLINK-11127)
> Invert the direction in which the MetricFetcher connects to the
> MetricQueryFetchers. That way it will no longer be necessary to expose, for
> every TaskManager on K8s, a port on which the MetricQueryFetcher runs. This
> will then make the deployment of Flink clusters on K8s easier.
>
> ## Handle long-blocking operations during job submission (savepoint
> restore) (https://issues.apache.org/jira/browse/FLINK-16866)
> Submitting a Flink job can involve interaction with external systems
> (blocking operations). Depending on the job, these interactions can take so
> long that they exceed the submission timeout, which reports a failure on the
> client side even though the actual submission succeeded. By decoupling the
> creation of the ExecutionGraph from the job submission, we can make the job
> submission non-blocking, which will solve this problem.
>
> ## Make IDs more intuitive to ease debugging (FLIP-118) (
> https://issues.apache.org/jira/browse/FLINK-15679)
> By making the internal Flink IDs compositional or logging how they belong
> together, we can make the debugging of Flink's operations much easier.
>
> Cheers,
> Till
>
>
> On Thu, Jul 23, 2020 at 7:48 AM Canbin Zheng <[email protected]>
> wrote:
>
> > Hi All,
> >
> > Thanks for bringing up this discussion, Robert!
> > Congratulations on becoming the release managers of 1.12, Dian and Robert!
> >
> > ----------
> > Here are some of my thoughts on features for the native integration with
> > Kubernetes in Flink 1.12:
> >
> > 1. Support user-specified pod templates
> >     Description:
> >     The current approach of introducing new configuration options for each
> > aspect of pod specification a user might wish is becoming unwieldy: we have
> > to maintain more and more Flink-side Kubernetes configuration options, and
> > users have to learn the gap between the declarative model used by
> > Kubernetes and the configuration model used by Flink. It would be a great
> > improvement to allow users to specify pod templates as central places for
> > all customization needs for the jobmanager and taskmanager pods.
> >     Benefits:
> >     Users can leverage many of the advanced K8s features that the Flink
> > community does not support explicitly, such as volume mounting, DNS
> > configuration, pod affinity/anti-affinity setting, etc.
> >
> > 2. Support running PyFlink on Kubernetes
> >     Description:
> >     Support running PyFlink on Kubernetes, including session cluster and
> > application cluster.
> >     Benefits:
> >     Run Python applications in a containerized environment.
> >
> > 3. Support built-in init-Container
> >     Description:
> >     We need a built-in init-Container to help solve dependency management
> > in a containerized environment, especially in the application mode.
> >     Benefits:
> >     Separate the base Flink image from dynamic dependencies.
> >
> > 4. Support accessing secured services via K8s secrets
> >     Description:
> >     Kubernetes Secrets
> > <https://kubernetes.io/docs/concepts/configuration/secret/> can be used to
> > provide credentials for a Flink application to access secured services. It
> > helps people who want to use a user-specified K8s Secret through an
> > environment variable.
> >     Benefits:
> >     Improve user experience.
> >
> > 5. Support configuring the replicas of the JobManager Deployment in
> > ZooKeeper HA setups
> >     Description:
> >     Make the *replicas* of the Deployment configurable in ZooKeeper HA
> > setups.
> >     Benefits:
> >     Achieve faster failover.
> >
> > 6. Support configuring a CPU limit
> >     Description:
> >     Leverage the Kubernetes container CPU request/limit feature.
> >     Benefits:
> >     Reduce cost.
> >
> > Regards,
> > Canbin Zheng
> >
> > On Thu, Jul 23, 2020 at 12:44 PM Harold.Miao <[email protected]> wrote:
> >
> > > I'm excited to hear about this feature, very, very, very highly
> > > encouraged
> > >
> > >
> > > On Thu, Jul 23, 2020 at 12:10 AM Prasanna kumar <[email protected]>
> > > wrote:
> > >
> > > > Hi Flink Dev Team,
> > > >
> > > > Dynamic autoscaling based on the incoming data load would be a great
> > > > feature.
> > > >
> > > > We should be able to define rules such as: if the load increases by
> > > > 20%, extra resources should be added.
> > > > Or time-based rules: during these peak hours, the pipeline should scale
> > > > automatically by 50%.
> > > >
> > > > This will help a lot in cost reduction.
> > > >
> > > > EMR clusters provide a similar feature for Spark-based applications.
> > > >
> > > > Thanks,
> > > > Prasanna.
> > > >
> > > > On Wed, Jul 22, 2020 at 5:40 PM Robert Metzger <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Now that the 1.11 release is out, it is time to plan for the next
> > > > > major Flink release.
> > > > >
> > > > > Some items:
> > > > >
> > > > >    1. Dian Fu and I volunteer to be the release managers for
> > > > >    Flink 1.12.
> > > > >
> > > > >    2. Timeline: We propose to stick to our approximate 4 month
> > > > >    release cycle, thus the release should be done by late October.
> > > > >    Given that there’s a holiday week in China at the beginning of
> > > > >    October, I propose to do the feature freeze on master by late
> > > > >    September.
> > > > >
> > > > >    3. Collecting features: It would be good to have a rough overview
> > > > >    of the features that will likely be ready to be merged by late
> > > > >    September, and that we want in the release.
> > > > >    Based on the discussion, we will update the Roadmap on the Flink
> > > > >    website again!
> > > > >
> > > > >    4. Test instabilities and blockers: I would like to avoid a
> > > > >    situation where we have many blocking issues or build
> > > > >    instabilities at the time of the feature freeze. To achieve that,
> > > > >    we will try to check every build instability within a week, to
> > > > >    decide if it is a blocker (make sure to use the “test-stability”
> > > > >    label for those tickets!).
> > > > >    Blocker issues will need to have somebody assigned (responsible)
> > > > >    within a week, and we want to see progress on all blocker issues
> > > > >    (downgrade, resolution, or a good plan for how to proceed if it
> > > > >    is more complicated).
> > > > >
> > > > >    5. Quality and stability of new features: In order to have a
> > > > >    short feature freeze phase, we encourage developers to only merge
> > > > >    well-tested and documented features. In our experience, the
> > > > >    feature freeze works best if new features are complete, and the
> > > > >    community can focus fully on addressing newly found bugs and
> > > > >    voting on the release.
> > > > >    By having a smooth release process, the merge window for the
> > > > >    next release will come sooner.
> > > > >
> > > > >
> > > > > Let me know what you think about our items, and share which
> > > > > features you want in Flink 1.12.
> > > > >
> > > > > Best,
> > > > >
> > > > > Robert & Dian
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Best Regards,
> > > Harold Miao
> > >
> >
>
