Re: "Flink as a libray" progress

周云峰(云岩) Thu, 15 Apr 2021 05:17:23 -0700

Hi Till,

Thanks for your kind reply about "Flink as a library". I'm glad to share my 
ideas and thoughts with you and the community.

As background information, my recent work is to reduce the latency and resource
cost of my flink jobs. The jobs are so short and light that even the latency to
start MiniClusters or the cost to maintain a remote cluster would be
non-negligible and unacceptible. My understanding of "flink as a library" is
that a flink job should be run without any cluster and as lightweight as
possible, and that's why I expect to solve my problem with this feature.

Without one currently available, my plan to implement my own "flink as a
library" feature is as follows. First I would try to extract the logical graph
of my flink job, which could be StreamGraph, JobGraph or ExecutionGraph. Then I
would create one thread for each vertex in the graph. A thread would execute
the computation logic according to operators in its vertex and pass data across
threads according to edges in the graph. In this way I can execute the flink
job within a single process without starting any cluster.

In fact the program above can even be executed in a single-threaded way. The
only one thread just need to change its context among different vertexes in
round-robin fashion according to their topological order.

I believe this could allow flink jobs to be executed without flink cluster,
because threads can be properly managed by JVM, processes and operating
systems. Unlike flink tasks, threads do not need to be allocated by
ResourceManager, monitored by TaskManager or dispatched by Dispatcher, so there
is no need for flink cluster.

I agree that basing "Flink as a library" on reactive mode is a great idea.
Every user of the cluster is also service provider, and the general system is
elastic to coming or leaving members. My concerns of this idea are
I thought "flink as a library" requires running flink without flink cluster,
but this idea still contains flink cluster, just that this cluster is internal
rather than external.
This idea would make one flink application depends on others, and this seems to
bring availability as an issue. A Flink application might happen to be
JobManager, or it manages tasks whose downstream or upstream tasks are on other
nodes. If the application fails in this case, it would cause applications in
other processes to fail. I think few library would have such cross-process
effects.

Still, I think this idea is fascinating and is a promising solution. I am also
looking forward to you and any community member to comment on my plan's
feasibility, and whether my plan aligns with the general roadmap of "flink as a
library".

Best regards,
Yunfeng Zhou------------------------------------------------------------------
发件人：Till Rohrmann<trohrm...@apache.org>
日 期：2021年04月15日 16:12:04
收件人：周云峰(云岩)<zhouyunfeng....@alibaba-inc.com>
抄 送：Rohrmann<till.rohrm...@alibaba-inc.com>; 秦江杰<jiangjie...@alibaba-inc.com>;
林东(剑士)<lindong....@alibaba-inc.com>
主 题：Re: "Flink as a libray" progress

Hi Yunfeng,

With the current release we have made a big step towards "Flink as a library"
with the introduction of the reactive mode. Using this mode we can now start
Flink TaskManager processes and the resources should be automatically used.
This feature is in a MVP state which means that it still has a couple of rough
edges and needs to be further improved but the fundamentals are there.

What's missing for the "Flink as a library" feature is to integrate the
reactive mode with how a user would use Flink as a library. What I would expect
from "XYZ as a library" is that I only need to add a dependency to my program,
do some API calls and then can start my binary as I would start any other
binary. What this means for Flink is that if you start your binary, then it
should join a cluster (probably the user has to configure something to let
processes find each other), figure out which process is the leader (the leader
becomes the JobManager) and the other processes become the TaskManagers to
execute the workload. That way, the user can simply start and stop processes
with the same Flink program and these processes execute the specified job. Of
course, one would also have to check that the specified Flink job is the same.
Ideally this happens automatically and processes which execute the same Flink
job find each other and form a cluster.

I don't think that there is already a full design for this feature in the
community. There are only ideas how such a feature could behave. Happy to hear
your ideas for this feature. Ideally you share it with the community.

Cheers,
Till
On Thu, Apr 15, 2021 at 4:41 AM 周云峰(云岩) <zhouyunfeng....@alibaba-inc.com> wrote:

Hi Till,

My name is Yunfeng Zhou and I joined Alibaba Cloud's Real-time Computing Eco
Team in Shanghai about a month ago. Nice to meet you.

I learned from flink's roadmap that we are trying to achieve "Flink as a
library" and heard that you have been working on this issue. I want to know
more about its progress, as my current work also involves running flink job as
function instead of starting a flink cluster. When will this feature be
available for Flink, and may I have more information about its general
structure and API document?

Best regards,
Yunfeng Zhou

Re: "Flink as a libray" progress

Reply via email to