Hi Till,

Thanks for your kind reply about "Flink as a library". I'm glad to share my 
ideas and thoughts with you and the community.

As background information, my recent work is to reduce the latency and resource 
cost of my Flink jobs. The jobs are so short and lightweight that even the latency 
of starting a MiniCluster, or the cost of maintaining a remote cluster, would be 
non-negligible and unacceptable. My understanding of "Flink as a library" is 
that a Flink job should run without any cluster and be as lightweight as 
possible, which is why I expect this feature to solve my problem. 

Since no such feature is currently available, my plan to implement my own "Flink 
as a library" is as follows. First I would extract the logical graph of my Flink 
job, which could be the StreamGraph, JobGraph or ExecutionGraph. Then I would 
create one thread for each vertex in the graph. Each thread would execute the 
computation logic of the operators in its vertex and pass data to other threads 
according to the edges in the graph. In this way I could execute the Flink job 
within a single process, without starting any cluster. 
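To make this concrete, here is a minimal sketch of the first two steps, assuming 
Flink's internal StreamGraph/JobGraph classes can be used this way; runVertex is a 
hypothetical placeholder for the actual per-vertex execution and queue wiring, 
which is the real work of the proposal.

import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.runtime.jobgraph.JobVertex;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ThreadPerVertexRunner {

    public static void run(StreamExecutionEnvironment env) {
        // Extract the logical graph that would normally be submitted to a cluster.
        JobGraph jobGraph = env.getStreamGraph().getJobGraph();

        // One thread per job vertex; the edges of the graph would be backed by
        // in-process queues instead of network channels.
        for (JobVertex vertex : jobGraph.getVerticesSortedTopologicallyFromSources()) {
            new Thread(() -> runVertex(vertex), vertex.getName()).start();
        }
    }

    // Hypothetical placeholder: running the vertex's operator chain and wiring its
    // input/output edges to in-memory queues is the part still to be implemented.
    private static void runVertex(JobVertex vertex) {
        throw new UnsupportedOperationException("not implemented in this sketch");
    }
}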

In fact the program above could even be executed in a single-threaded way. The 
single thread would just need to switch its context among the different vertices 
in round-robin fashion, following their topological order.
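As a rough illustration of this single-threaded variant (again only a sketch, with 
processPending standing in for the hypothetical per-vertex step that drains 
whatever input is currently buffered):

import java.util.List;
import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.runtime.jobgraph.JobVertex;

public class SingleThreadedRunner {

    // Hypothetical hook: process whatever input is buffered for this vertex and
    // report whether any record was actually processed.
    interface VertexStep {
        boolean processPending(JobVertex vertex);
    }

    public static void run(JobGraph jobGraph, VertexStep step) {
        List<JobVertex> order = jobGraph.getVerticesSortedTopologicallyFromSources();
        boolean progress = true;
        // Round-robin over the vertices in topological order until a full pass
        // makes no progress, i.e. the sources are exhausted and all queues are empty.
        while (progress) {
            progress = false;
            for (JobVertex vertex : order) {
                if (step.processPending(vertex)) {
                    progress = true;
                }
            }
        }
    }
}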

I believe this could allow Flink jobs to be executed without a Flink cluster, 
because threads are managed directly by the JVM, the process and the operating 
system. Unlike Flink tasks, threads do not need to be allocated by a 
ResourceManager, monitored by a TaskManager or dispatched by a Dispatcher, so 
there is no need for a Flink cluster.

I agree that basing "Flink as a library" on reactive mode is a great idea. 
Every user of the cluster is also a service provider, and the overall system is 
elastic to members joining or leaving. My concerns with this idea are:
1. I thought "Flink as a library" required running Flink without a Flink cluster, 
but this idea still involves a Flink cluster; it is just that the cluster is 
internal rather than external.
2. This idea would make one Flink application depend on others, which seems to 
make availability an issue. A Flink application might happen to be the 
JobManager, or it might manage tasks whose downstream or upstream tasks are on 
other nodes. If the application fails in such a case, it would cause applications 
in other processes to fail. I think few libraries have such cross-process 
effects.

Still, I think this idea is fascinating and a promising solution. I am also 
looking forward to comments from you and the community on my plan's feasibility, 
and on whether it aligns with the general roadmap of "Flink as a 
library".

Best regards,
Yunfeng Zhou

------------------------------------------------------------------
From: Till Rohrmann <trohrm...@apache.org>
Date: 2021-04-15 16:12:04
To: 周云峰(云岩) <zhouyunfeng....@alibaba-inc.com>
Cc: Rohrmann <till.rohrm...@alibaba-inc.com>; 秦江杰 <jiangjie...@alibaba-inc.com>; 
林东(剑士) <lindong....@alibaba-inc.com>
Subject: Re: "Flink as a library" progress

Hi Yunfeng,

With the current release we have made a big step towards "Flink as a library" 
with the introduction of the reactive mode. Using this mode we can now start 
Flink TaskManager processes and the resources should be automatically used. 
This feature is in an MVP state, which means that it still has a couple of rough 
edges and needs to be further improved, but the fundamentals are there.

What's missing for the "Flink as a library" feature is to integrate the 
reactive mode with how a user would use Flink as a library. What I would expect 
from "XYZ as a library" is that I only need to add a dependency to my program, 
do some API calls and then can start my binary as I would start any other 
binary. What this means for Flink is that if you start your binary, then it 
should join a cluster (probably the user has to configure something to let 
processes find each other), figure out which process is the leader (the leader 
becomes the JobManager) and the other processes become the TaskManagers to 
execute the workload. That way, the user can simply start and stop processes 
with the same Flink program and these processes execute the specified job. Of 
course, one would also have to check that the specified Flink job is the same. 
Ideally this happens automatically and processes which execute the same Flink 
job find each other and form a cluster.
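For illustration, the user-facing part already looks like an ordinary Flink 
program; the comments in the following sketch describe the intended library 
behaviour rather than what Flink does today:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LibraryStyleJob {

    public static void main(String[] args) throws Exception {
        // Today this either submits to a pre-existing cluster or starts a local
        // MiniCluster. Under "Flink as a library" the same binary would instead
        // discover peer processes running the same program, elect a leader to act
        // as the JobManager and let the remaining processes act as TaskManagers.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("flink", "as", "a", "library")
           .map(String::length)
           .print();

        env.execute("library-style binary");
    }
}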

I don't think that there is already a full design for this feature in the 
community. There are only ideas about how such a feature could behave. Happy to 
hear your ideas for this feature. Ideally you would share them with the community.

Cheers,
Till
On Thu, Apr 15, 2021 at 4:41 AM 周云峰(云岩) <zhouyunfeng....@alibaba-inc.com> wrote:

Hi Till,

My name is Yunfeng Zhou and I joined Alibaba Cloud's Real-time Computing Eco 
Team in Shanghai about a month ago. Nice to meet you.

I learned from Flink's roadmap that we are trying to achieve "Flink as a 
library" and heard that you have been working on this issue. I would like to know 
more about its progress, as my current work also involves running a Flink job as 
a function instead of starting a Flink cluster. When will this feature be 
available in Flink, and may I have more information about its general 
structure and API documentation?

Best regards,
Yunfeng Zhou
