Hi Till, Thanks for your kind reply about "Flink as a library". I'm glad to share my ideas and thoughts with you and the community.
As background information, my recent work is to reduce the latency and resource cost of my flink jobs. The jobs are so short and light that even the latency to start MiniClusters or the cost to maintain a remote cluster would be non-negligible and unacceptible. My understanding of "flink as a library" is that a flink job should be run without any cluster and as lightweight as possible, and that's why I expect to solve my problem with this feature. Without one currently available, my plan to implement my own "flink as a library" feature is as follows. First I would try to extract the logical graph of my flink job, which could be StreamGraph, JobGraph or ExecutionGraph. Then I would create one thread for each vertex in the graph. A thread would execute the computation logic according to operators in its vertex and pass data across threads according to edges in the graph. In this way I can execute the flink job within a single process without starting any cluster. In fact the program above can even be executed in a single-threaded way. The only one thread just need to change its context among different vertexes in round-robin fashion according to their topological order. I believe this could allow flink jobs to be executed without flink cluster, because threads can be properly managed by JVM, processes and operating systems. Unlike flink tasks, threads do not need to be allocated by ResourceManager, monitored by TaskManager or dispatched by Dispatcher, so there is no need for flink cluster. I agree that basing "Flink as a library" on reactive mode is a great idea. Every user of the cluster is also service provider, and the general system is elastic to coming or leaving members. My concerns of this idea are I thought "flink as a library" requires running flink without flink cluster, but this idea still contains flink cluster, just that this cluster is internal rather than external. This idea would make one flink application depends on others, and this seems to bring availability as an issue. A Flink application might happen to be JobManager, or it manages tasks whose downstream or upstream tasks are on other nodes. If the application fails in this case, it would cause applications in other processes to fail. I think few library would have such cross-process effects. Still, I think this idea is fascinating and is a promising solution. I am also looking forward to you and any community member to comment on my plan's feasibility, and whether my plan aligns with the general roadmap of "flink as a library". Best regards, Yunfeng Zhou------------------------------------------------------------------ 发件人:Till Rohrmann<trohrm...@apache.org> 日 期:2021年04月15日 16:12:04 收件人:周云峰(云岩)<zhouyunfeng....@alibaba-inc.com> 抄 送:Rohrmann<till.rohrm...@alibaba-inc.com>; 秦江杰<jiangjie...@alibaba-inc.com>; 林东(剑士)<lindong....@alibaba-inc.com> 主 题:Re: "Flink as a libray" progress Hi Yunfeng, With the current release we have made a big step towards "Flink as a library" with the introduction of the reactive mode. Using this mode we can now start Flink TaskManager processes and the resources should be automatically used. This feature is in a MVP state which means that it still has a couple of rough edges and needs to be further improved but the fundamentals are there. What's missing for the "Flink as a library" feature is to integrate the reactive mode with how a user would use Flink as a library. What I would expect from "XYZ as a library" is that I only need to add a dependency to my program, do some API calls and then can start my binary as I would start any other binary. What this means for Flink is that if you start your binary, then it should join a cluster (probably the user has to configure something to let processes find each other), figure out which process is the leader (the leader becomes the JobManager) and the other processes become the TaskManagers to execute the workload. That way, the user can simply start and stop processes with the same Flink program and these processes execute the specified job. Of course, one would also have to check that the specified Flink job is the same. Ideally this happens automatically and processes which execute the same Flink job find each other and form a cluster. I don't think that there is already a full design for this feature in the community. There are only ideas how such a feature could behave. Happy to hear your ideas for this feature. Ideally you share it with the community. Cheers, Till On Thu, Apr 15, 2021 at 4:41 AM 周云峰(云岩) <zhouyunfeng....@alibaba-inc.com> wrote: Hi Till, My name is Yunfeng Zhou and I joined Alibaba Cloud's Real-time Computing Eco Team in Shanghai about a month ago. Nice to meet you. I learned from flink's roadmap that we are trying to achieve "Flink as a library" and heard that you have been working on this issue. I want to know more about its progress, as my current work also involves running flink job as function instead of starting a flink cluster. When will this feature be available for Flink, and may I have more information about its general structure and API document? Best regards, Yunfeng Zhou