Hi Yunfeng, sorry for the late reply. I think what you have in mind is a bit different from what I described. If you want to support running a Flink job within a single process, then you could use Flink's MiniCluster. The MiniCluster should give you what you are looking for.
Concerning your comments: 1. You are right that there will still be a "cluster" involved in the execution of a Flink as a library program. However, it will no longer be visible to the user. From the users perspective, he would simply start a process and the rest is done in the background by Flink. 2. It is true that a failing process will affect the overall computation and it will probably trigger a failover. The idea is then that the cluster is self-healing. This means that a new process will become the JobManager if the current JobManager fails. If a TaskManager disappears, then Flink will execute the job with a lower parallelism. Cheers, Till On Thu, Apr 15, 2021 at 2:17 PM 周云峰(云岩) <zhouyunfeng....@alibaba-inc.com> wrote: > Hi Till, > > > Thanks for your kind reply about "Flink as a library". I'm glad to share > my ideas and thoughts with you and the community. > > > As background information, my recent work is to reduce the latency and > resource cost of my flink jobs. The jobs are so short and light that even > the latency to start MiniClusters or the cost to maintain a remote cluster > would be non-negligible and unacceptible. My understanding of "flink as a > library" is that a flink job should be run without any cluster and as > lightweight as possible, and that's why I expect to solve my problem with > this feature. > > > Without one currently available, my plan to implement my own "flink as a > library" feature is as follows. First I would try to extract the logical > graph of my flink job, which could be StreamGraph, JobGraph or > ExecutionGraph. Then I would create one thread for each vertex in the > graph. A thread would execute the computation logic according to operators > in its vertex and pass data across threads according to edges in the graph. > In this way I can execute the flink job within a single process without > starting any cluster. > > > In fact the program above can even be executed in a single-threaded way. > The only one thread just need to change its context among different > vertexes in round-robin fashion according to their topological order. > > > I believe this could allow flink jobs to be executed without flink > cluster, because threads can be properly managed by JVM, processes and > operating systems. Unlike flink tasks, threads do not need to be > allocated by ResourceManager, monitored by TaskManager or dispatched by > Dispatcher, so there is no need for flink cluster. > > > I agree that basing "Flink as a library" on reactive mode is a great idea. > Every user of the cluster is also service provider, and the general system > is elastic to coming or leaving members. My concerns of this idea are > > - I thought "flink as a library" requires running flink without flink > cluster, but this idea still contains flink cluster, just that this cluster > is internal rather than external. > - This idea would make one flink application depends on others, and > this seems to bring availability as an issue. A Flink application might > happen to be JobManager, or it manages tasks whose downstream or upstream > tasks are on other nodes. If the application fails in this case, it would > cause applications in other processes to fail. I think few library would > have such cross-process effects. > > > Still, I think this idea is fascinating and is a promising solution. I am > also looking forward to you and any community member to comment on my > plan's feasibility, and whether my plan aligns with the general roadmap of > "flink as a library". > > > Best regards, > > Yunfeng Zhou > > ------------------------------------------------------------------ > 发件人:Till Rohrmann<trohrm...@apache.org> > 日 期:2021年04月15日 16:12:04 > 收件人:周云峰(云岩)<zhouyunfeng....@alibaba-inc.com> > 抄 送:Rohrmann<till.rohrm...@alibaba-inc.com>; 秦江杰< > jiangjie...@alibaba-inc.com>; 林东(剑士)<lindong....@alibaba-inc.com> > 主 题:Re: "Flink as a libray" progress > > Hi Yunfeng, > > With the current release we have made a big step towards "Flink as a > library" with the introduction of the reactive mode. Using this mode we can > now start Flink TaskManager processes and the resources should be > automatically used. This feature is in a MVP state which means that it > still has a couple of rough edges and needs to be further improved but the > fundamentals are there. > > What's missing for the "Flink as a library" feature is to integrate the > reactive mode with how a user would use Flink as a library. What I would > expect from "XYZ as a library" is that I only need to add a dependency to > my program, do some API calls and then can start my binary as I would start > any other binary. What this means for Flink is that if you start your > binary, then it should join a cluster (probably the user has to configure > something to let processes find each other), figure out which process is > the leader (the leader becomes the JobManager) and the other processes > become the TaskManagers to execute the workload. That way, the user can > simply start and stop processes with the same Flink program and these > processes execute the specified job. Of course, one would also have to > check that the specified Flink job is the same. Ideally this happens > automatically and processes which execute the same Flink job find each > other and form a cluster. > > I don't think that there is already a full design for this feature in the > community. There are only ideas how such a feature could behave. Happy to > hear your ideas for this feature. Ideally you share it with the community. > > Cheers, > Till > > On Thu, Apr 15, 2021 at 4:41 AM 周云峰(云岩) <zhouyunfeng....@alibaba-inc.com> > wrote: > >> Hi Till, >> >> My name is Yunfeng Zhou and I joined Alibaba Cloud's Real-time Computing >> Eco Team in Shanghai about a month ago. Nice to meet you. >> >> I learned from flink's roadmap that we are trying to achieve "Flink as a >> library" and heard that you have been working on this issue. I want to >> know more about its progress, as my current work also involves running >> flink job as function instead of starting a flink cluster. When will this >> feature be available for Flink, and may I have more information about its >> general structure and API document? >> >> Best regards, >> Yunfeng Zhou >> > >