Re: "Flink as a library" progress

Till Rohrmann Mon, 26 Apr 2021 09:02:48 -0700

Hi Yunfeng,

Sorry for the late reply. I think what you have in mind is a bit different
from what I described. If you want to support running a Flink job within a
single process, then you could use Flink's MiniCluster. The MiniCluster
should give you what you are looking for.


Concerning your comments:

1. You are right that there will still be a "cluster" involved in the
execution of a "Flink as a library" program. However, it will no longer be
visible to the user. From the user's perspective, they would simply start a
process, and Flink does the rest in the background.
2. It is true that a failing process will affect the overall computation
and it will probably trigger a failover. The idea is then that the cluster
is self-healing. This means that a new process will become the JobManager
if the current JobManager fails. If a TaskManager disappears, then Flink
will execute the job with a lower parallelism.

Cheers,
Till

On Thu, Apr 15, 2021 at 2:17 PM 周云峰(云岩) <zhouyunfeng....@alibaba-inc.com>
wrote:

> Hi Till,
>
>
> Thanks for your kind reply about "Flink as a library". I'm glad to share
> my ideas and thoughts with you and the community.
>
>
> As background information, my recent work is to reduce the latency and
> resource cost of my Flink jobs. The jobs are so short and light that even
> the latency to start MiniClusters or the cost to maintain a remote cluster
> would be non-negligible and unacceptable. My understanding of "Flink as a
> library" is that a Flink job should run without any cluster and be as
> lightweight as possible, and that's why I expect to solve my problem with
> this feature.
>
>
> Without one currently available, my plan to implement my own "flink as a
> library" feature is as follows. First I would try to extract the logical
> graph of my flink job, which could be StreamGraph, JobGraph or
> ExecutionGraph. Then I would create one thread for each vertex in the
> graph. A thread would execute the computation logic according to operators
> in its vertex and pass data across threads according to edges in the graph.
> In this way I can execute the flink job within a single process without
> starting any cluster.
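
As a rough illustration of the thread-per-vertex plan above, here is a minimal sketch in plain Java. It does not use any Flink API: the class name `ThreadPerVertexSketch`, the `END` marker and the two example operators (`x + 1`, `x * 2`) are all hypothetical, and a real implementation would have to derive vertices and edges from the StreamGraph or JobGraph instead of hard-coding them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical sketch: one thread per vertex, one queue per edge. Not Flink API.
class ThreadPerVertexSketch {
    // Marker object that tells a downstream vertex the stream has ended.
    private static final Object END = new Object();

    // Starts a thread that applies fn to each record from `in` and forwards the result to `out`.
    static Thread vertex(BlockingQueue<Object> in, BlockingQueue<Object> out,
                         Function<Integer, Integer> fn) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Object record = in.take();
                    if (record == END) {
                        out.put(END); // propagate end-of-stream downstream
                        return;
                    }
                    out.put(fn.apply((Integer) record));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Runs the linear graph: source -> (x + 1) -> (x * 2) -> sink.
    static List<Integer> run(List<Integer> input) {
        BlockingQueue<Object> sourceToAdd = new LinkedBlockingQueue<>();
        BlockingQueue<Object> addToMul = new LinkedBlockingQueue<>();
        BlockingQueue<Object> mulToSink = new LinkedBlockingQueue<>();
        Thread add = vertex(sourceToAdd, addToMul, x -> x + 1);
        Thread mul = vertex(addToMul, mulToSink, x -> x * 2);
        try {
            // The calling thread acts as both source and sink.
            for (Integer value : input) {
                sourceToAdd.put(value);
            }
            sourceToAdd.put(END);
            List<Integer> result = new ArrayList<>();
            while (true) {
                Object record = mulToSink.take();
                if (record == END) {
                    break;
                }
                result.add((Integer) record);
            }
            add.join();
            mul.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3)));
    }
}
```

The sketch leaves out everything a cluster normally provides, such as checkpointing, backpressure limits and failure handling, so it only shows the data flow between threads.
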
>
>
> In fact the program above can even be executed in a single-threaded way.
> The single thread just needs to switch its context among the vertices in
> round-robin fashion, following their topological order.
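
The single-threaded variant could look roughly like the sketch below, again in plain Java without any Flink API. It assumes a linear chain of vertices that is already sorted topologically; `RoundRobinSketch`, `run` and `demo` are hypothetical names, and the two operators are placeholders.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: one thread visits all vertices round-robin. Not Flink API.
class RoundRobinSketch {
    // `vertices` must be in topological order; vertex i reads inbox i and writes inbox i + 1.
    static List<Integer> run(List<Integer> input, List<Function<Integer, Integer>> vertices) {
        List<Deque<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i <= vertices.size(); i++) {
            inboxes.add(new ArrayDeque<>());
        }
        inboxes.get(0).addAll(input);

        boolean progressed = true;
        while (progressed) { // stop after a full round in which no vertex had work
            progressed = false;
            for (int i = 0; i < vertices.size(); i++) {
                // Give each vertex one turn: process at most one record, then move on.
                Integer record = inboxes.get(i).poll();
                if (record != null) {
                    inboxes.get(i + 1).add(vertices.get(i).apply(record));
                    progressed = true;
                }
            }
        }
        // The last inbox has no consumer, so it holds the job output.
        return new ArrayList<>(inboxes.get(vertices.size()));
    }

    // Example job: (x + 1) followed by (x * 2) over the input 1, 2, 3.
    static List<Integer> demo() {
        List<Function<Integer, Integer>> vertices = new ArrayList<>();
        vertices.add(x -> x + 1);
        vertices.add(x -> x * 2);
        return run(Arrays.asList(1, 2, 3), vertices);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Processing one record per turn keeps the scheduling fair across vertices; draining each inbox completely before moving on would also work for a bounded input.
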
>
>
> I believe this could allow Flink jobs to be executed without a Flink
> cluster, because threads can be properly managed by the JVM, the process and
> the operating system. Unlike Flink tasks, threads do not need to be
> allocated by the ResourceManager, monitored by a TaskManager or dispatched by
> the Dispatcher, so there is no need for a Flink cluster.
>
>
> I agree that basing "Flink as a library" on reactive mode is a great idea.
> Every user of the cluster is also a service provider, and the overall system
> is elastic to members joining or leaving. My concerns about this idea are:
>
>    - I thought "Flink as a library" meant running Flink without a Flink
>    cluster, but this idea still involves a Flink cluster; the cluster is
>    just internal rather than external.
>    - This idea would make one Flink application depend on others, which
>    seems to raise availability issues. A Flink application might
>    happen to be the JobManager, or it might manage tasks whose downstream or
>    upstream tasks are on other nodes. If the application fails in this case,
>    it would cause applications in other processes to fail. I think few
>    libraries have such cross-process effects.
>
>
> Still, I think this idea is fascinating and is a promising solution. I
> also look forward to comments from you and any community member on my
> plan's feasibility, and on whether my plan aligns with the general roadmap
> of "Flink as a library".
>
>
> Best regards,
>
> Yunfeng Zhou
>
> ------------------------------------------------------------------
> From: Till Rohrmann<trohrm...@apache.org>
> Date: 2021-04-15 16:12:04
> To: 周云峰(云岩)<zhouyunfeng....@alibaba-inc.com>
> Cc: Rohrmann<till.rohrm...@alibaba-inc.com>; 秦江杰<
> jiangjie...@alibaba-inc.com>; 林东(剑士)<lindong....@alibaba-inc.com>
> Subject: Re: "Flink as a library" progress
>
> Hi Yunfeng,
>
> With the current release we have made a big step towards "Flink as a
> library" with the introduction of the reactive mode. Using this mode we can
> now start Flink TaskManager processes and the resources should be
> automatically used. This feature is in an MVP state, which means that it
> still has a couple of rough edges and needs to be further improved but the
> fundamentals are there.
>
> What's missing for the "Flink as a library" feature is to integrate the
> reactive mode with how a user would use Flink as a library. What I would
> expect from "XYZ as a library" is that I only need to add a dependency to
> my program, do some API calls and then can start my binary as I would start
> any other binary. What this means for Flink is that if you start your
> binary, then it should join a cluster (probably the user has to configure
> something to let processes find each other), figure out which process is
> the leader (the leader becomes the JobManager) and the other processes
> become the TaskManagers to execute the workload. That way, the user can
> simply start and stop processes with the same Flink program and these
> processes execute the specified job. Of course, one would also have to
> check that the specified Flink job is the same. Ideally this happens
> automatically and processes which execute the same Flink job find each
> other and form a cluster.
>
> I don't think that there is already a full design for this feature in the
> community. There are only ideas for how such a feature could behave. Happy
> to hear your ideas for this feature. Ideally, you would share them with the
> community.
>
> Cheers,
> Till
>
> On Thu, Apr 15, 2021 at 4:41 AM 周云峰(云岩) <zhouyunfeng....@alibaba-inc.com>
> wrote:
>
>> Hi Till,
>>
>> My name is Yunfeng Zhou and I joined Alibaba Cloud's Real-time Computing
>> Eco Team in Shanghai about a month ago. Nice to meet you.
>>
>> I learned from Flink's roadmap that we are trying to achieve "Flink as a
>> library" and heard that you have been working on this issue. I want to
>> know more about its progress, as my current work also involves running
>> a Flink job as a function instead of starting a Flink cluster. When will
>> this feature be available for Flink, and may I have more information about
>> its general structure and API documentation?
>>
>> Best regards,
>> Yunfeng Zhou
>>
>
>
