Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation

tison Thu, 19 Dec 2019 18:16:07 -0800

Hi Peter,

I'm afraid that FLIP-73 also changes how per-job mode works. Please check
that work first; you can search for AbstractJobClusterExecutor and its
call graph.


As for how it affects your FLIP-85 proposal, I already mentioned above that

>the user program is designed to ALWAYS run on the client side. Specifically,
>we deploy the job in the executor when env.execute is called. This
>abstraction possibly prevents Flink from running the user program on the
>cluster side.
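
To make the constraint concrete, here is a minimal Java sketch. It does not
use Flink's real classes: Environment, Executor and userMain are hypothetical
stand-ins invented for illustration, and the assumption is only that
deployment is triggered from inside execute(). Under that assumption the
pipeline does not exist until the user's main code has run, so the process
that deploys the job is the process that ran the user program.

```java
import java.util.ArrayList;
import java.util.List;

public class ExecutorSketch {

    /** Hypothetical stand-in for a pipeline executor; not Flink's actual interface. */
    public interface Executor {
        String execute(List<String> pipeline);
    }

    /** Hypothetical stand-in for an execution environment. */
    public static final class Environment {
        private final List<String> operators = new ArrayList<>();
        private final Executor executor;

        public Environment(Executor executor) {
            this.executor = executor;
        }

        public Environment addOperator(String name) {
            operators.add(name);
            return this;
        }

        /** Deployment is triggered here, from inside the user program. */
        public String execute() {
            return executor.execute(new ArrayList<>(operators));
        }
    }

    /** The "user program": the pipeline only exists once this code has run. */
    public static String userMain(Environment env) {
        return env.addOperator("source")
                  .addOperator("map")
                  .addOperator("sink")
                  .execute();
    }

    public static void main(String[] args) {
        // An illustrative executor that "deploys" from wherever userMain is invoked.
        Executor perJob = pipeline -> "deployed from client: " + pipeline;
        // Prints: deployed from client: [source, map, sink]
        System.out.println(userMain(new Environment(perJob)));
    }
}
```

In this sketch, moving job graph generation to the cluster would mean calling
userMain on the cluster, which is exactly what the deploy-inside-execute
design gets in the way of.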

Best,
tison.


On Thu, Dec 19, 2019 at 2:54 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:

> Hi Yang,
>
> Thanks for your input. I can see that master-side job graph generation is a
> common requirement for per-job mode.
> I think FLIP-73 is mainly for session mode. I think the proposal is a
> valid improvement for the existing CLI and per-job mode.
>
>
> Best Regards
> Peter Huang
>
> On Wed, Dec 18, 2019 at 3:21 AM Yang Wang <danrtsey...@gmail.com> wrote:
>
>> I just want to revive this discussion.
>>
>> Recently, I have been thinking about how to natively run a Flink per-job
>> cluster on Kubernetes.
>> The per-job mode on Kubernetes is very different from the one on Yarn. And
>> we will have the same deployment requirements for the client and the
>> entry point.
>>
>> 1. The Flink client does not always need a local jar to start a Flink
>> per-job cluster. We could support multiple URI schemes. For example,
>> file:///path/of/my.jar means a jar located on the client side,
>> hdfs://myhdfs/user/myname/flink/my.jar means a jar located on remote HDFS,
>> and local:///path/in/image/my.jar means a jar located on the jobmanager
>> side.
>>
>> 2. Support running the user program on the master side. This also means
>> the entry point will generate the job graph on the master side. We could
>> use the ClasspathJobGraphRetriever or start a local Flink client to
>> achieve this.
>>
>>
>> cc tison, Aljoscha & Kostas: do you think this is the right direction for
>> us to work in?
>>
>> On Thu, Dec 12, 2019 at 4:48 PM tison <wander4...@gmail.com> wrote:
>>
>> > A quick idea is to separate deployment from the user program, so that
>> > deployment is always done outside the program. When the user program
>> > is executed, there is always a ClusterClient that communicates with an
>> > existing cluster, remote or local. That will be a separate thread, so
>> > this is just for your information.
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > On Thu, Dec 12, 2019 at 4:40 PM tison <wander4...@gmail.com> wrote:
>> >
>> > > Hi Peter,
>> > >
>> > > Another concern I realized recently is that, with the current
>> > > Executors abstraction (FLIP-73), I'm afraid the user program is
>> > > designed to ALWAYS run on the client side. Specifically, we deploy
>> > > the job in the executor when env.execute is called. This abstraction
>> > > possibly prevents Flink from running the user program on the cluster
>> > > side.
>> > >
>> > > As for your proposal: in this case we have already compiled the
>> > > program and run it on the client side, so even if we deploy a cluster
>> > > and retrieve the job graph from program metadata, it doesn't make
>> > > much sense.
>> > >
>> > > cc Aljoscha & Kostas: what do you think about this constraint?
>> > >
>> > > Best,
>> > > tison.
>> > >
>> > >
>> > > On Tue, Dec 10, 2019 at 12:45 PM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>> > >
>> > >> Hi Tison,
>> > >>
>> > >> Yes, you are right. I think I made the wrong argument in the doc.
>> > >> Basically, the jar packaging problem only concerns platform users. In
>> > >> our internal deploy service, we further optimized deployment latency
>> > >> by letting users package flink-runtime together with the uber jar, so
>> > >> that we don't need to consider support for multiple Flink versions
>> > >> for now. In session client mode, Flink libs will be shipped anyway as
>> > >> local resources of Yarn, so users actually don't need to package
>> > >> those libs into the job jar.
>> > >>
>> > >>
>> > >>
>> > >> Best Regards
>> > >> Peter Huang
>> > >>
>> > >> On Mon, Dec 9, 2019 at 8:35 PM tison <wander4...@gmail.com> wrote:
>> > >>
>> > >> > > 3. What do you mean by the package? Do users need to compile
>> > >> > > their jars including the flink-clients, flink-optimizer, and
>> > >> > > flink-table code?
>> > >> >
>> > >> > The answer should be no, because they exist in the system classpath.
>> > >> >
>> > >> > Best,
>> > >> > tison.
>> > >> >
>> > >> >
>> > >> > On Tue, Dec 10, 2019 at 12:18 PM Yang Wang <danrtsey...@gmail.com> wrote:
>> > >> >
>> > >> > > Hi Peter,
>> > >> > >
>> > >> > > Thanks a lot for starting this discussion. I think this is a very
>> > >> > > useful feature.
>> > >> > >
>> > >> > > This is not only relevant to Yarn: I am focused on the Flink on
>> > >> > > Kubernetes integration and came across the same problem. I do not
>> > >> > > want the job graph generated on the client side. Instead, the
>> > >> > > user jars are built into a user-defined image. When the job
>> > >> > > manager is launched, we just need to generate the job graph based
>> > >> > > on the local user jars.
>> > >> > >
>> > >> > > I have some small suggestions about this.
>> > >> > >
>> > >> > > 1. `ProgramJobGraphRetriever` is very similar to
>> > >> > > `ClasspathJobGraphRetriever`; the difference is that the former
>> > >> > > needs `ProgramMetadata` and the latter needs some arguments. Is
>> > >> > > it possible to have a unified `JobGraphRetriever` that supports
>> > >> > > both?
>> > >> > > 2. Is it possible to start a per-job cluster without a local user
>> > >> > > jar? In your case, the user jars already exist on HDFS, and we do
>> > >> > > need to download the jars to the deployer service. Currently, we
>> > >> > > always need a local user jar to start a Flink cluster. It would
>> > >> > > be great if we could support remote user jars.
>> > >> > > >> In the implementation, we assume users package flink-clients,
>> > >> > > >> flink-optimizer, flink-table together within the job jar.
>> > >> > > >> Otherwise, the job graph generation within
>> > >> > > >> JobClusterEntryPoint will fail.
>> > >> > > 3. What do you mean by the package? Do users need to compile
>> > >> > > their jars including the flink-clients, flink-optimizer, and
>> > >> > > flink-table code?
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > Best,
>> > >> > > Yang
>> > >> > >
>> > >> > > On Tue, Dec 10, 2019 at 2:37 AM Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>> > >> > >
>> > >> > > > Dear All,
>> > >> > > >
>> > >> > > > Recently, the Flink community started to improve the Yarn
>> > >> > > > cluster descriptor to make the job jar and config files
>> > >> > > > configurable from the CLI. This improves the flexibility of
>> > >> > > > Flink deployment in Yarn per-job mode. For platform users who
>> > >> > > > manage tens of hundreds of streaming pipelines for the whole
>> > >> > > > org or company, we found that job graph generation on the
>> > >> > > > client side is another pain point. Thus, we want to propose a
>> > >> > > > configurable feature for FlinkYarnSessionCli. The feature
>> > >> > > > allows users to choose job graph generation in the Flink
>> > >> > > > ClusterEntryPoint, so that the job jar doesn't need to be
>> > >> > > > available locally for job graph generation. The proposal is
>> > >> > > > organized as a FLIP:
>> > >> > > >
>> > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation
>> > >> > > >
>> > >> > > > Any questions and suggestions are welcome. Thank you in advance.
>> > >> > > >
>> > >> > > >
>> > >> > > > Best Regards
>> > >> > > > Peter Huang
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> >
>>
>
