Hi Biao,

I really appreciate your thorough answers. And yes, for now I have taken the
workaround of manipulating the directory names.
One follow-up question, if you don't mind:
What is the recommended way of managing plugins in YarnClusterDescriptor?
Currently I am placing the plugins (e.g. flink-s3-fs-hadoop) under the
system jars setting, which works. But I am also seeing this comment in the
code
<https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L908>
so I am a bit confused.


Thanks
Leon

On Sat, Jun 4, 2022 at 11:03 PM Biao Geng <biaoge...@gmail.com> wrote:

> Hi Leon,
>
> For your question 1, the classpath contains two types of jars: user jars
> and Flink system jars (i.e. jars in flink/lib). System jars are sorted
> alphabetically. For user jars, there are three options for where they are
> placed in the final classpath: ORDER, FIRST, and LAST (see the doc
> <https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/resource-providers/yarn/#user-jars--classpath>
> for more details). To the best of my knowledge, there is currently no way to
> pass a custom sort function. One workaround is to manage your jar paths: you
> can put the jar that you want loaded first in a directory whose name sorts
> alphabetically earlier (e.g. a-flink/user-jar).
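To illustrate the directory-name workaround described above, here is a minimal, self-contained sketch. It assumes the classpath entries are sorted as plain strings in natural order. The class name `ClasspathSortSketch` and all jar paths are hypothetical, and this is not Flink's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Flink code: models alphabetical ordering of jar paths.
class ClasspathSortSketch {

    // Returns a sorted copy of the given paths (natural String order is an assumption).
    static List<String> sortAlphabetically(List<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> jars = List.of(
                "lib/other-dependency.jar",
                "a-flink/user-jar/my-library.jar",
                "lib/flink-table.jar");
        // The jar under the directory "a-flink" sorts ahead of everything under "lib".
        for (String jar : sortAlphabetically(jars)) {
            System.out.println(jar);
        }
    }
}
```

Because the comparison is on the whole path string, renaming only the parent directory is enough to move a jar ahead of the others.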
> For your question 2, flink-dist.jar is always at the end of the system
> jars. Depending on how you choose to add user jars, it is not always at
> the end of the final generated classpath. flink-dist.jar is special and
> mandatory because we need it to launch the Java process that runs
> ClusterEntrypoint on the cluster side. The other jars in flink/lib are, to
> some extent, negotiable.
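The ordering described above can be sketched as follows. This is a simplified model based only on the description in this thread, not Flink's implementation: the enum `Inclusion`, the method `buildClasspath`, and all jar names are illustrative assumptions, and the ORDER branch assumes that user jars are sorted together with the system jars.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the classpath layout discussed in this thread.
class ClasspathAssemblySketch {

    // Illustrative stand-in for the three user-jar options (ORDER, FIRST, LAST).
    enum Inclusion { ORDER, FIRST, LAST }

    static List<String> buildClasspath(
            List<String> systemJars, List<String> userJars, String distJar, Inclusion mode) {
        List<String> system = new ArrayList<>(systemJars);
        List<String> user = new ArrayList<>(userJars);
        if (mode == Inclusion.ORDER) {
            // Assumption: in ORDER mode user jars are mixed into the system jars.
            system.addAll(user);
            user.clear();
        }
        Collections.sort(system);
        Collections.sort(user);

        List<String> classpath = new ArrayList<>();
        if (mode == Inclusion.FIRST) {
            classpath.addAll(user);
        }
        classpath.addAll(system);
        // The dist jar always follows the (sorted) system jars.
        classpath.add(distJar);
        if (mode == Inclusion.LAST) {
            // With LAST, user jars come after the dist jar.
            classpath.addAll(user);
        }
        return classpath;
    }

    public static void main(String[] args) {
        List<String> system = List.of("lib/log4j.jar", "lib/flink-table.jar");
        List<String> user = List.of("usrlib/job.jar");
        for (Inclusion mode : Inclusion.values()) {
            System.out.println(mode + ": " + buildClasspath(system, user, "flink-dist.jar", mode));
        }
    }
}
```

In this model the dist jar ends the classpath for ORDER and FIRST, but not for LAST, which matches the statement that it is not always the final entry.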
>
> I have run into a similar problem as well. My previous workaround was
> managing the directory name, which is not very elegant. It could be useful to
> add the ability to customize the loading order of jars in the classpath,
> though it is also important to package the jars more carefully to avoid
> conflicts.
>
> Best,
> Biao Geng
>
>
> On Sun, Jun 5, 2022 at 3:21 AM Leon Xu <l...@attentivemobile.com> wrote:
>
>> Hi Flink Community,
>>
>> We are building on top of *org.apache.flink.yarn.YarnClusterDescriptor* to
>> submit a Flink application from Java code to a YARN cluster in
>> application mode. We are setting the classpath as the value of the
>> *yarn.provided.lib.dirs* property in the YARN configuration.
>>
>> While experimenting with the YarnClusterDescriptor code, I came up with two
>> questions that I hope to get answers to:
>> 1. YarnClusterDescriptor seems to force the classpath loading in
>> alphabetical order. See code here
>> <https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L966>.
>> Is there any specific reason for doing that? If I'd like to enforce my own
>> order, is that possible today?
>> 2. It looks like *flink-dist.jar* is treated separately from the other
>> classpath entries. In the *YarnApplicationFileUploader* class,
>> the registerMultipleLocalResources method skips a jar if it is the dist
>> jar. See the code here
>> <https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnApplicationFileUploader.java#L283>.
>> With the current behavior it seems it will always place the flink-dist.jar
>> at the end of the classpath. Is there any reason that Flink wants to treat
>> the *flink-dist.jar* separately from other jars?
>>
>> In our classpath loading we are hoping to enforce a certain order because
>> different jars may contain the same dependency in different
>> versions. We want to control the order so that the correct
>> library version is loaded.
>>
>>
>> Thanks
>> Leon
>>
>
