Hi Biao,

I really appreciate your thorough answers. And yes, for now I have taken the workaround of manipulating the directory names.

To follow up with one more question, if you don't mind: what is the recommended way of managing plugins in YarnClusterDescriptor? Currently I am placing the plugins (e.g. flink-s3-fs-hadoop) under the system jars setting, which works. But I am also seeing this comment in the code <https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L908>, so I am a bit confused.

Thanks
Leon

On Sat, Jun 4, 2022 at 11:03 PM Biao Geng <biaoge...@gmail.com> wrote:

> Hi Leon,
>
> For your question 1: in the classpath, there are 2 types of jars: user jars
> and Flink system jars (i.e. jars in flink/lib). System jars are sorted
> alphabetically. For user jars, there are 3 choices for adding them to the
> final classpath: ORDER, FIRST, LAST (see the doc
> <https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/resource-providers/yarn/#user-jars--classpath>
> for more details). To the best of my knowledge, there is currently no way
> to pass a sort function for this. One workaround is managing your jar
> paths: you can put the jar that you want to load first in an
> alphabetically smaller directory (e.g. a-flink/user-jar).
>
> For your question 2: flink-dist.jar is always at the end of the system
> jars. Depending on how you choose to add user jars, it is not always at
> the end of the final generated classpath. flink-dist.jar is special and
> mandatory, as we need it to launch the Java process that runs
> ClusterEntrypoint on the cluster side. Other jars in flink/lib can
> somehow be compromised.
>
> I have met a similar problem as well. My previous workaround was managing
> the directory name, which is not so elegant. It could be useful to add the
> ability to customize the loading order of jars in the classpath, while it
> is also important to package the jars more carefully to avoid conflicts.
>
> Best,
> Biao Geng
>
>
> Leon Xu <l...@attentivemobile.com> wrote on Sun, Jun 5, 2022 at 03:21:
>
>> Hi Flink Community,
>>
>> We are building on top of *org.apache.flink.yarn.YarnClusterDescriptor*
>> to submit a Flink application from Java code to a YARN cluster, in
>> application mode. We are setting the classpath as the value of the
>> *yarn.provided.lib.dirs* property under the YARN configuration.
>>
>> While working with the YarnClusterDescriptor code, I came up with two
>> questions that I hope to get answers to:
>> 1. YarnClusterDescriptor seems to force the classpath loading in
>> alphabetical order. See code here
>> <https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L966>.
>> Is there any specific reason for doing that? If I'd like to enforce my
>> own order, is that possible now?
>> 2. It looks like the *flink-dist.jar* is treated separately from the
>> other classpath entries. In the *YarnApplicationFileUploader* class,
>> the registerMultipleLocalResources method will skip a jar if it is a
>> dist jar. See the code here
>> <https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnApplicationFileUploader.java#L283>.
>> With the current behavior, it seems the flink-dist.jar will always be
>> placed at the end of the classpath. Is there any reason that Flink
>> treats the *flink-dist.jar* separately from other jars?
>>
>> In our classpath loading we are hoping to enforce a certain order, because
>> different jars may contain the same dependent library but with different
>> versions. We hope to force the order so that we can load the library
>> version we want.
>>
>>
>> Thanks
>> Leon
>>
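
For readers following the thread, the ordering Biao describes can be modeled with a small standalone sketch. It is not Flink code and calls no Flink API: the class ClasspathOrderSketch, the method buildClasspath, and the example jar paths are all hypothetical. It also assumes that ORDER means user jars are sorted together with the system jars, which would be consistent with the directory-name workaround but is not stated explicitly above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative model only; it does not call any Flink API.
class ClasspathOrderSketch {

    // Mirrors the three choices named in the thread.
    enum UserJarInclusion { ORDER, FIRST, LAST }

    static List<String> buildClasspath(
            List<String> systemJars,
            List<String> userJars,
            String distJar,
            UserJarInclusion mode) {
        List<String> system = new ArrayList<>(systemJars);
        List<String> user = new ArrayList<>(userJars);

        if (mode == UserJarInclusion.ORDER) {
            // Assumption: ORDER sorts user jars together with system jars.
            system.addAll(user);
        }
        // Both groups are sorted alphabetically by path.
        Collections.sort(system);
        Collections.sort(user);

        List<String> classpath = new ArrayList<>();
        if (mode == UserJarInclusion.FIRST) {
            classpath.addAll(user);
        }
        classpath.addAll(system);
        // The dist jar always follows the system jars.
        classpath.add(distJar);
        if (mode == UserJarInclusion.LAST) {
            classpath.addAll(user);
        }
        return classpath;
    }

    public static void main(String[] args) {
        List<String> systemJars = Arrays.asList("lib/log4j.jar", "lib/flink-table.jar");
        List<String> userJars = Arrays.asList("usrlib/other.jar", "a-flink/user-jar/app.jar");
        for (UserJarInclusion mode : UserJarInclusion.values()) {
            List<String> cp = buildClasspath(systemJars, userJars, "flink-dist.jar", mode);
            System.out.println(mode + " -> " + String.join(":", cp));
        }
    }
}
```

Under this model, with ORDER the jar in a-flink/user-jar sorts ahead of everything under lib/, which is the effect of the directory-name workaround. With LAST, the user jars come after flink-dist.jar, so the dist jar is not at the end of the final classpath.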