Hi,

we're testing the newly released batch recovery and are running into
class-loading issues.

1) We have a per-job Flink cluster
2) We use BATCH execution mode + region failover strategy

Point 1) should imply a single user-code class loader per task manager
(because there is only a single pipeline, which reuses the class loader
cached in BlobLibraryCacheManager). We need this property because we have
UDFs that access C libraries using JNI (I think this may be a fairly common
use case when dealing with legacy code). JDK internals
<https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ClassLoader.java#L2466>
make sure that a given native library can be loaded by only one class
loader per JVM.

When region recovery is triggered, vertices that need to recover are first
reset back to the CREATED state and then rescheduled. If all tasks in a
task manager are reset, the cached class loader is released
<https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L338>.
This unfortunately causes a job failure, because we try to reload the
native library in a newly created class loader.
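
To make the sequence concrete, here is a simplified model of it. This is
not Flink's code: RefCountedLoaderCache and NativeRegistry are hypothetical
stand-ins for the cached class loader and for the JVM's one-loader-per-library
rule, and the model assumes the old class loader has not been garbage
collected (and its library unloaded) by the time the task is rescheduled.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

final class RegionFailoverModel {

    /** Hypothetical stand-in for the JVM rule: one native library, one class loader. */
    static final class NativeRegistry {
        private final Map<String, ClassLoader> owners = new HashMap<>();

        void load(String library, ClassLoader loader) {
            ClassLoader owner = owners.putIfAbsent(library, loader);
            if (owner != null && owner != loader) {
                throw new UnsatisfiedLinkError(
                        library + " already loaded in another classloader");
            }
        }
    }

    /** Hypothetical reference-counted cache: the loader is dropped with its last task. */
    static final class RefCountedLoaderCache {
        private ClassLoader cached;
        private int tasks;

        ClassLoader registerTask() {
            if (cached == null) {
                cached = new URLClassLoader(new URL[0]);
            }
            tasks++;
            return cached;
        }

        void unregisterTask() {
            if (tasks == 0) {
                throw new IllegalStateException("no task registered");
            }
            if (--tasks == 0) {
                cached = null; // released; the next task gets a fresh loader
            }
        }
    }

    /** Returns true if loading the library again after a full reset fails. */
    static boolean reloadFailsAfterFullReset() {
        NativeRegistry jvm = new NativeRegistry();
        RefCountedLoaderCache cache = new RefCountedLoaderCache();

        ClassLoader before = cache.registerTask();
        jvm.load("legacy", before);  // UDF loads the native library
        cache.unregisterTask();      // failover resets the only task on this TM

        ClassLoader after = cache.registerTask(); // rescheduled task
        try {
            jvm.load("legacy", after);
            return false;
        } catch (UnsatisfiedLinkError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("reload fails: " + reloadFailsAfterFullReset());
    }
}
```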

I know that it is always possible to distribute native libraries with
Flink's libs and load them using the system class loader, but this
introduces build & operations overhead and makes the setup really unfriendly
for the cluster user, so I'd rather not work around the issue this way (a
per-job cluster should be more user friendly).

I believe the correct approach would be not to release the cached class
loader while the job is recovering, even though there are no tasks currently
registered with the TM.
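
A minimal sketch of that idea, assuming the task manager can tell when a
job is still active (running or recovering). RecoveryAwareLoaderCache,
jobStarted and jobFinished are hypothetical names for illustration, not
existing Flink APIs:

```java
import java.net.URL;
import java.net.URLClassLoader;

final class RecoveryAwareLoaderCache {
    private ClassLoader cached;
    private int tasks;
    private boolean jobActive; // hypothetical: true while the job runs or recovers

    synchronized void jobStarted() {
        jobActive = true;
    }

    synchronized void jobFinished() {
        jobActive = false;
        releaseIfUnused();
    }

    synchronized ClassLoader registerTask() {
        if (cached == null) {
            cached = new URLClassLoader(new URL[0]);
        }
        tasks++;
        return cached;
    }

    synchronized void unregisterTask() {
        if (tasks == 0) {
            throw new IllegalStateException("no task registered");
        }
        tasks--;
        releaseIfUnused();
    }

    // Release only when no task uses the loader AND the job is no longer active.
    private void releaseIfUnused() {
        if (tasks == 0 && !jobActive) {
            cached = null;
        }
    }

    public static void main(String[] args) {
        RecoveryAwareLoaderCache cache = new RecoveryAwareLoaderCache();
        cache.jobStarted();
        ClassLoader before = cache.registerTask();
        cache.unregisterTask(); // all tasks reset during region failover
        ClassLoader after = cache.registerTask();
        System.out.println("same loader after failover: " + (before == after));
    }
}
```

With this, a rescheduled task gets the same class loader back as long as
the job has not reached a terminal state, so the native library is never
loaded a second time.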

What do you think? Thanks for your help.

D.
