Hi community,
I was testing Flink 1.17 on Kubernetes and ran into a strange class loading
problem. In short, the logs show that
org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback was loaded,
but the program throws a ClassNotFoundException for it anyway.
The exception is thrown by the Aliyun OSS filesystem plugin. The log shows:
2023-04-17 11:29:54.269 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting KubernetesApplicationClusterEntrypoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the ha services from the instantiated HighAvailabilityServicesFactory>
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:299)
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:285)
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:145)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:439)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:382)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:282)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:232)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:229)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:729)
    at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:86)
Caused by: java.io.IOException: Could not create FileSystem for highly available storage path (oss://octopus-flink-test/checkpoints/ha/state-machine-test)
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:102)
    at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86)
    at org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:41)
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:296)
    ... 13 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback not found
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2720)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.Groups.<init>(Groups.java:107)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.Groups.<init>(Groups.java:102)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:451)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:338)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
    at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.initialize(AliyunOSSFileSystem.java:341)
    at org.apache.flink.fs.osshadoop.OSSFileSystemFactory.create(OSSFileSystemFactory.java:103)
    at org.apache.flink.core.fs.PluginFileSystemFactory.create(PluginFileSystemFactory.java:62)
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:508)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274)
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99)
    ... 16 more
So I turned on -verbose:class to check whether the class was loaded,
and I can see that a class with a similar name was loaded:
[Loaded org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback from file:/opt/flink/plugins/flink-oss-fs-hadoop/flink-oss-fs-hadoop-1.17.0.jar]
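For comparison, the class that was loaded and the class named in the exception differ only by the package prefix org.apache.flink.fs.shaded.hadoop3. that the OSS plugin puts in front of its bundled Hadoop classes (the same prefix appears on the Hadoop frames in the innermost "Caused by" section of the stack trace). The sketch below mimics that prefix rewrite with a hypothetical relocate helper; it is not the shading tool's real code. It only illustrates that, if the class name is read as a plain string at runtime (for example from a Hadoop configuration value) and that string was not rewritten along with the bytecode, the lookup asks for a name that the plugin jar does not contain.

```java
public class RelocationDemo {

    // Prefixes taken from the log lines above: the original Hadoop package
    // and the relocated package used inside flink-oss-fs-hadoop-1.17.0.jar.
    static final String ORIGINAL = "org.apache.hadoop.";
    static final String RELOCATED = "org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.";

    /**
     * Hypothetical helper that mimics a shading relocation rule by rewriting
     * the package prefix. Names outside org.apache.hadoop.* are returned unchanged.
     */
    static String relocate(String className) {
        return className.startsWith(ORIGINAL)
                ? RELOCATED + className.substring(ORIGINAL.length())
                : className;
    }

    public static void main(String[] args) {
        // Name requested by Configuration.getClass(...), per the exception message.
        String requested = "org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback";
        // Name actually shipped in the plugin jar, per the -verbose:class output.
        String shipped = relocate(requested);

        System.out.println(shipped);
        System.out.println("same class name? " + requested.equals(shipped)); // prints false
    }
}
```

To the JVM these are two unrelated class names, so finding one says nothing about whether the other can be resolved.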
At first glance, I thought it was because the package name had been changed
by shading. So I downloaded the Hadoop 3 hadoop-common jar and added it to
/opt/flink/lib. After that, I could see that
org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback was loaded
too:
[Loaded org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback from file:/opt/flink/lib/flink-shaded-hadoop2-uber-2.8.3-1.8.3.jar]
But the problem persists.
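A class being loaded somewhere in the JVM does not make it visible to every class loader: Class.forName(name, initialize, loader) only succeeds if that loader, or a loader it delegates to, can find the class. Flink's plugin documentation describes each plugin as being loaded in its own class loader that only delegates a limited set of packages to the parent, so a jar placed in /opt/flink/lib may not be reachable from inside the OSS plugin. The sketch below reproduces that effect in isolation. PrefixFilteringLoader is a hypothetical, simplified stand-in that forwards only java.* lookups to the application loader; it is not Flink's actual plugin class loader, and its allowed-prefix list is an assumption chosen for the demo.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class PluginVisibilityDemo {

    /** Stand-in for a class on the application classpath (like a jar in /opt/flink/lib). */
    public static class LibClass {}

    /**
     * Hypothetical, simplified "plugin" loader: it asks the application loader only
     * for classes whose names start with an allowed prefix. Everything else goes to
     * its own (empty) URL list with the bootstrap loader as parent.
     */
    static final class PrefixFilteringLoader extends URLClassLoader {
        private final ClassLoader parentForAllowed;
        private final String[] allowedPrefixes;

        PrefixFilteringLoader(ClassLoader parentForAllowed, String... allowedPrefixes) {
            super(new URL[0], null); // null parent: only bootstrap classes by default
            this.parentForAllowed = parentForAllowed;
            this.allowedPrefixes = allowedPrefixes;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            for (String prefix : allowedPrefixes) {
                if (name.startsWith(prefix)) {
                    return parentForAllowed.loadClass(name);
                }
            }
            return super.loadClass(name, resolve);
        }
    }

    /** Returns true if the class can be resolved through the given loader. */
    static boolean visible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader app = PluginVisibilityDemo.class.getClassLoader();
        String name = LibClass.class.getName();

        // The class is loaded and resolvable through the application loader...
        System.out.println("app loader sees it:    " + visible(name, app));

        // ...but a loader that only delegates "java." to the application loader cannot find it.
        try (PrefixFilteringLoader plugin = new PrefixFilteringLoader(app, "java.")) {
            System.out.println("plugin loader sees it: " + visible(name, plugin));
        }
    }
}
```

Running main prints true for the application loader and false for the filtering loader, which is the same "loaded, yet ClassNotFoundException" pattern seen in the logs.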
My Dockerfile is:
FROM flink:1.17.0-java8
ADD --chown=flink:flink \
    https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop2-uber/2.8.3-1.8.3/flink-shaded-hadoop2-uber-2.8.3-1.8.3.jar \
    /opt/flink/lib/
ADD --chown=flink:flink \
    https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/3.3.5/hadoop-common-3.3.5.jar \
    /opt/flink/lib/
RUN mkdir /opt/flink/plugins/flink-oss-fs-hadoop/ && \
    cp /opt/flink/opt/flink-oss-fs-hadoop-1.17.0.jar /opt/flink/plugins/flink-oss-fs-hadoop/
Does anyone have an idea why this problem occurs? Thanks!