[ 
https://issues.apache.org/jira/browse/FLINK-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014442#comment-17014442
 ] 

Arvid Heise commented on FLINK-15355:
-------------------------------------

[~liyu], you are right, this issue is related to the unshading. Analysis of the
issue:
 * In the system classloader, there are flink-dist and hadoop-uber.
 * The plugin classloader contains our s3 filesystem and the s3 hadoop 
implementation.
 * During initialization, the s3 hadoop implementation uses the `ServiceLoader` 
to load all hadoop FileSystems.
 * The classloader finds not only the expected service definitions but also 
those of hadoop-uber.
 * `ServiceLoader` tries to load all file systems but of course cannot load 
those of hadoop-uber because of inconsistent class hierarchies (mixed plugin and 
system classloader); see the sketch below.
 * `ServiceLoader` issues warnings that trigger the `check_error` script to fail.
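
As a rough illustration (not the actual Flink or Hadoop code), the following 
sketch shows how a `ServiceLoader` lookup from inside the plugin classloader 
also sees the `META-INF/services` entries of hadoop-uber in the parent 
classloader and then fails to instantiate them:

{code:java}
import java.util.Iterator;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

import org.apache.hadoop.fs.FileSystem;

// Hypothetical sketch: enumerate hadoop FileSystem services from within the
// plugin classloader. Service definitions of both the plugin jar and hadoop-uber
// (parent/system classloader) are visible, but the latter cannot be instantiated
// because their class hierarchy mixes plugin and system classloader classes.
public class ServiceLoaderSketch {
    public static void main(String[] args) {
        ClassLoader pluginClassLoader = ServiceLoaderSketch.class.getClassLoader();
        Iterator<FileSystem> it =
                ServiceLoader.load(FileSystem.class, pluginClassLoader).iterator();
        while (true) {
            try {
                if (!it.hasNext()) {
                    break;
                }
                FileSystem fs = it.next();
                System.out.println("Loaded " + fs.getClass().getName());
            } catch (ServiceConfigurationError e) {
                // These are the warnings that make the check_error script fail.
                System.err.println("Could not load service: " + e);
            }
        }
    }
}
{code}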

First, I verified whether this is a fundamental issue in the interplay of 
plugins and libraries or merely a cosmetic one. I successfully mixed plugins and 
libraries by reading from a local HDFS and writing to S3 (minio) and vice versa.
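
To illustrate, here is a minimal sketch of that kind of mixed job, assuming a 
reachable HDFS namenode and an s3 bucket served by minio (all paths and names 
are placeholders, not from the actual test):

{code:java}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class HdfsToS3Sketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // hdfs:// is handled by hadoop-uber on the system classpath (library),
        // s3:// by the flink-s3-fs-hadoop plugin, i.e. libraries and plugins mixed.
        DataStream<String> input =
                env.readTextFile("hdfs://namenode:8020/path/to/input");

        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("s3://test-bucket/output"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();

        input.addSink(sink);
        env.execute("HDFS to S3 sketch");
    }
}
{code}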

I implemented a solution [https://github.com/apache/flink/pull/10845] that 
limits the scope of service discovery to the plugin itself, from the plugin's 
perspective. That gives the desired isolation but may, in the worst case, 
require users to add more jars to the plugin folder (security provider?). On 
the other hand, that setup would allow users to use different versions of 
security providers for different plugins. The underlying idea is sketched below.
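
Roughly sketched (this is not the actual patch; class and method names are 
simplified), a child-first plugin classloader can stop delegating 
`META-INF/services` resource lookups to its parent, so that `ServiceLoader` 
only sees the service definitions shipped inside the plugin:

{code:java}
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Enumeration;

// Simplified illustration of limiting service discovery to the plugin's own jars.
public class IsolatingPluginClassLoader extends URLClassLoader {

    public IsolatingPluginClassLoader(URL[] pluginJars, ClassLoader parent) {
        super(pluginJars, parent);
    }

    @Override
    public Enumeration<URL> getResources(String name) throws IOException {
        if (name.startsWith("META-INF/services/")) {
            // Only return service definitions found in the plugin jars themselves;
            // do not surface those of the parent (e.g. hadoop-uber in flink-dist/lib).
            return findResources(name);
        }
        return super.getResources(name);
    }
}
{code}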

We also need to investigate whether the sketchy reinitialization of 
`FileSystem`s in `BucketingSink` works with plugins.

> Nightly streaming file sink fails with unshaded hadoop
> ------------------------------------------------------
>
>                 Key: FLINK-15355
>                 URL: https://issues.apache.org/jira/browse/FLINK-15355
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: Arvid Heise
>            Assignee: Arvid Heise
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit 
> JobGraph.
>  at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
>  at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
>  at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
>  at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:664)
>  at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
>  at 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:895)
>  at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:968)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:968)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit 
> JobGraph.
>  at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
>  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1751)
>  at 
> org.apache.flink.streaming.api.environment.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:94)
>  at 
> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:63)
>  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1628)
>  at StreamingFileSinkProgram.main(StreamingFileSinkProgram.java:77)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
>  ... 11 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit 
> JobGraph.
>  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1746)
>  ... 20 more
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
> submit JobGraph.
>  at 
> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:326)
>  at 
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>  at 
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>  at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$8(FutureUtils.java:274)
>  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>  at 
> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
>  at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal 
> server error., <Exception on server side:
> org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
>  at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$internalSubmitJob$3(Dispatcher.java:336)
>  at 
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>  at 
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>  at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>  at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>  at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>  at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>  at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>  at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>  at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobExecutionException: Could not set up 
> JobManager
>  at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
>  at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>  ... 6 more
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not 
> set up JobManager
>  at 
> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:152)
>  at 
> org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84)
>  at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:379)
>  at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
>  ... 7 more
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Ljava/lang/String;Ljava/lang/String;Ljava/util/concurrent/TimeUnit;)J
>  at org.apache.hadoop.fs.s3a.S3ARetryPolicy.<init>(S3ARetryPolicy.java:113)
>  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:257)
>  at 
> org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.create(AbstractS3FileSystemFactory.java:126)
>  at 
> org.apache.flink.core.fs.PluginFileSystemFactory.create(PluginFileSystemFactory.java:61)
>  at 
> org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:441)
>  at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
>  at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
>  at 
> org.apache.flink.runtime.state.memory.MemoryBackendCheckpointStorage.<init>(MemoryBackendCheckpointStorage.java:85)
>  at 
> org.apache.flink.runtime.state.memory.MemoryStateBackend.createCheckpointStorage(MemoryStateBackend.java:295)
>  at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:279)
>  at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:205)
>  at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.enableCheckpointing(ExecutionGraph.java:486)
>  at 
> org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:338)
>  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.createExecutionGraph(SchedulerBase.java:245)
>  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:217)
>  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:205)
>  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:119)
>  at 
> org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:105)
>  at 
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:278)
>  at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:266)
>  at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
>  at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
>  at 
> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:146)
>  ... 10 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
