[ https://issues.apache.org/jira/browse/FLINK-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996568#comment-15996568 ]
ASF GitHub Bot commented on FLINK-6020:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/3525

@netguy204 I think you are affected by a different issue. In your case, there are no damaged jar files, but it looks like the classloader has been closed.

Flink creates classloaders per job and caches them across the different tasks of that job. It closes the dynamically created classloaders when all tasks of the job are done.

Is it possible that a classloader passes between jobs, meaning that one job uses a classloader that was created for another job? Do you store objects / classes / classloaders somewhere in a static context, a cache, or an interner, so that one job could create them and another job could re-use them?

> Blob Server cannot handle multiple job submits (with same content) parallelly
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-6020
>                 URL: https://issues.apache.org/jira/browse/FLINK-6020
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Distributed Coordination
>            Reporter: Tao Wang
>            Assignee: Tao Wang
>            Priority: Critical
>
> In yarn-cluster mode, if we submit the same job multiple times in parallel,
> the tasks encounter class loading problems and lease occupation.
> This happens because the blob server stores user jars under a name generated
> from their sha1sum: it first writes a temp file and then moves it to its
> final name. For recovery it also puts them on HDFS under the same file name.
> When multiple clients submit the same job with the same jar at the same time,
> the local jar files in the blob server and the files on HDFS are handled by
> multiple threads (BlobServerConnection), which interfere with each other.
> It would be better to have a way to handle this. Two ideas come to mind:
> 1. lock the write operation, or
> 2. use some unique identifier as the file name instead of (or in addition to)
> the sha1sum of the file contents.
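
For illustration only, here is a minimal sketch of one way the write race in the description could be avoided. It is a variant of the second idea: the final file keeps its sha1sum name, but each writer gets its own uniquely named temp file and publishes it with an atomic move. This is not Flink's BlobServer code. The class `BlobStoreSketch` and its methods are hypothetical names, it only covers the local file system (not the HDFS copy), and it assumes that all writers of a given sha1sum name carry identical bytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Flink: stores blobs under their SHA-1 name.
final class BlobStoreSketch {
    private final Path dir;

    BlobStoreSketch(Path dir) {
        this.dir = dir;
    }

    // Hex-encoded SHA-1 of the content, used as the final file name.
    public static String sha1Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Safe to call from several threads with the same content.
    public Path put(byte[] data) throws IOException, NoSuchAlgorithmException {
        Path target = dir.resolve(sha1Hex(data));
        if (Files.exists(target)) {
            return target; // identical content is already stored
        }
        // Each writer gets its own temp file, so writers never share a path.
        Path tmp = Files.createTempFile(dir, "incoming-", ".tmp");
        try {
            Files.write(tmp, data);
            try {
                // Publish in one step; readers never see a half-written file.
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } catch (IOException e) {
                // If a concurrent writer already published the same content,
                // the target exists and this failure is harmless.
                if (!Files.exists(target)) {
                    throw e;
                }
            }
        } finally {
            Files.deleteIfExists(tmp); // no-op when the move succeeded
        }
        return target;
    }
}
```

Whether an atomic move replaces an existing target or fails is implementation specific in `java.nio.file.Files.move`, which is why the sketch tolerates both outcomes. The first idea from the description, locking the write operation, would be the alternative if temp files are not acceptable.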
-- This message was sent by Atlassian JIRA (v6.3.15#6346)