Hey Andrew,

Yeah, that would be preferable. Definitely worth investigating both, but the regression is more pressing at the moment.
- Patrick

On Mon, Jul 14, 2014 at 10:02 PM, Andrew Ash <and...@andrewash.com> wrote:
> I don't believe mine is a regression, but it is related to thread safety on
> Hadoop Configuration objects. Should I start a new thread?
>
> On Jul 15, 2014 12:55 AM, "Patrick Wendell" <pwend...@gmail.com> wrote:
>
>> Andrew, is your issue also a regression from 1.0.0 to 1.0.1? The
>> immediate priority is addressing regressions between these two
>> releases.
>>
>> On Mon, Jul 14, 2014 at 9:05 PM, Andrew Ash <and...@andrewash.com> wrote:
>> > I'm not sure either of those PRs will fix the concurrent adds to
>> > Configuration issue I observed. I've got a stack trace and writeup I'll
>> > share in an hour or two (traveling today).
>> >
>> > On Jul 14, 2014 9:50 PM, "scwf" <wangf...@huawei.com> wrote:
>> >
>> >> Hi Cody,
>> >> I met this issue a few days ago and posted a PR for it
>> >> (https://github.com/apache/spark/pull/1385).
>> >> It's very strange: if I synchronize on conf it deadlocks, but it is
>> >> fine when I synchronize on initLocalJobConfFuncOpt.
>> >>
>> >>> Here's the entire jstack output.
>> >>>
>> >>> On Mon, Jul 14, 2014 at 4:44 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>> >>>
>> >>> Hey Cody,
>> >>>
>> >>> This jstack seems truncated; would you mind giving the entire stack
>> >>> trace? For the second thread, for instance, we can't see where the
>> >>> lock is being acquired.
>> >>>
>> >>> - Patrick
>> >>>
>> >>> On Mon, Jul 14, 2014 at 1:42 PM, Cody Koeninger
>> >>> <cody.koenin...@mediacrossing.com> wrote:
>> >>> > Hi all, just wanted to give a heads up that we're seeing a
>> >>> > reproducible deadlock with Spark 1.0.1 on Hadoop 2.3.0-mr1-cdh5.0.2.
>> >>> >
>> >>> > If JIRA is a better place for this, apologies in advance; figured
>> >>> > talking about it on the mailing list was friendlier than randomly
>> >>> > (re)opening JIRA tickets.
>> >>> > I know Gary had mentioned some issues with 1.0.1 on the mailing
>> >>> > list; once we got a thread dump I wanted to follow up.
>> >>> >
>> >>> > The thread dump shows the deadlock occurs in the synchronized block
>> >>> > of code that was changed in HadoopRDD.scala for the SPARK-1097 issue.
>> >>> >
>> >>> > Relevant portions of the thread dump are summarized below; we can
>> >>> > provide the whole dump if it's useful.
>> >>> >
>> >>> > Found one Java-level deadlock:
>> >>> > =============================
>> >>> > "Executor task launch worker-1":
>> >>> >   waiting to lock monitor 0x00007f250400c520 (object 0x00000000fae7dc30,
>> >>> >   a org.apache.hadoop.conf.Configuration),
>> >>> >   which is held by "Executor task launch worker-0"
>> >>> > "Executor task launch worker-0":
>> >>> >   waiting to lock monitor 0x00007f2520495620 (object 0x00000000faeb4fc8,
>> >>> >   a java.lang.Class),
>> >>> >   which is held by "Executor task launch worker-1"
>> >>> >
>> >>> > "Executor task launch worker-1":
>> >>> >   at org.apache.hadoop.conf.Configuration.reloadConfiguration(Configuration.java:791)
>> >>> >   - waiting to lock <0x00000000fae7dc30> (a org.apache.hadoop.conf.Configuration)
>> >>> >   at org.apache.hadoop.conf.Configuration.addDefaultResource(Configuration.java:690)
>> >>> >   - locked <0x00000000faca6ff8> (a java.lang.Class for org.apache.hadoop.conf.Configuration)
>> >>> >   at org.apache.hadoop.hdfs.HdfsConfiguration.<clinit>(HdfsConfiguration.java:34)
>> >>> >   at org.apache.hadoop.hdfs.DistributedFileSystem.<clinit>(DistributedFileSystem.java:110)
>> >>> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> >>> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> >>> >   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> >>> >   at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>> >>> >   at java.lang.Class.newInstance0(Class.java:374)
>> >>> >   at java.lang.Class.newInstance(Class.java:327)
>> >>> >   at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
>> >>> >   at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
>> >>> >   at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
>> >>> >   - locked <0x00000000faeb4fc8> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
>> >>> >   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
>> >>> >   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
>> >>> >   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
>> >>> >   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
>> >>> >   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
>> >>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
>> >>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
>> >>> >   at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
>> >>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
>> >>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
>> >>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>> >>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>> >>> >   at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
>> >>> >
>> >>> > ...elided...
>> >>> >
>> >>> > "Executor task launch worker-0" daemon prio=10 tid=0x0000000001e71800
>> >>> > nid=0x2d97 waiting for monitor entry [0x00007f24d2bf1000]
>> >>> >    java.lang.Thread.State: BLOCKED (on object monitor)
>> >>> >   at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2362)
>> >>> >   - waiting to lock <0x00000000faeb4fc8> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
>> >>> >   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
>> >>> >   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
>> >>> >   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
>> >>> >   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
>> >>> >   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
>> >>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
>> >>> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
>> >>> >   at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
>> >>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
>> >>> >   at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
>> >>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>> >>> >   at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>> >>> >   at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
>> >>
>> >> --
>> >> Best Regards
>> >> Fei Wang
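For anyone following along: the dump above is the classic two-monitor cycle. Worker-1 takes the FileSystem class lock in loadFileSystems and then, through HdfsConfiguration's static initializer, blocks waiting for the shared Configuration instance's monitor; worker-0 (presumably holding that Configuration monitor via the synchronized block added for SPARK-1097) blocks waiting for the same FileSystem class lock. Here's a minimal, self-contained sketch of that cycle, using two plain Objects as hypothetical stand-ins for the real monitors, plus the same detection jstack performs. The threads are daemons so the JVM can still exit while they are stuck:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class DeadlockSketch {
    // Hypothetical stand-ins for the two monitors in the jstack output:
    static final Object confMonitor = new Object(); // the shared Configuration instance
    static final Object fsClassLock = new Object(); // the FileSystem class lock

    // Acquire `first`, pause so the other thread can grab its own first
    // lock, then try to acquire `second` -- opposite orders form a cycle.
    static void spin(Object first, Object second, String name) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                try { Thread.sleep(200); } catch (InterruptedException ignored) { }
                synchronized (second) { }
            }
        }, name);
        t.setDaemon(true); // daemon threads don't block JVM exit
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        spin(confMonitor, fsClassLock, "worker-0");
        spin(fsClassLock, confMonitor, "worker-1");
        Thread.sleep(1000); // let both threads block on each other

        // The same check jstack runs before printing
        // "Found one Java-level deadlock"
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] deadlocked = mx.findDeadlockedThreads();
        System.out.println(deadlocked != null && deadlocked.length == 2); // prints "true"
    }
}
```

This also suggests why scwf's observation makes sense: synchronizing on a single dedicated object (initLocalJobConfFuncOpt) rather than on the shared Configuration keeps Spark's lock disjoint from the monitors Hadoop takes internally, so no ordering cycle can form. Giving each task its own Configuration copy would break the cycle for the same reason, though whether that's viable here depends on the broadcast-conf design being discussed.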