Re: Hadoop's Configuration object isn't threadsafe

2014-07-16 Thread Andrew Ash
Sounds good -- I added comments to the ticket. Since SPARK-2521 is scheduled for a 1.1.0 release and we can work around this with spark.speculation, I don't personally see a need for a 1.0.2 backport. Thanks for looking through this issue! On Thu, Jul 17, 2014 at 2:14 AM, Patrick Wendell wrote: > Hey

2014-07-16 Thread Patrick Wendell
Hey Andrew, I think you are correct and a follow-up to SPARK-2521 will end up fixing this. The design of SPARK-2521 automatically broadcasts RDD data in tasks, and the approach creates a new copy of the RDD and associated data for each task. A natural follow-up to that patch is to stop handling the

2014-07-16 Thread Andrew Ash
Hi Patrick, thanks for taking a look. I filed this as https://issues.apache.org/jira/browse/SPARK-2546 . Would you recommend I pursue the cloned Configuration object approach now and send in a PR? Reynold's recent announcement of the broadcast RDD object patch may also have implications for the right pa

2014-07-15 Thread Patrick Wendell
Hey Andrew, Cloning the conf might be a good, simple fix for this particular problem. It's definitely worth looking into. There are a few things we can probably do in Spark to deal with non-thread-safety inside of the Hadoop FileSystem and Configuration classes. One thing we can do in general
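A minimal sketch of the clone-per-task idea, written in plain Java so it runs without Hadoop on the classpath. `SimpleConf`, `PerTaskConf`, `cloneFor`, and `runTasks` are hypothetical stand-ins, not Hadoop or Spark APIs; with Hadoop itself the copy would presumably go through `Configuration`'s copy constructor, `new Configuration(other)`. Each task mutates only its private copy, so the shared object is never written concurrently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a mutable, non-thread-safe config object.
class SimpleConf {
    private final Map<String, String> props;

    SimpleConf() { props = new HashMap<>(); }

    // Copy constructor: the clone shares no mutable state with the original.
    SimpleConf(SimpleConf other) { props = new HashMap<>(other.props); }

    String get(String key) { return props.get(key); }
    void set(String key, String value) { props.put(key, value); }
}

class PerTaskConf {
    // Serializes cloning. This only protects the shared conf if any code
    // that mutates it also holds this same lock.
    private static final Object CLONE_LOCK = new Object();

    static SimpleConf cloneFor(SimpleConf shared) {
        synchronized (CLONE_LOCK) {
            return new SimpleConf(shared);
        }
    }

    // Runs `tasks` concurrent tasks; each one writes only to its own copy.
    static List<String> runTasks(SimpleConf shared, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                futures.add(pool.submit(() -> {
                    SimpleConf local = cloneFor(shared);
                    local.set("task.id", Integer.toString(id)); // hypothetical key
                    return local.get("task.id");
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get());
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off is an extra copy per task in exchange for never sharing a mutable object across threads.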

2014-07-15 Thread Andrew Ash
Hi Shengzhe, Even if we did make Configuration threadsafe, it'd take quite some time for that to trickle down to a Hadoop release that we could actually rely on Spark users having installed. I agree we should consider whether making Configuration threadsafe is something that Hadoop should do, but

2014-07-15 Thread yao
Good catch, Andrew. In addition to your proposed solution, is it possible to fix the Configuration class and make it thread-safe? I think the fix should be trivial, just use a ConcurrentHashMap, but I am not sure if we can push this change upstream (will the Hadoop guys accept this change? for them, it
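A minimal sketch of the ConcurrentHashMap idea, assuming a config whose only state is a key/value map. `ConcurrentConf` and `ConcurrentConfDemo` are hypothetical classes, not Hadoop's actual `Configuration`. Swapping the map makes individual `get`/`set` calls safe, but compound read-then-write logic still needs an atomic operation such as `computeIfAbsent`, and any other mutable fields a real config class has would need their own treatment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical config class whose key/value store is a ConcurrentHashMap.
class ConcurrentConf {
    private final Map<String, String> props = new ConcurrentHashMap<>();

    String get(String key) { return props.get(key); }

    // ConcurrentHashMap rejects null keys and values, so removal is explicit.
    void set(String key, String value) { props.put(key, value); }
    void unset(String key) { props.remove(key); }

    // Atomic because computeIfAbsent is atomic on ConcurrentHashMap.
    String getOrSetDefault(String key, String dflt) {
        return props.computeIfAbsent(key, k -> dflt);
    }

    int size() { return props.size(); }
}

class ConcurrentConfDemo {
    // Many threads write distinct keys at once; a plain HashMap used this
    // way can lose entries or corrupt its internal state.
    static int writeConcurrently(int threads, int keysPerThread) {
        ConcurrentConf conf = new ConcurrentConf();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int tid = t;
            pool.execute(() -> {
                for (int k = 0; k < keysPerThread; k++) {
                    conf.set("key." + tid + "." + k, "v");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return conf.size();
    }
}
```

With 8 threads writing 1000 distinct keys each, every entry should survive (8000 in total).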