Sounds good -- I added comments to the ticket.
Since SPARK-2521 is scheduled for a 1.1.0 release and we can work around
with spark.speculation, I don't personally see a need for a 1.0.2 backport.
Thanks for looking through this issue!
On Thu, Jul 17, 2014 at 2:14 AM, Patrick Wendell wrote:
> Hey
Hey Andrew,
I think you are correct and a follow-up to SPARK-2521 will end up
fixing this. The design of SPARK-2521 automatically broadcasts RDD
data in tasks and the approach creates a new copy of the RDD and
associated data for each task. A natural follow-up to that patch is to
stop handling the
Hi Patrick, thanks for taking a look. I filed this as
https://issues.apache.org/jira/browse/SPARK-2546
Would you recommend I pursue the cloned Configuration object approach now
and send in a PR?
Reynold's recent announcement of the broadcast RDD object patch may also
have implications for the right pa
Hey Andrew,
Cloning the conf might be a good/simple fix for this particular
problem. It's definitely worth looking into.
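The clone-the-conf idea discussed above can be sketched as follows. This is a minimal illustration, not Spark's actual code: it uses a plain `HashMap` as a stand-in for `org.apache.hadoop.conf.Configuration` (whose real copy constructor is `new Configuration(conf)`), and the class name, method name, and config keys are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: each task gets its own defensive copy of a shared conf, so
// task-local mutations never race on the shared object. A plain HashMap
// stands in for Hadoop's Configuration here.
public class CloneConfDemo {
    static int runTasks() throws InterruptedException {
        Map<String, String> sharedConf = new HashMap<>();
        sharedConf.put("fs.defaultFS", "hdfs://namenode:8020"); // illustrative value

        // With the real class this copy would be `new Configuration(sharedConf)`.
        Runnable task = () -> {
            Map<String, String> taskConf = new HashMap<>(sharedConf);
            taskConf.put("task.local.key", Thread.currentThread().getName());
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // The shared conf is untouched by the task-local writes.
        return sharedConf.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks()); // prints 1
    }
}
```

The trade-off, as noted in the SPARK-2521 discussion, is extra copying per task in exchange for not having to reason about concurrent access at all.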
There are a few things we can probably do in Spark to deal with
non-thread-safety inside of the Hadoop FileSystem and Configuration
classes. One thing we can do in general
Hi Shengzhe,
Even if we did make Configuration thread-safe, it'd take quite some time for
that to trickle down to a Hadoop release that we could actually rely on
Spark users having installed. I agree we should consider whether making
Configuration thread-safe is something that Hadoop should do, but
Good catch Andrew. In addition to your proposed solution, is it possible
to fix the Configuration class and make it thread-safe? I think the fix should
be trivial, just use a ConcurrentHashMap, but I am not sure if we can push
this change upstream (will the Hadoop folks accept this change? for them, it
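The ConcurrentHashMap suggestion above can be illustrated with a small, self-contained sketch of the failure mode. Note this does not use the real Configuration class (which wraps `java.util.Properties`); the class and method names here are hypothetical, and a plain `HashMap` stands in for Configuration's internal state. Iterating while another caller mutates is exactly the pattern that blows up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentConfDemo {
    // Iterate the map and add an entry mid-iteration, which is effectively
    // what one thread reading a shared conf while another thread mutates
    // it does. Returns true if iteration survives, false if the map's
    // fail-fast iterator throws ConcurrentModificationException.
    static boolean mutateDuringIteration(Map<String, String> conf) {
        conf.put("k1", "v1");
        conf.put("k2", "v2");
        try {
            for (String key : conf.keySet()) {
                conf.put("k3", "v3"); // structural modification mid-iteration
            }
            return true;
        } catch (java.util.ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // HashMap's iterator is fail-fast; ConcurrentHashMap's is weakly
        // consistent and tolerates concurrent modification.
        System.out.println(mutateDuringIteration(new HashMap<>()));           // prints false
        System.out.println(mutateDuringIteration(new ConcurrentHashMap<>())); // prints true
    }
}
```

Swapping the backing map would remove this class of crash, but as noted above, the fix would only help once it lands in a Hadoop release that Spark users actually run.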