RJ,

Could you provide a code example that reproduces the bug you observed in local testing? Breeze's += is not thread-safe, but in a Spark job, calls to a resultHandler are synchronized: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala#L52 . Let's move our discussion to the JIRA page.

-Xiangrui
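To make the point about synchronized handlers concrete, here is a minimal sketch in plain Scala. It assumes a hypothetical `SharedSum` class and `SharedSumDemo` object that stand in for a driver-side accumulator; neither is Spark's JobWaiter or a Breeze type, and a plain `Array[Double]` is used in place of a Breeze vector. It only illustrates why an in-place update (the analogue of `+=`) is safe when every call goes through the same lock.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical accumulator for illustration only; not part of Spark or Breeze.
final class SharedSum(size: Int) {
  private val acc = new Array[Double](size)

  // In-place update, analogous to `acc += update`. Each element is a
  // read-modify-write, so concurrent callers must be serialized; holding
  // this object's lock for the whole method does that.
  def addInPlace(update: Array[Double]): Unit = synchronized {
    var i = 0
    while (i < acc.length) {
      acc(i) += update(i)
      i += 1
    }
  }

  // Return a copy so callers never see the buffer while it is being mutated.
  def result: Array[Double] = synchronized { acc.clone() }
}

object SharedSumDemo {
  // Submits `numTasks` updates of all-ones vectors from a small thread pool
  // and returns the accumulated sum once every update has finished.
  def accumulate(numTasks: Int, size: Int): Array[Double] = {
    val sum = new SharedSum(size)
    val pool = Executors.newFixedThreadPool(4)
    try {
      (0 until numTasks).foreach { _ =>
        pool.execute(new Runnable {
          def run(): Unit = sum.addInPlace(Array.fill(size)(1.0))
        })
      }
    } finally {
      pool.shutdown() // stop accepting work; already submitted tasks still run
    }
    pool.awaitTermination(30, TimeUnit.SECONDS)
    sum.result
  }

  def main(args: Array[String]): Unit =
    println(accumulate(1000, 8).mkString(", "))
}
```

With the lock in place, every element of the result equals the number of submitted updates. Removing `synchronized` from `addInPlace` makes the same code racy and updates can be lost, which is the kind of mistake RJ is worried about when += is used in a parallel context.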
On Wed, Sep 3, 2014 at 12:07 PM, RJ Nowling <rnowl...@gmail.com> wrote:
> Here's the JIRA:
>
> https://issues.apache.org/jira/browse/SPARK-3384
>
> Even if the current implementation uses += in a thread safe manner, it can
> be easy to make the mistake of accidentally using += in a parallelized
> context. I suggest changing all instances of += to +.
>
> I would encourage others to reproduce and validate this issue, though.
>
>
> On Wed, Sep 3, 2014 at 3:02 PM, David Hall <d...@cs.berkeley.edu> wrote:
>
>> mutating operations are not thread safe. Operations that don't mutate
>> should be thread safe. I can't speak to what Evan said, but I would guess
>> that the way they're using += should be safe.
>>
>>
>> On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>
>>> David,
>>>
>>> Can you confirm that += is not thread safe but + is? I'm assuming +
>>> allocates a new object for the write, while += doesn't.
>>>
>>> Thanks!
>>> RJ
>>>
>>>
>>> On Wed, Sep 3, 2014 at 2:50 PM, David Hall <d...@cs.berkeley.edu> wrote:
>>>
>>>> In general, in Breeze we allocate separate work arrays for each call to
>>>> lapack, so it should be fine. In general concurrent modification isn't
>>>> thread safe of course, but things that "ought" to be thread safe really
>>>> should be.
>>>>
>>>>
>>>> On Wed, Sep 3, 2014 at 10:41 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>>>
>>>>> No, it's not in all cases. Since Breeze uses lapack under the hood,
>>>>> changes to memory between different threads is bad.
>>>>>
>>>>> There's actually a potential bug in the KMeans code where it uses +=
>>>>> instead of +.
>>>>>
>>>>>
>>>>> On Wed, Sep 3, 2014 at 1:26 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > Is breeze library called thread safe from Spark mllib code in case when
>>>>> > native libs for blas and lapack are used? Might it be an issue when running
>>>>> > Spark locally?
>>>>> >
>>>>> > Best regards, Alexander
>>>>> > ---------------------------------------------------------------------
>>>>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>> > For additional commands, e-mail: dev-h...@spark.apache.org
>>>>>
>>>>> --
>>>>> em rnowl...@gmail.com
>>>>> c 954.496.2314