[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362558#comment-15362558 ]
Stavros Kontopoulos edited comment on SPARK-16379 at 7/5/16 2:31 PM:
---------------------------------------------------------------------
Usually you should not lock on the object for synchronization that way, I agree. But the reason the callback cannot proceed is the logging call and the implicit lock it needs on the object. It actually tries to log here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L247

Whatever the reason the synchronized block is needed (correct or not), new functionality or a refactoring should not break it. And even once the use of the synchronized block is resolved, I want to point out the tricky behavior of lazy vals; as I said, there was a similar issue in the past. In other words, I don't think using a lazy val there is the best choice, since it creates hidden issues. That is what I am trying to say. [~mgummelt], what do you think? Can you provide some context for the synchronization there?
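A minimal sketch of the contention being described, with hypothetical names (`Backend`, `LazyValMonitorDemo` are illustrations, not Spark code). In Scala 2, initializing a lazy val synchronizes on the enclosing object's monitor, so a thread already inside an explicit `synchronized` block on that object blocks any other thread's first access to the lazy val:

```scala
import java.util.concurrent.atomic.AtomicBoolean

object LazyValMonitorDemo {
  class Backend {
    // Stands in for Logging's `@transient lazy val log`: the initializer
    // runs inside this.synchronized on first access.
    lazy val log: String = "initialized"
  }

  // Returns (initializedWhileLockHeld, initializedAfterRelease).
  def run(): (Boolean, Boolean) = {
    val backend     = new Backend
    val initialized = new AtomicBoolean(false)

    // Thread A: holds backend's monitor for a while, like the explicit
    // synchronized block in the scheduler backend.
    val holder = new Thread(() => backend.synchronized { Thread.sleep(1000) })

    // Thread B: a "callback" that merely touches the lazy val; it must
    // acquire backend's monitor to run the initializer.
    val callback = new Thread(() => { backend.log; initialized.set(true) })

    holder.start()
    Thread.sleep(200)            // let the holder take the monitor first
    callback.start()
    Thread.sleep(200)
    val duringLock = initialized.get() // callback is still blocked here
    holder.join(); callback.join()
    (duringLock, initialized.get())
  }

  def main(args: Array[String]): Unit = {
    val (during, after) = run()
    println(s"initialized while lock held: $during, after release: $after")
  }
}
```

With an actual logging callback the situation is the same lock, just hidden: the callback never looks like it touches a lock, yet its first log line contends for the object's monitor.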
> Spark on mesos is broken due to race condition in Logging
> ---------------------------------------------------------
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.0
> Reporter: Stavros Kontopoulos
> Priority: Blocker
> Attachments: out.txt
>
> This commit introduced a transient lazy log val:
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that, everything works fine.
> I spotted this when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and trying to connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext. Logging gets stuck here:
> I0705 12:10:10.076617 9303 group.cpp:700] Trying to get '/mesos/json.info_0000000152' in ZooKeeper
> I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at master@127.0.1.1:5050
> I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. Attempting to register without authentication
> I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def, and it works as expected.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
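For context, a simplified sketch of the two variants the reporter compared. These trait bodies are illustrations using `java.util.logging` rather than Spark's actual slf4j-based `Logging` trait: the lazy val caches the logger but its initializer must take the enclosing object's monitor on first access, while the def acquires no lock at all.

```scala
import java.util.logging.Logger

// Problematic variant (simplified): the first access to `log` runs the
// initializer inside this.synchronized, so it can block behind any thread
// already holding this object's monitor.
trait LazyValLogging {
  @transient lazy val log: Logger = Logger.getLogger(getClass.getName)
}

// The reporter's verified fix (simplified): a def has no cached state to
// guard, so no monitor is acquired when it is called.
trait DefLogging {
  def log: Logger = Logger.getLogger(getClass.getName)
}
```

The def recomputes the lookup on every call (`Logger.getLogger` caches loggers internally), trading a small per-call cost for freedom from hidden locking.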