Hang on Executor classloader lookup for the remote REPL URL classloader

2014-08-20 Thread Andrew Ash
Hi Spark devs, I'm seeing a stacktrace where the classloader that reads from the REPL is hung, and blocking all progress on that executor. Below is that hung thread's stacktrace, and also the stacktrace of another hung thread. I thought maybe there was an issue with the REPL's JVM on the other s

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-20 Thread witgo
There's a related discussion: https://issues.apache.org/jira/browse/SPARK-2815 -- Original message -- From: "Chester Chen"; Sent: 2014-08-21 (Thursday) 7:42; To: "dev"; Subject: Re: is Branch-1.1 SBT build broken for yarn-alpha ? Just tried on master branc

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-20 Thread Chester Chen
Just tried on master branch, and the master branch works fine for yarn-alpha On Wed, Aug 20, 2014 at 4:39 PM, Chester Chen wrote: > I just updated today's build and tried branch-1.1 for both yarn and > yarn-alpha. > > For yarn build, this command seem to work fine. > > sbt/sbt -Pyarn -Dhadoop.v

is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-20 Thread Chester Chen
I just updated today's build and tried branch-1.1 for both yarn and yarn-alpha. For the yarn build, this command seems to work fine: sbt/sbt -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1 projects; for yarn-alpha: sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects; I got the following Any ideas Che

Re: Akka usage in Spark

2014-08-20 Thread Debasish Das
Yeah that's the one we discussed...sorry I pointed to a different one that I was reading... On Wed, Aug 20, 2014 at 3:28 PM, DB Tsai wrote: > To be specific, I was discussing this PR with Debasish which reduces > lots of issues when sending big objects to executors without using > broadcast exp

Re: Akka usage in Spark

2014-08-20 Thread DB Tsai
To be specific, I was discussing this PR with Debasish which reduces lots of issues when sending big objects to executors without using broadcast explicitly. Broadcast RDD object once per TaskSet (instead of sending it for every task) https://issues.apache.org/jira/browse/SPARK-2521 Sincerely, D

Re: Akka usage in Spark

2014-08-20 Thread Debasish Das
Hi Patrick, Over the last few days I came across some bugs that were exposed by ALS runs on large-scale data. They were not related to the akka changes, but while debugging I came across some akka-related changes that might have an impact on overall performance. One example is the following:

Re: Limit on number of simultaneous Spark frameworks on Mesos?

2014-08-20 Thread Cody Koeninger
obs are hung, I see the following in mesos master logs: > > > > I0820 19:28:02.651296 24666 master.cpp:2282] Sending 7 offers to > framework 20140820-170154-1315739402-5050-24660-0020 > > I0820 19:28:02.654502 24668 master.cpp:1578] Processing reply for > offers: [ 20140820-

Re: Akka usage in Spark

2014-08-20 Thread Patrick Wendell
Hey Deb, Can you be specific about which changes you mean? We have not, to my knowledge, made major architectural changes around akka use. I think in general we don't want people to be using Spark's actor system directly - it is an internal communication component in Spark and could e.g. be re

Limit on number of simultaneous Spark frameworks on Mesos?

2014-08-20 Thread Cody Koeninger
ivial (e.g. parallelize 1 to 1 and sum). Killing one of the jobs typically allows the others to start proceeding. While jobs are hung, I see the following in mesos master logs: I0820 19:28:02.651296 24666 master.cpp:2282] Sending 7 offers to framework 20140820-170154-1315739402-5050-24660-0020

Akka usage in Spark

2014-08-20 Thread Debasish Das
Hi, There have been some recent changes in the way akka is used in spark and I feel they are major changes... Is there a design document / JIRA / experiment on large datasets that highlights the impact of the changes (1.0 vs 1.1)? Basically, it would be great to understand where akka is used in the cod

Re: Lost executor on YARN ALS iterations

2014-08-20 Thread Sandy Ryza
Hi Debasish, The fix is to raise spark.yarn.executor.memoryOverhead until this goes away. This controls the buffer between the JVM heap size and the amount of memory requested from YARN (JVMs can take up memory beyond their heap size). You should also make sure that, in the YARN NodeManager confi
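The relationship described in this message between the JVM heap, the overhead buffer, and the memory requested from YARN can be sketched as simple arithmetic. This is a minimal illustration, not Spark's implementation: the helper name and all sample values are hypothetical, and it assumes both the heap size and spark.yarn.executor.memoryOverhead are expressed in megabytes.

```python
def yarn_container_request_mb(executor_heap_mb: int, memory_overhead_mb: int) -> int:
    """Hypothetical helper: memory requested from YARN = JVM heap + overhead buffer."""
    if executor_heap_mb <= 0:
        raise ValueError("executor heap must be positive")
    if memory_overhead_mb < 0:
        raise ValueError("memory overhead cannot be negative")
    return executor_heap_mb + memory_overhead_mb


# Illustrative values only: a fixed heap with progressively larger overhead.
heap_mb = 8192
for overhead_mb in (384, 1024, 2048):
    total_mb = yarn_container_request_mb(heap_mb, overhead_mb)
    print(f"spark.yarn.executor.memoryOverhead={overhead_mb} "
          f"-> request of {total_mb} MB for a {heap_mb} MB heap")
```

Raising the overhead leaves the heap unchanged and only widens the buffer for memory the JVM uses beyond its heap, which is why the advice is to increase it step by step until executors stop being lost.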

Re: Lost executor on YARN ALS iterations

2014-08-20 Thread Debasish Das
I could reproduce the issue in both 1.0 and 1.1 using YARN...so this is definitely a YARN-related problem... At least for me, the only deployment option possible right now is standalone... On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng wrote: > Hi Deb, > > I think this may be the same issue as de