Executor and BlockManager memory size

2014-10-09 Thread Larry Xiao
Hi all, I'm confused about Executor and BlockManager: why do they report different memory sizes? 14/10/10 08:50:02 INFO AppClient$ClientActor: Executor added: app-20141010085001-/2 on worker-20141010004933-brick6-35657 (brick6:35657) with 6 cores 14/10/10 08:50:02 INFO SparkDeploySchedulerBackend: Gr

PageRank execution imbalance, might hurt performance by 6x

2014-09-27 Thread Larry Xiao
Hi all! I'm running PageRank on GraphX, and I find that some tasks on one machine can take 5~6 times longer than others, while the rest are perfectly balanced (around 1 second to finish). And since the time for a stage (iteration) is determined by its slowest task, the performance is undesirable. I

VertexRDD partition imbalance

2014-09-25 Thread Larry Xiao
Hi all, VertexRDD is partitioned with HashPartitioner, and it exhibits some imbalance of tasks. For example, Connected Components with partition strategy Edge2D: Aggregated Metrics by Executor Executor ID Task Time Total Tasks Failed Tasks Succeeded Tasks Input Shuffle Read Shuf
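The skew described above can be measured directly by counting edges per partition after applying a partition strategy. A minimal sketch using GraphX's real `partitionBy`/`PartitionStrategy.EdgePartition2D` API (the edge-file path is a placeholder, and `sc` is assumed to be an existing SparkContext):

```scala
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

// Load a graph and repartition its edges with EdgePartition2D
// (the "Edge2D" strategy mentioned above).
val graph = GraphLoader.edgeListFile(sc, "path/to/edges.txt") // placeholder path
  .partitionBy(PartitionStrategy.EdgePartition2D)

// Count edges per partition to quantify the imbalance.
val sizes = graph.edges.mapPartitions(it => Iterator(it.size)).collect()
println(s"edges per partition: min=${sizes.min} max=${sizes.max}")
```

A large max/min ratio here would explain the uneven task times in the executor metrics quoted above.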

Re: Specifying Spark Executor Java options using Spark Submit

2014-09-24 Thread Larry Xiao
Hi Arun! I think you can find the info at https://spark.apache.org/docs/latest/configuration.html quote: Spark provides three locations to configure the system: * Spark properties control most application parame
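For the specific case in the subject line (executor Java options), one of those locations is the `--conf` flag of `spark-submit`, using the documented `spark.executor.extraJavaOptions` property. A sketch; the class name, JAR name, and JVM flags below are placeholders:

```shell
# Pass extra JVM options to executors at submit time.
# com.example.MyApp and myapp.jar are placeholder names.
./bin/spark-submit \
  --class com.example.MyApp \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -Dkey=value" \
  myapp.jar
```

The same property can also be set in conf/spark-defaults.conf, per the configuration page above.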

Re: Compiling Spark master (6ba6c3eb) with sbt/sbt assembly

2014-08-04 Thread Larry Xiao
Sorry, I mean I tried the command ./sbt/sbt clean and now it works. Is it because cached components were not recompiled? On 8/4/14, 4:44 PM, Larry Xiao wrote: I guessed ./sbt/sbt clean and it works fine now. On 8/4/14, 11:48 AM, Larry Xiao wrote: On the latest pull today

Re: Compiling Spark master (6ba6c3eb) with sbt/sbt assembly

2014-08-04 Thread Larry Xiao
I guessed ./sbt/sbt clean and it works fine now. On 8/4/14, 11:48 AM, Larry Xiao wrote: On the latest pull today (6ba6c3ebfe9a47351a50e45271e241140b09bf10) I met an assembly problem. $ ./sbt/sbt assembly Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME. Note, this will be overridden by

Re: Timing the codes in GraphX

2014-08-03 Thread Larry Xiao
Hi Deep, I think you can refer to GraphLoader.scala. It uses Logging: val startTime = System.currentTimeMillis logInfo("It took %d ms to load the edges".format(System.currentTimeMillis - startTime)) Larry On 8/4/14, 12:37 PM, Deep Pradhan wrote: Is there any way to time the execution of GraphX
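The pattern from GraphLoader.scala referred to above can be sketched as a small reusable helper. This assumes Spark's `Logging` trait is mixin-accessible (as it was in Spark of this era); the `TimedLoad` object and `timed` method names are illustrative, not part of Spark:

```scala
import org.apache.spark.Logging

// Sketch of the timing pattern used in GraphLoader.scala:
// record a start time, run the work, then log the elapsed milliseconds.
object TimedLoad extends Logging {
  def timed[A](label: String)(body: => A): A = {
    val startTime = System.currentTimeMillis
    val result = body
    logInfo("It took %d ms to %s".format(
      System.currentTimeMillis - startTime, label))
    result
  }
}

// Usage: wrap the GraphX call you want to time.
// val graph = TimedLoad.timed("load the edges") {
//   GraphLoader.edgeListFile(sc, "edges.txt")
// }
```

Note that this times driver-side wall-clock only; per-task times are better read from the Spark web UI.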

Re: Ports required for running spark

2014-07-31 Thread Larry Xiao
Larry On 7/31/14, 6:17 PM, Konstantin Kudryavtsev wrote: Hi Larry, I'm afraid this is standalone mode; I'm interested in YARN. Also, I don't see port-in-trouble 33007, which I believe is related to Akka. Thank you, Konstantin Kudryavtsev On Thu, Jul 31, 2014 at 1:11 PM, Larry Xiao <mailto:x

Re: Ports required for running spark

2014-07-31 Thread Larry Xiao
Hi Konstantin, I think you can find it at https://spark.apache.org/docs/latest/spark-standalone.html#configuring-ports-for-network-security and you can specify the ports for the master or workers in conf/spark-env.sh Larry On 7/31/14, 6:04 PM, Konstantin Kudryavtsev wrote: Hi there, I'm trying to run
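The settings on that page map onto conf/spark-env.sh roughly as follows. SPARK_MASTER_PORT and SPARK_WORKER_PORT are documented standalone-mode variables; the port numbers below are just common defaults, not a recommendation:

```shell
# conf/spark-env.sh -- pin standalone-mode ports instead of using random ones
export SPARK_MASTER_PORT=7077        # master accepts workers/submissions here
export SPARK_MASTER_WEBUI_PORT=8080  # master web UI
export SPARK_WORKER_PORT=7078        # worker communication port
export SPARK_WORKER_WEBUI_PORT=8081  # worker web UI
```

Fixed ports make it possible to open only the required range in a firewall, which is the point of the linked section.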

Re: Index calculation will cause integer overflow if numPartitions > 10362 in sortByKey

2014-07-30 Thread Larry Xiao
Hi, can you assign it to me? Thanks, Larry On 7/31/14, 10:47 AM, Jianshi Huang wrote: I created this JIRA issue; could somebody please pick it up? https://issues.apache.org/jira/browse/SPARK-2728 -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github & Blog: http://huangjs.github.com/
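The 10362 bound in the subject is consistent with the range partitioner drawing about 20 samples per partition and computing an Int offset shaped like 20 * numPartitions * partitionIndex; that reconstruction of the formula is an assumption on my part (see SPARK-2728 for the actual code), but the arithmetic matches the threshold exactly:

```scala
// Assumed shape of the index math: samplesPerPartition * numPartitions * index,
// done in 32-bit Int arithmetic, so it can silently wrap around.
val samplesPerPartition = 20

def offset(numPartitions: Int, index: Int): Int =
  samplesPerPartition * numPartitions * index

// At 10362 partitions the largest offset still fits in an Int
// (20 * 10362 * 10362 = 2,147,420,880 < Int.MaxValue = 2,147,483,647)...
assert(offset(10362, 10362) > 0)
// ...but one more partition overflows and wraps negative.
assert(offset(10363, 10363) < 0)
```

The natural fix is to do this index arithmetic in Long before converting, which is presumably what the JIRA ticket tracks.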

Re: spark.shuffle.consolidateFiles seems not working

2014-07-30 Thread Larry Xiao
Hi Jianshi, I've met a similar situation before, and my solution was 'ulimit': you can use -a to see your current settings and -n to set the open-files limit (and other limits too). I set -n to 10240. I see spark.shuffle.consolidateFiles helps by reusing open files. (so I don't know to what extent d
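The ulimit workaround described above, as concrete commands; 10240 is the value from the mail, not a universal recommendation, and raising the limit persistently usually requires /etc/security/limits.conf rather than a shell builtin:

```shell
# Show all current resource limits for this shell
ulimit -a

# Show just the open-files limit
ulimit -n

# Raise the open-files limit for this session (subject to the hard limit;
# persistent changes go in /etc/security/limits.conf)
ulimit -n 10240
```

This matters for shuffles because each map task may hold one file handle per reduce partition, so wide shuffles can exhaust a low default like 1024.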