Re: Ideal core count within a single JVM

2014-07-15 Thread lokesh.gidra
It makes sense what you said. But, when I proportionately reduce the heap size, then also the problem persists. For instance, if I use 160 GB heap for 48 cores, whereas 80 GB heap for 24 cores, then also with 24 cores the performance is better (although worse than 160 GB with 24 cores) than 48-core

Re: Ideal core count within a single JVM

2014-07-14 Thread Matei Zaharia
BTW you can see the number of parallel tasks in the application UI (http://localhost:4040) or in the log messages (e.g. when it says progress: 17/20, that means there are 20 tasks total and 17 are done). Spark will try to use at least one task per core in local mode so there might be more of the

Re: Ideal core count within a single JVM

2014-07-14 Thread Matei Zaharia
I see, so here might be the problem. With more cores, there's less memory available per core, and now many of your threads are doing external hashing (spilling data to disk), as evidenced by the calls to ExternalAppendOnlyMap.spill. Maybe with 10 threads, there was enough memory per task to do

Re: Ideal core count within a single JVM

2014-07-14 Thread lokesh.gidra
I am only playing with 'N' in local[N]. I thought that by increasing N, Spark will automatically use more parallel tasks. Isn't it so? Can you please tell me how can I modify the number of parallel tasks? For me, there are hardly any threads in BLOCKED state in jstack output. In 'top' I see my app

Re: Ideal core count within a single JVM

2014-07-14 Thread Matei Zaharia
Are you increasing the number of parallel tasks with cores as well? With more tasks there will be more data communicated and hence more calls to these functions. Unfortunately contention is kind of hard to measure, since often the result is that you see many cores idle as they're waiting on a l

Re: Ideal core count within a single JVM

2014-07-14 Thread lokesh.gidra
Thanks a lot for replying back. Actually, I am running the SparkPageRank example with 160GB heap (I am sure the problem is not GC because the excess time is being spent in java code only). What I have observed in Jprofiler and Oprofile outputs is that the amount of time spent in following 2 funct

Re: Ideal core count within a single JVM

2014-07-14 Thread Matei Zaharia
Probably something like 8 is best on this kind of machine. What operations are you doing though? It's possible that something else is a contention point at 48 threads, e.g. a common one we've seen is the Linux file system. Matei On Jul 13, 2014, at 4:03 PM, lokesh.gidra wrote: > Hello, > > W