Are you increasing the number of parallel tasks along with the number of cores? With more 
tasks, more data gets communicated, and hence there are more calls to these 
functions.
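
For example, a rough sketch (the input path and core count below are only 
placeholders) of keeping the partition count, and hence the number of tasks per 
stage, in step with the cores:

  import org.apache.spark.{SparkConf, SparkContext}

  // Placeholder values; adjust to your machine and input.
  val cores = 48
  val conf = new SparkConf().setAppName("SparkPageRank").setMaster(s"local[$cores]")
  val sc = new SparkContext(conf)

  // Each stage runs one task per partition, so asking for roughly 2x as many
  // partitions as cores keeps every core busy as you scale N up.
  val lines = sc.textFile("/path/to/pagerank_data.txt", cores * 2)
  // Shuffles pick up the same default if you also set:
  //   conf.set("spark.default.parallelism", (cores * 2).toString)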

Unfortunately, contention is hard to measure directly, since the usual symptom is 
simply that many cores sit idle while they wait on a lock. ObjectOutputStream 
should not lock anything, but if it's blocking on a FileOutputStream to write 
data, that could be a problem. Look for "BLOCKED" threads in a stack trace too 
(run jstack on your Java process and look at the TaskRunner threads).
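
If it's easier to check from inside the JVM than through jstack, here's a rough 
sketch using the standard java.lang.management APIs (nothing Spark-specific) that 
prints blocked threads and the locks they're waiting on:

  import java.lang.management.ManagementFactory

  // List threads that are currently BLOCKED and the lock each one is waiting on.
  val threadBean = ManagementFactory.getThreadMXBean
  val infos = threadBean.getThreadInfo(threadBean.getAllThreadIds)
  infos.filter(info => info != null && info.getThreadState == Thread.State.BLOCKED)
    .foreach(info => println(s"${info.getThreadName} blocked on ${info.getLockName}"))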

Incidentally, you can probably speed this up by using Kryo serialization instead 
of Java serialization (see http://spark.apache.org/docs/latest/tuning.html). That 
should make the job less CPU-bound, and it would also produce less I/O.
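
A minimal sketch of switching the serializer ("myapp.MyRegistrator" is only a 
placeholder; registering your own classes through a KryoRegistrator is optional 
but lets Kryo write compact class ids instead of full class names):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("SparkPageRank")
    // Use Kryo instead of Java serialization for shuffled and cached data.
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Optional: point Spark at a registrator for your most-serialized classes.
    .set("spark.kryo.registrator", "myapp.MyRegistrator")
  val sc = new SparkContext(conf)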

Matei

On Jul 14, 2014, at 12:23 PM, lokesh.gidra <lokesh.gi...@gmail.com> wrote:

> Thanks a lot for replying.
> 
> Actually, I am running the SparkPageRank example with a 160GB heap (I am sure
> the problem is not GC, because the excess time is being spent in Java code
> only).
> 
> What I have observed in the JProfiler and OProfile outputs is that the amount of
> time spent in the following two functions increases substantially with increasing
> N:
> 
> 1) java.io.ObjectOutputStream.writeObject0
> 2) scala.Tuple2.hashCode 
> 
> I don't think the Linux file system could be causing the issue, as my
> machine has 256GB of RAM and I am using a tmpfs for java.io.tmpdir. So I
> don't think there is much disk access involved, if that is what you meant.
> 
> Regards,
> Lokesh
