Are you increasing the number of parallel tasks along with the number of cores? With more tasks there will be more data communicated, and hence more calls to these functions.
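To make that concrete (purely as an illustrative sketch, not something taken from your setup), the task count for a stage follows the partition count, so scaling it with the core count would look roughly like this; the app name, master, input path, and the 2x multiplier below are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative sketch only: aim for a couple of partitions (tasks) per core.
    // App name, master, input path, and the 2x factor are placeholders.
    val sc = new SparkContext(new SparkConf().setAppName("SparkPageRank").setMaster("local[48]"))
    val numCores = Runtime.getRuntime.availableProcessors

    // The second argument sets the minimum number of partitions, and hence tasks.
    val links = sc.textFile("pagerank_data.txt", numCores * 2)

    // An existing RDD can also be reshaped to match the core count.
    val rescaled = links.repartition(numCores * 2)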
Unfortunately contention is kind of hard to measure, since often the result is that you see many cores idle as they're waiting on a lock. ObjectOutputStream should not lock anything, but if it's blocking on a FileOutputStream to write data, that could be a problem. Look for "BLOCKED" threads in a stack trace too (do jstack on your Java process and look at the TaskRunner threads).

Incidentally, you can probably speed this up by using Kryo serialization instead of Java (see http://spark.apache.org/docs/latest/tuning.html). That might make it less CPU-bound, and it would also create less IO.

Matei

On Jul 14, 2014, at 12:23 PM, lokesh.gidra <lokesh.gi...@gmail.com> wrote:

> Thanks a lot for replying back.
>
> Actually, I am running the SparkPageRank example with 160GB heap (I am sure
> the problem is not GC because the excess time is being spent in java code
> only).
>
> What I have observed in Jprofiler and Oprofile outputs is that the amount of
> time spent in the following 2 functions increases substantially with
> increasing N:
>
> 1) java.io.ObjectOutputStream.writeObject0
> 2) scala.Tuple2.hashCode
>
> I don't think that the Linux file system could be causing the issue as my
> machine has 256GB RAM, and I am using a tmpfs for java.io.tmpdir. So, I
> don't think there is much disk access involved, if that is what you meant.
>
> Regards,
> Lokesh
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Ideal-core-count-within-a-single-JVM-tp9566p9630.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
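Two hedged sketches to make the suggestions above concrete. For the BLOCKED-thread check, jstack on the process is the usual route; the snippet below is an in-process equivalent using the standard ThreadMXBean API, with the filtering and printing purely illustrative:

    import java.lang.management.ManagementFactory

    // Illustrative: list threads currently BLOCKED on a monitor, which is
    // roughly what one looks for in a jstack dump of the TaskRunner threads.
    val blocked = ManagementFactory.getThreadMXBean
      .dumpAllThreads(false, false)
      .filter(_.getThreadState == Thread.State.BLOCKED)
    blocked.foreach(info => println(s"${info.getThreadName} blocked on ${info.getLockName}"))

And for the Kryo suggestion, a minimal sketch of what the tuning guide describes; the spark.serializer property and the KryoSerializer class are standard Spark settings, while the app name is a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    // Switch task/shuffle serialization from java.io.ObjectOutputStream to Kryo.
    val conf = new SparkConf()
      .setAppName("SparkPageRank")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

With that in place, the time currently attributed to ObjectOutputStream.writeObject0 should shift into Kryo's cheaper write path.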