hey matei,
ok, when i switch to Java 7 with G1, the GC time for all the "quick" tasks
goes from 150ms to 10ms, but the slow ones stay just as slow. all i did was
add -XX:+UseG1GC, so maybe that's wrong; i still have to read up on G1.
an example of GC in a slow task is below.
best, koert
[GC pause (y
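To check whether -XX:+UseG1GC actually took effect in a given JVM, one option is to list the collectors it reports through the standard java.lang.management API. This is a minimal sketch, assuming you can run code inside the JVM you care about (in Spark that means the executor JVMs, not just the driver). On HotSpot, G1 usually shows up as "G1 Young Generation" / "G1 Old Generation", while the Java 6 default parallel collector shows "PS Scavenge" / "PS MarkSweep". The class name ActiveCollectors is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // Names of the garbage collectors this JVM is actually running with.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // True if any collector name starts with "G1", which is how HotSpot names its G1 collectors.
    static boolean usingG1() {
        for (String name : collectorNames()) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("collectors: " + collectorNames());
        System.out.println("G1 active: " + usingG1());
    }
}
```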
Yeah, System.gc() is a suggestion but in practice it does invoke full GCs on
the Sun JVM.
Matei
On Mar 11, 2014, at 12:35 PM, Koert Kuipers wrote:
hey matei,
ha, i will definitely try that one! it looks like a total hack... i might just
schedule it defensively after the precaching of RDDs.
also trying Java 7 with G1.
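If a defensive collection after precaching (presumably a System.gc() call) helps, the GC time that accumulates during each later query should drop. Below is a sketch of measuring that from inside one JVM by diffing GarbageCollectorMXBean.getCollectionTime() around a unit of work. The "cache" and "query" in main are hypothetical stand-ins, not Spark code, and GC caused by other threads during the same window is counted too, so treat the number as approximate. It sticks to Java 6 APIs (no lambdas).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcTimer {
    // Total milliseconds all collectors have spent collecting since JVM start
    // (collectors reporting -1, i.e. "undefined", are skipped).
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Runs the work and returns the GC time (ms) that accumulated while it ran.
    static long gcMillisDuring(Runnable work) {
        long before = totalGcMillis();
        work.run();
        return totalGcMillis() - before;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for "precaching": keep ~50 MB of long-lived arrays reachable.
        final List<byte[]> cache = new ArrayList<byte[]>();
        for (int i = 0; i < 50; i++) {
            cache.add(new byte[1024 * 1024]);
        }
        System.gc(); // the "defensive" collection after caching; only a request to the JVM

        // Hypothetical "query": allocate short-lived garbage.
        long gcMs = gcMillisDuring(new Runnable() {
            public void run() {
                long sum = 0;
                for (int i = 0; i < 1000; i++) {
                    byte[] tmp = new byte[64 * 1024];
                    sum += tmp.length;
                }
                System.out.println("allocated " + sum + " bytes");
            }
        });
        System.out.println("GC time during query: " + gcMs + " ms (cache holds " + cache.size() + " MB)");
    }
}
```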
On Tue, Mar 11, 2014 at 3:17 PM, Matei Zaharia wrote:
> Right, that's it. I think what happened is the following: all the nodes
> ge
Note that calling System.gc() is just a suggestion to the JVM that it
should run a garbage collection and doesn't force it right then 100% of the
time.
http://stackoverflow.com/questions/1481178/forcing-garbage-collection-in-java
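To see whether a particular System.gc() call actually led to a collection, one can compare the JVM's collection counts before and after the call. A rough sketch using the standard GarbageCollectorMXBean API; the class and method names are hypothetical. Collections triggered concurrently by other threads would also be counted, and with -XX:+DisableExplicitGC the call is ignored entirely.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class ExplicitGcCheck {
    // Sum of collection counts across all collectors; -1 ("undefined") values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Requests a GC and reports how many collections were recorded around the call.
    static long requestGcAndCount() {
        long before = totalCollections();
        System.gc(); // only a request to the JVM, not a guarantee
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        long ran = requestGcAndCount();
        System.out.println(ran > 0 ? "System.gc() triggered " + ran + " collection(s)"
                                   : "System.gc() did not trigger a collection");
    }
}
```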
On Tue, Mar 11, 2014 at 12:17 PM, Matei Zaharia wrote:
Right, that’s it. I think what happened is the following: all the nodes
generated some garbage that put them very close to the threshold for a full GC
in the first few runs of the program (when you cached the RDDs), but on the
subsequent queries, only a few nodes are hitting full GC per query, s
hey matei,
most tasks have GC times of 200ms or less, and then a few tasks take many
seconds. example GC activity for a slow one:
[GC [PSYoungGen: 1051814K->262624K(1398144K)] 3789259K->3524429K(5592448K),
0.0986800 secs] [Times: user=1.53 sys=0.01, real=0.10 secs]
[GC [PSYoungGen: 786935K->524512
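The first complete log line above already shows the promotion pressure: the young generation shrank by 1051814K - 262624K = 789190K, but the whole heap only shrank by 3789259K - 3524429K = 264830K, so about 524360K (roughly 512 MB) was promoted into the old generation in a single minor collection, leaving the heap about 63% full. A small parser for that arithmetic, as a sketch: it only understands the ParallelGC "PSYoungGen" format that -XX:+PrintGCDetails prints here (other collectors and JVM versions use different formats), and the class name is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PsYoungGcLine {
    // Matches e.g. [GC [PSYoungGen: 1051814K->262624K(1398144K)] 3789259K->3524429K(5592448K), 0.0986800 secs]
    private static final Pattern LINE = Pattern.compile(
        "\\[GC \\[PSYoungGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\] (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    final long youngBefore, youngAfter, heapBefore, heapAfter, heapCapacity; // all in KB
    final double pauseSecs;

    private PsYoungGcLine(Matcher m) {
        youngBefore = Long.parseLong(m.group(1));
        youngAfter = Long.parseLong(m.group(2));
        heapBefore = Long.parseLong(m.group(4));
        heapAfter = Long.parseLong(m.group(5));
        heapCapacity = Long.parseLong(m.group(6));
        pauseSecs = Double.parseDouble(m.group(7));
    }

    static PsYoungGcLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a PSYoungGen line: " + line);
        }
        return new PsYoungGcLine(m);
    }

    // KB that left the young generation but not the heap, i.e. was promoted to the old generation.
    long promotedKb() {
        return (youngBefore - youngAfter) - (heapBefore - heapAfter);
    }

    // Fraction of total heap capacity still in use after this collection.
    double heapOccupancyAfter() {
        return (double) heapAfter / heapCapacity;
    }

    public static void main(String[] args) {
        PsYoungGcLine gc = parse("[GC [PSYoungGen: 1051814K->262624K(1398144K)] "
            + "3789259K->3524429K(5592448K), 0.0986800 secs]");
        System.out.println("promoted to old gen: " + gc.promotedKb() + "K");
        System.out.printf("heap occupancy after GC: %.1f%%%n", 100 * gc.heapOccupancyAfter());
    }
}
```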
hey matei,
it happens repeatedly.
we are currently running on Java 6 with Spark 0.9.
i will add -XX:+PrintGCDetails and collect details, and also look into Java
7 with G1. thanks
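Since GC flags only matter in the JVMs that run the tasks, it can help to print what a JVM was actually started with. A sketch using RuntimeMXBean.getInputArguments(); the filter (options starting with -XX: that mention GC, plus -Xm* heap sizing) is an arbitrary choice for illustration, and GcFlags is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcFlags {
    // Keeps the JVM arguments that look GC-related: -XX options mentioning "GC" and -Xmx/-Xms/-Xmn.
    static List<String> gcRelatedArgs(List<String> jvmArgs) {
        List<String> out = new ArrayList<String>();
        for (String arg : jvmArgs) {
            if ((arg.startsWith("-XX:") && arg.contains("GC")) || arg.startsWith("-Xm")) {
                out.add(arg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Input arguments of *this* JVM; in Spark, run the check inside an executor, not just the driver.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC-related JVM args: " + gcRelatedArgs(jvmArgs));
    }
}
```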
On Mon, Mar 10, 2014 at 6:27 PM, Matei Zaharia wrote:
Does this happen repeatedly if you keep running the computation, or just the
first time? It may take time to move these Java objects to the old generation
the first time you run queries, which could lead to a GC pause that also slows
down the small queries.
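One way to watch objects moving into the old generation is to sample how full that pool is, e.g. right after caching and again before each query; a full GC tends to follow once it gets close to its limit. A sketch using MemoryPoolMXBean, assuming the usual HotSpot pool names ("PS Old Gen", "G1 Old Gen", "CMS Old Gen", "Tenured Gen"); other collectors name their pools differently, in which case it returns -1. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class OldGenOccupancy {
    // True for the old/tenured pool under the common HotSpot collectors.
    static boolean isOldGen(String poolName) {
        return poolName.contains("Old Gen") || poolName.contains("Tenured");
    }

    // used / max for the old-generation pool, or -1 if it can't be determined.
    static double occupancy() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (isOldGen(pool.getName())) {
                MemoryUsage u = pool.getUsage(); // null if the pool is no longer valid
                if (u != null && u.getMax() > 0) {
                    return (double) u.getUsed() / u.getMax();
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        double occ = occupancy();
        if (occ < 0) {
            System.out.println("old generation pool not found or has no max");
        } else {
            System.out.printf("old generation is %.1f%% full%n", 100 * occ);
        }
    }
}
```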
If you can run with -XX:+PrintGCDetai
hello all,
i am observing a strange result. i have a computation that i run on a
cached RDD in spark-standalone. it typically takes about 4 seconds.
but when other RDDs that are not relevant to the computation at hand are
cached in memory (in the same SparkContext), the computation takes 40 seconds
o