Hi Guillaume,
Interesting that you brought up Shuffle. In fact we are experiencing this
issue of shuffle files being left behind and not being cleaned up. Since
this is a Spark streaming application, it is expected to stay up
indefinitely, so shuffle files being left is a big problem right now. Si
Thanks TD. I believe that might have been the issue. Will try for a few
days after passing in the GC option on the java command line when we start
the process.
Thanks for your timely help.
NB
On Wed, Apr 8, 2015 at 6:08 PM, Tathagata Das wrote:
> Yes, in local mode they the driver and executor
Yes, in local mode the driver and executor will be the same process.
And in that case the Java options in SparkConf configuration will not
work.
On Wed, Apr 8, 2015 at 1:44 PM, N B wrote:
> Since we are running in local mode, won't all the executors be in the same
> JVM as the driver?
>
>
Since we are running in local mode, won't all the executors be in the same
JVM as the driver?
Thanks
NB
On Wed, Apr 8, 2015 at 1:29 PM, Tathagata Das wrote:
> Its does take effect on the executors, not on the driver. Which is okay
> because executors have all the data and therefore have GC issu
It does take effect on the executors, not on the driver. That is okay
because the executors hold all the data and therefore have the GC issues; the
driver usually does not. If you want to be doubly sure, print the JVM flags
(e.g. http://stackoverflow.com/questions/10486375/print-all-jvm-flags)
However, th
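
One way to check which flags actually reached a JVM is to ask the JVM itself through the standard java.lang.management API. The sketch below is only an illustration: the class name JvmFlagCheck and its helper methods are hypothetical, and it assumes the code runs inside the process you want to inspect (in local mode, the single JVM that hosts both driver and executor).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: reports how the current JVM was started.
public class JvmFlagCheck {

    // Arguments passed to the JVM itself (for example -XX:+UseConcMarkSweepGC).
    // Arguments to the main method are not included.
    public static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Names of the garbage collectors this JVM is actually running with.
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // True if the exact flag string appears among the JVM arguments.
    public static boolean hasJvmArgument(String flag) {
        return jvmArguments().contains(flag);
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("Collectors in use: " + collectorNames());
        System.out.println("CMS requested: " + hasJvmArgument("-XX:+UseConcMarkSweepGC"));
    }
}
```

If the flag is missing from the printed arguments, it was not applied to that process, whatever the SparkConf says.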
Hi TD,
Thanks for the response. Since you mentioned GC, this got me thinking.
Given that we are running in local mode (all in a single JVM) for now, does
the option "spark.executor.extraJavaOptions" set to
"-XX:+UseConcMarkSweepGC" inside the SparkConf object take effect at all before
we use it to cr
There are a couple of options. Increase the timeout (see the Spark configuration).
Also see past mails in the mailing list.
Another option you may try (I have a gut feeling that may work, but I am not
sure) is calling GC on the driver periodically. The cleaning up of stuff is
tied to GCing of RDD objects a
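
The "call GC on the driver periodically" idea could be sketched as a small background task. This is a minimal sketch under stated assumptions, not a confirmed fix: the class name PeriodicGc and the interval are hypothetical, and whether a driver-side GC actually triggers shuffle cleanup is the untested guess from the message above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Spark: requests a GC at a fixed interval.
public class PeriodicGc {

    // Single daemon thread, so the scheduler never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-gc");
            t.setDaemon(true);
            return t;
        });

    // Counts how many times a GC has been requested.
    private final AtomicLong runs = new AtomicLong();

    // Start requesting a GC every `period` units, first run after one period.
    public void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            System.gc(); // a request only; the JVM may ignore it
            runs.incrementAndGet();
        }, period, period, unit);
    }

    public long runs() {
        return runs.get();
    }

    // Stop scheduling further GC requests.
    public void stop() {
        scheduler.shutdownNow();
    }
}
```

In the driver this would be started once, for example `new PeriodicGc().start(10, TimeUnit.MINUTES)`, before the streaming context is started. Note that System.gc() does nothing if the JVM was launched with -XX:+DisableExplicitGC.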