You can also set these in the spark-env.sh file:
export SPARK_WORKER_DIR="/mnt/spark/"
export SPARK_LOCAL_DIRS="/mnt/spark/"
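If you prefer to keep this out of the environment, the same scratch location can also be set through Spark's configuration properties; `/mnt/spark/` is just the example path from above. A sketch of the equivalent spark-defaults.conf entry:

```
# spark-defaults.conf -- equivalent setting for shuffle/scratch space
# (can be a comma-separated list if you have multiple disks)
spark.local.dir    /mnt/spark/
```

Note that spark.local.dir is ignored in cluster modes where the resource manager (YARN, Mesos) sets the local directories itself.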
Thanks
Best Regards
On Mon, Jul 6, 2015 at 12:29 PM, Akhil Das wrote:
While the job is running, just look in the directory and see what the root
cause is (is it the logs? is it the shuffle? etc.). Here are a few
configuration options which you can try:
- Disable shuffle spill: spark.shuffle.spill=false (it might end up in OOM)
- Enable log rotation:
sparkConf.set("spar
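(The sparkConf.set(...) call above is cut off in the archive.) For reference, a sketch of the executor log-rolling properties Spark provides for this purpose; the values here are only illustrative:

```
# spark-defaults.conf -- roll executor logs over time and cap how many are kept
spark.executor.logs.rolling.strategy          time
spark.executor.logs.rolling.time.interval     daily
spark.executor.logs.rolling.maxRetainedFiles  5
```

With maxRetainedFiles set, older rolled log files are deleted automatically, which keeps the worker directory from filling up with logs on long-running jobs.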
Hi,
I am trying to run an ETL job on Spark which involves an expensive shuffle
operation. Basically I require a self-join to be performed on a Spark
DataFrame/RDD. The job runs fine for around 15 hours, and when the
stage (which performs the self-join) is about to complete, I get a
*"java.io.IOException:
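(The exception message is cut off in the archive.) For context, here is a minimal sketch of the kind of DataFrame self-join described above, using the Spark 1.x-era API. The input path, the `df` name, and the join key "id" are assumptions for illustration, not details from the original job; this assumes a spark-shell-style session where `sqlContext` is in scope:

```scala
import org.apache.spark.sql.functions.col

// Hypothetical input; replace with the real source of the DataFrame.
val df = sqlContext.read.parquet("/path/to/input")

// Alias both sides so the duplicated columns can be told apart after the join.
val left  = df.as("l")
val right = df.as("r")

// The self-join: every pair of rows sharing the same (assumed) "id" key.
// This is the shuffle-heavy step that can blow up local disk usage.
val joined = left.join(right, col("l.id") === col("r.id"))
```

A self-join like this shuffles the full dataset twice, so the intermediate shuffle files in spark.local.dir can grow far larger than the input, which is consistent with the disk-space symptoms discussed in this thread.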