Re: Anomalous Spark RDD persistence behavior

2016-11-08 Thread Dave Jaffe
No, I am not using serialization with either memory or disk. Dave Jaffe VMware dja...@vmware.com From: Shreya Agarwal Date: Monday, November 7, 2016 at 3:29 PM To: Dave Jaffe, "user@spark.apache.org" Subject: RE: Anomalous Spark RDD persistence behavior I don’t think this is corre

RE: Anomalous Spark RDD persistence behavior

2016-11-07 Thread Shreya Agarwal
and move the old one out to disk. But it is not able to move the old one out fast enough and crashes with OOM. Anyone seeing that? From: Dave Jaffe [mailto:dja...@vmware.com] Sent: Monday, November 7, 2016 2:07 PM To: user@spark.apache.org Subject: Anomalous Spark RDD persistence behavior I’ve

Anomalous Spark RDD persistence behavior

2016-11-07 Thread Dave Jaffe
I’ve been studying Spark RDD persistence with spark-perf (https://github.com/databricks/spark-perf), especially when the dataset size starts to exceed available memory. I’m running Spark 1.6.0 on YARN with CDH 5.7. I have 10 NodeManager nodes, each with 16 vcores and 32 GB of container memory.
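
For readers following the thread, here is a rough back-of-the-envelope sketch of how much memory might be available for cached RDD blocks in a setup like the one described. It assumes the Spark 1.6 unified memory manager defaults (300 MB reserved, `spark.memory.fraction` = 0.75, `spark.memory.storageFraction` = 0.5). The executor heap size is not stated in the post, so the 24 GB figure below is purely hypothetical, as are the function names.

```python
# Rough estimate of Spark 1.6 unified memory per executor.
# Assumed defaults for Spark 1.6 (not taken from the original post):
RESERVED_MB = 300        # memory reserved by Spark before fractions apply
MEMORY_FRACTION = 0.75   # spark.memory.fraction
STORAGE_FRACTION = 0.5   # spark.memory.storageFraction


def unified_memory_mb(executor_heap_mb, memory_fraction=MEMORY_FRACTION):
    """Memory shared by execution and storage for one executor, in MB."""
    return (executor_heap_mb - RESERVED_MB) * memory_fraction


def protected_storage_mb(executor_heap_mb,
                         memory_fraction=MEMORY_FRACTION,
                         storage_fraction=STORAGE_FRACTION):
    """Portion of unified memory that execution cannot evict cached blocks from."""
    return unified_memory_mb(executor_heap_mb, memory_fraction) * storage_fraction


# Figures from the post: 10 NodeManager nodes, 32 GB of container memory each.
NODES = 10
CONTAINER_GB_PER_NODE = 32
total_container_gb = NODES * CONTAINER_GB_PER_NODE

# Hypothetical executor layout: one executor per node with a 24 GB heap.
HYPOTHETICAL_HEAP_MB = 24 * 1024

if __name__ == "__main__":
    per_executor = unified_memory_mb(HYPOTHETICAL_HEAP_MB)
    print("Total container memory (GB):", total_container_gb)
    print("Unified memory per executor (MB):", per_executor)
    print("Protected storage per executor (MB):",
          protected_storage_mb(HYPOTHETICAL_HEAP_MB))
    print("Cluster-wide unified memory (GB):", NODES * per_executor / 1024)
```

Comparing the cluster-wide figure against the dataset size gives a first guess at the point where persisted partitions would start to be evicted or spilled. Actual behavior also depends on the storage level and on memory overhead settings that this sketch ignores.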