On Thu, Oct 31, 2019 at 10:04 PM Nicolas Paris <[email protected]>
wrote:
> have you deactivated the spark.ui?
> I have read several threads explaining that the UI can lead to OOM because
> it stores 1000 DAGs by default
>
>
> On Sun, Oct 20, 2019 at 03:18:20AM -0700, Paul Wais wrote:
> > Dear List,
> >
> > I've observed some sort of memory leak when using pyspark to run ~100
> > jobs in local mode. Each job is essentially a create RDD -> create DF
> > -> write DF sort of flow. The RDD and DFs go out of scope after each
> > job completes, hence I call this issue a "memory leak." Here's
> > pseudocode:
> >
> > ```
> > # Build 100 small RDDs up front, keeping references to all of them
> > row_rdds = []
> > for i in range(100):
> >     # use a distinct inner variable so the comprehension doesn't
> >     # shadow the loop variable i
> >     row_rdd = spark.sparkContext.parallelize(
> >         [{'a': j} for j in range(1000)])
> >     row_rdds.append(row_rdd)
> >
> > # One Spark "job" per iteration: RDD -> DataFrame -> write
> > for row_rdd in row_rdds:
> >     df = spark.createDataFrame(row_rdd)
> >     df.persist()
> >     print(df.count())
> >     df.write.save(...)  # Save parquet
> >     df.unpersist()
> >
> >     # Does not help:
> >     # del df
> >     # del row_rdd
> > ```
The connection between Python GC/del and JVM GC is perhaps a bit weaker
than we might like. There certainly could be a problem here, but it still
shouldn’t be getting to the OOM state.
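A minimal sketch of nudging both sides, assuming a live SparkSession named
spark (note that System.gc() is only a request to the JVM, and _jvm is a
py4j internal handle):

```
import gc

# Collect on the Python side first so py4j can drop its proxies to JVM
# objects, then ask the driver JVM to collect whatever became unreachable.
gc.collect()
spark.sparkContext._jvm.System.gc()
```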
>
> >
> > In my real application:
> > * rows are much larger, perhaps 1MB each
> > * row_rdds are sized to fit available RAM
> >
> > I observe that after 100 or so iterations of the second loop (each of
> > which creates a "job" in the Spark WebUI), the following happens:
> > * pyspark workers have fairly stable resident and virtual RAM usage
> > * java process eventually approaches resident RAM cap (8GB standard)
> > but virtual RAM usage keeps ballooning.
> >
Can you share what flags the JVM is launching with? Also which JVM(s) are
ballooning?
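A sketch of one way to gather both, assuming a live session named spark and
that psutil is available (psutil is my assumption here, not something the
thread mentions):

```
import psutil

# Flags the driver JVM actually launched with, via the JMX runtime bean.
mx = spark.sparkContext._jvm.java.lang.management.ManagementFactory
print(list(mx.getRuntimeMXBean().getInputArguments()))

# Resident vs. virtual size of every java process on the box.
for p in psutil.process_iter(['name']):
    if p.info['name'] == 'java':
        m = p.memory_info()
        print(p.pid, f"rss={m.rss >> 20} MiB", f"vms={m.vms >> 20} MiB")
```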
>
> > Eventually the machine runs out of RAM and the linux OOM killer kills
> > the java process, resulting in an "IndexError: pop from an empty
> > deque" error from py4j/java_gateway.py .
> >
> >
> > Does anybody have any ideas about what's going on? Note that this is
> > local mode. I have personally run standalone masters and submitted a
> > ton of jobs and never seen something like this over time. Those were
> > very different jobs, but perhaps this issue is specific to local mode?
> >
> > Emphasis: I did try to del the pyspark objects and run python GC.
> > That didn't help at all.
> >
> > pyspark 2.4.4 on java 1.8 on ubuntu bionic (tensorflow docker image)
> >
> > 12-core i7 with 16GB of ram and 22GB swap file (swap is *on*).
> >
> > Cheers,
> > -Paul
> >
>
> --
> nicolas
>
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau