Re: Logistic Regression Iterations causing High GC in Spark 2.3

Dhrubajyoti Hati Mon, 29 Jul 2019 07:04:18 -0700

Hi Sean,

Yeah, I checked the heap and it's almost full. I also checked the GC logs
on the executors and found that GC cycles are kicking in frequently. The
Executors tab shows red in the "Total Time/GC Time".
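
For anyone who wants to quantify this rather than read it off the UI, here
is a minimal sketch. It assumes the executor summaries come from Spark's
monitoring REST API (/api/v1/applications/<app-id>/executors), whose entries
carry totalDuration and totalGCTime in milliseconds. The sample numbers and
the 10% threshold are illustrative assumptions, not values from this job.

```python
from typing import Dict, List


def gc_fractions(executors: List[dict]) -> Dict[str, float]:
    """Return GC time as a fraction of total task time, keyed by executor id."""
    fractions = {}
    for ex in executors:
        total_ms = ex.get("totalDuration", 0)
        gc_ms = ex.get("totalGCTime", 0)
        # Skip executors that have not run any task yet (avoids dividing by zero).
        if total_ms > 0:
            fractions[ex["id"]] = gc_ms / total_ms
    return fractions


def gc_bound(executors: List[dict], threshold: float = 0.10) -> List[str]:
    """Sorted ids of executors whose GC fraction exceeds the (assumed) threshold."""
    return sorted(
        eid for eid, frac in gc_fractions(executors).items() if frac > threshold
    )


# Hypothetical sample shaped like the REST API response; not real measurements.
sample = [
    {"id": "1", "totalDuration": 600_000, "totalGCTime": 420_000},
    {"id": "2", "totalDuration": 580_000, "totalGCTime": 29_000},
    {"id": "driver", "totalDuration": 0, "totalGCTime": 0},
]

print(gc_fractions(sample))
print(gc_bound(sample))
```

In a real check, the list would be the parsed JSON from the endpoint above
instead of the hard-coded sample.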


Also, the data I am dealing with is quite small (~4 GB), and the cluster
is more than big enough for it, so GC this heavy is unexpected.
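
To put rough numbers on that, here is a back-of-the-envelope sketch using
the settings quoted further down (20 executors, spark.executor.memory=9010m).
It assumes the Spark 2.x defaults of 300 MB reserved memory and
spark.memory.fraction=0.6, and it treats ~4 GB as the on-disk size. The
deserialized in-memory footprint can be several times larger, so this is
only an estimate.

```python
RESERVED_MB = 300       # assumed: Spark's fixed reserved system memory
MEMORY_FRACTION = 0.6   # assumed: default spark.memory.fraction in Spark 2.x


def heap_budget(executors: int, executor_memory_mb: int, data_mb: float) -> dict:
    """Estimate cluster heap, unified memory per executor, and data per executor."""
    # Unified (execution + storage) memory is a fraction of heap minus the reserve.
    unified_mb = (executor_memory_mb - RESERVED_MB) * MEMORY_FRACTION
    return {
        "total_heap_mb": executors * executor_memory_mb,
        "unified_mb_per_executor": round(unified_mb),
        # Assumes the data is spread evenly across executors.
        "data_mb_per_executor": round(data_mb / executors, 1),
    }


# 20 executors and 9010 MB come from the quoted config; 4 GB is the stated data size.
budget = heap_budget(executors=20, executor_memory_mb=9010, data_mb=4 * 1024)
print(budget)
```

Under these assumptions, each executor holds on the order of 205 MB of input
against roughly 5.2 GB of unified memory, so the data size alone does not
obviously account for a nearly full heap.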

But what's troubling me is that this issue doesn't occur in Spark 2.2 at
all. What could be the reason behind such behaviour?

Regards,
Dhrub

On Mon, Jul 29, 2019 at 6:45 PM Sean Owen <sro...@gmail.com> wrote:

> -dev@
>
> Yep, high GC activity means '(almost) out of memory'. I don't see that
> you've checked heap usage - is it nearly full?
> The answer isn't tuning but more heap.
> (Sometimes with really big heaps the problem is big pauses, but that's
> not the case here.)
>
> On Mon, Jul 29, 2019 at 1:26 AM Dhrubajyoti Hati <dhruba.w...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > We were running Logistic Regression in Spark 2.2.X and then tried to see
> > how it does in Spark 2.3.X. Now we are facing an issue while running a
> > Logistic Regression model in Spark 2.3.X on top of YARN (GCP Dataproc).
> > The treeAggregate step takes a huge amount of time due to very high GC
> > activity. I have tuned the GC and tried different cluster sizes, a
> > higher Spark version (2.4.X), and smaller data, but nothing helps. On
> > average, the GC time is 100 to 1000 times the processing time per
> > iteration.
> >
> > The strange part is that in Spark 2.2 this doesn't happen at all: same
> > code, same cluster sizing, same data in both cases.
> >
> > I was wondering if someone can explain this behaviour and help me
> > resolve it. How can the same code behave so differently across Spark
> > versions, especially on the newer ones?
> >
> > Here is the config I used:
> >
> >
> > spark.serializer=org.apache.spark.serializer.KryoSerializer
> >
> > #GC Tuning
> >
> > spark.executor.extraJavaOptions= -XX:+UseG1GC -XX:+PrintFlagsFinal
> > -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails
> > -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
> > -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -Xms9000m
> > -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
> >
> >
> > spark.executor.instances=20
> >
> > spark.executor.cores=1
> >
> > spark.executor.memory=9010m
> >
> >
> >
> > Regards,
> > Dhrub
> >
>
