Actually, the original data is around 120 GB. If we provide more memory,
then we might need an even bigger cluster to finish training the whole
model within the planned time, and this will affect the cost of operations.
Please correct me if I am wrong here.
Nevertheless, can you point out how much
Could be lots of things. Implementations change, caching may have
changed, etc. The size of the input doesn't really directly translate
to heap usage. Here you just need a bit more memory.
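Input size and heap usage are only loosely related, partly because not all of an executor's heap is available for data. The following is a rough calculation, assuming Spark 2.3 defaults on YARN: `spark.executor.memoryOverhead` = max(384 MiB, 10% of executor memory), 300 MB of reserved heap, `spark.memory.fraction` = 0.6 and `spark.memory.storageFraction` = 0.5. The function name is hypothetical and the figures are approximations, since the JVM reports slightly less usable heap than `-Xmx`.

```python
RESERVED_MB = 300        # heap Spark reserves before computing unified memory
MEMORY_FRACTION = 0.6    # spark.memory.fraction default (Spark 2.x)
STORAGE_FRACTION = 0.5   # spark.memory.storageFraction default
MIN_OVERHEAD_MB = 384    # minimum spark.executor.memoryOverhead
OVERHEAD_FACTOR = 0.10   # default overhead = 10% of executor memory


def executor_memory_breakdown(executor_memory_mb: int) -> dict:
    """Approximate how one executor's memory is split (Spark 2.3 defaults assumed)."""
    overhead = max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    unified = (executor_memory_mb - RESERVED_MB) * MEMORY_FRACTION
    return {
        # What YARN must actually grant per executor: heap plus off-heap overhead.
        "yarn_container_mb": executor_memory_mb + overhead,
        # Shared by execution (e.g. aggregation buffers) and cached blocks.
        "unified_mb": unified,
        # Cached blocks up to this size are immune to eviction by execution.
        "storage_mb_soft": unified * STORAGE_FRACTION,
        # Left for user data structures and Spark's internal metadata.
        "user_mb": executor_memory_mb - RESERVED_MB - unified,
    }


for heap_mb in (4096, 8192, 16384):
    print(heap_mb, executor_memory_breakdown(heap_mb))
```

Giving each executor "a bit more memory" therefore also grows every YARN container by the overhead, which is what feeds into the cluster-size and cost concern raised at the top of the thread.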
On Mon, Jul 29, 2019 at 9:03 AM Dhrubajyoti Hati wrote:
Hi Sean,
Yeah, I checked the heap; it's almost full. I checked the GC logs on the
executors and found that GC cycles are kicking in frequently. The
Executors tab shows red in the "Total Time/GC Time" column.
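To put a number on "red" GC time instead of reading it off the Executors tab, the same per-executor totals are exposed by Spark's monitoring REST API at `/api/v1/applications/<app-id>/executors` (fields `totalDuration` and `totalGCTime`, both in milliseconds). A minimal sketch with hypothetical helper names, assuming the UI highlights GC time once it exceeds 10% of task time; that threshold is my understanding of the Spark 2.x Executors page, so treat it as an assumption.

```python
import json
from urllib.request import urlopen

GC_WARN_FRACTION = 0.1  # assumption: threshold the Spark 2.x UI uses to colour GC time red


def fetch_executors(base_url: str, app_id: str) -> list:
    """Fetch executor summaries; base_url such as "http://<driver-host>:4040" is assumed reachable."""
    with urlopen(f"{base_url}/api/v1/applications/{app_id}/executors") as resp:
        return json.load(resp)


def gc_heavy_executors(executors: list, threshold: float = GC_WARN_FRACTION) -> list:
    """Return (id, gc_fraction) for executors whose GC time exceeds `threshold` of task time."""
    flagged = []
    for ex in executors:
        duration = ex.get("totalDuration", 0)
        if duration <= 0:
            continue  # e.g. the driver entry, or an executor that has run no tasks yet
        fraction = ex.get("totalGCTime", 0) / duration
        if fraction > threshold:
            flagged.append((ex["id"], round(fraction, 3)))
    return flagged


sample = [
    {"id": "driver", "totalDuration": 0, "totalGCTime": 0},
    {"id": "1", "totalDuration": 100000, "totalGCTime": 40000},
    {"id": "2", "totalDuration": 100000, "totalGCTime": 5000},
]
print(gc_heavy_executors(sample))  # [('1', 0.4)]
```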
Also, the data I am dealing with is quite small (~4 GB) and the cluster
is quite big for t
Yep, high GC activity means '(almost) out of memory'. I don't see that
you've checked heap usage - is it nearly full?
The answer isn't tuning but more heap.
(Sometimes with really big heaps the problem is big pauses, but that's
not the case here.)
On Mon, Jul 29, 2019 at 1:26 AM Dhrubajyoti
Actually, I didn't have any of the GC tuning in the beginning, and adding
it didn't make any difference either. As mentioned earlier, I tried a small
number of executors with a larger configuration and vice versa. Nothing helps.
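Switching between fewer, larger executors and more, smaller ones may change little, because on a fixed node the heap available to each concurrently running task can come out the same. A rough packing sketch, assuming hypothetical worker nodes with 24 GiB and 8 vcores available to YARN, the Spark 2.3 default overhead (max(384 MiB, 10%)), and ignoring YARN's rounding of containers up to `yarn.scheduler.minimum-allocation-mb`:

```python
def container_mb(executor_memory_mb: int) -> int:
    """Executor heap plus the default spark.executor.memoryOverhead (Spark 2.3 assumed)."""
    return executor_memory_mb + max(384, int(executor_memory_mb * 0.10))


def executors_per_node(node_mb: int, node_vcores: int,
                       executor_memory_mb: int, executor_cores: int) -> int:
    """How many executors fit on one node, limited by memory or by cores."""
    by_memory = node_mb // container_mb(executor_memory_mb)
    by_cores = node_vcores // executor_cores
    return min(by_memory, by_cores)


NODE_MB, NODE_VCORES = 24576, 8  # hypothetical node resources available to YARN

for mem_mb, cores in [(10240, 4), (5120, 2), (2560, 1)]:
    n = executors_per_node(NODE_MB, NODE_VCORES, mem_mb, cores)
    # Each task running in an executor shares that executor's heap with its siblings.
    print(f"{n} x ({mem_mb} MB, {cores} cores): "
          f"{n * mem_mb} MB heap per node, {mem_mb // cores} MB heap per task")
```

Under these hypothetical numbers every layout gives 20480 MB of heap per node and 2560 MB per task, so only raising executor memory relative to executor cores (or using nodes with more memory per core) gives each task more room.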
About the code: it's a simple logistic regression, nothing with explicit
broadcast o
I would remove all the GC tuning and add it back later once you have found
the underlying root cause. Usually more GC means you need to provide more
memory, because something has changed (your application, the Spark version, etc.).
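One way to apply this advice mechanically is to strip collector-selection and tuning flags from `spark.executor.extraJavaOptions` while keeping GC logging, so the default collector's behaviour stays visible. A small sketch; the helper names and the flag set are hypothetical and illustrative, not an exhaustive list of HotSpot GC options.

```python
# Illustrative (not exhaustive) set of HotSpot GC-selection and GC-tuning flags.
GC_TUNING_FLAGS = {
    "UseG1GC", "UseParallelGC", "UseParallelOldGC", "UseConcMarkSweepGC",
    "UseSerialGC", "InitiatingHeapOccupancyPercent", "ConcGCThreads",
    "ParallelGCThreads", "MaxGCPauseMillis", "G1HeapRegionSize",
    "G1ReservePercent", "NewRatio", "SurvivorRatio",
}


def xx_flag_name(token: str) -> str:
    """'-XX:+UseG1GC' -> 'UseG1GC'; '-XX:ConcGCThreads=20' -> 'ConcGCThreads'."""
    body = token[len("-XX:"):]
    if body[:1] in ("+", "-"):
        body = body[1:]
    return body.split("=", 1)[0]


def strip_gc_tuning(java_opts: str) -> str:
    """Drop GC tuning flags, keeping everything else (including GC logging flags).

    Uses a naive whitespace split, so quoted options containing spaces are not handled.
    """
    kept = [
        tok for tok in java_opts.split()
        if not (tok.startswith("-XX:") and xx_flag_name(tok) in GC_TUNING_FLAGS)
    ]
    return " ".join(kept)


opts = ("-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 "
        "-XX:ConcGCThreads=20 -verbose:gc -XX:+PrintGCDetails")
print(strip_gc_tuning(opts))  # -verbose:gc -XX:+PrintGCDetails
```

The result can serve as the baseline value of `spark.executor.extraJavaOptions` (and `spark.driver.extraJavaOptions`) while the memory question is investigated.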
We don’t have your full code to give exact advice, but you may want to rethink
Hi,
We were running Logistic Regression in Spark 2.2.X and then tried it in
Spark 2.3.X. Now we are facing an issue while running a Logistic Regression
model in Spark 2.3.X on top of YARN (GCP Dataproc). In the treeAggregate
method it takes a huge time due to very high GC Acti
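The treeAggregate step mentioned in the original question combines per-partition partial results in several levels instead of sending every partial result straight to the driver. For logistic regression each partial result is, as I understand it, roughly a dense gradient buffer with one entry per coefficient, so the memory involved scales with the number of features (and with how many such buffers are alive at once) rather than directly with input size. The pure-Python simulation below is loosely modelled on the Spark 2.x `RDD.treeAggregate` logic (default depth 2, scale = max(ceil(numPartitions^(1/depth)), 2)); it is an illustration, not Spark's implementation, and `tree_aggregate` is a hypothetical name.

```python
import copy
import math
from functools import reduce


def tree_aggregate(partitions, zero, seq_op, comb_op, depth=2):
    """Simulate a treeAggregate-style multi-level combine in plain Python.

    Returns (result, n_to_driver): the aggregate and how many partial
    aggregates the final driver-side combine had to merge.
    """
    # Level 0: each partition folds its records into its own copy of the zero value.
    partials = [reduce(seq_op, part, copy.deepcopy(zero)) for part in partitions]
    n = len(partials)
    scale = max(math.ceil(n ** (1.0 / depth)), 2)
    # Intermediate levels: shrink the number of partials by `scale` while that
    # still saves work, merging partial i into bucket i % n.
    while n > scale + math.ceil(n / scale):
        n //= scale
        buckets = [[] for _ in range(n)]
        for i, p in enumerate(partials):
            buckets[i % n].append(p)
        partials = [reduce(comb_op, b) for b in buckets]
    # Final level: the driver merges whatever partials remain.
    return reduce(comb_op, partials, copy.deepcopy(zero)), len(partials)


def add_vec(a, b):
    """Element-wise sum, standing in for summing gradient vectors."""
    return [x + y for x, y in zip(a, b)]


# 100 partitions, each holding one 3-element "gradient".
parts = [[[1.0, 2.0, 3.0]] for _ in range(100)]
total, to_driver = tree_aggregate(parts, [0.0, 0.0, 0.0], add_vec, add_vec)
print(total, to_driver)  # [100.0, 200.0, 300.0] 10
```

With 100 partitions and depth 2, one intermediate level reduces 100 partial results to 10 before the driver merges them; with real models every one of those partials is a full-length coefficient vector held on the heap.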