Thanks for your reply.
It is 64GB per node. We will try using UseParallelGC.
From: CPC [mailto:acha...@gmail.com]
Sent: Thursday, April 26, 2018 11:44 PM
To: vincent gromakowski
Cc: Pallavi Singh; user
Subject: Re: Spark Optimization
I would recommend UseParallelGC since this is a batch job.
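You can set it through the driver and executor JVM options, e.g. something along these lines (the class name and jar are placeholders for your application):

spark-submit \
  --class com.example.AggregationPOC \
  --conf "spark.driver.extraJavaOptions=-XX:+UseParallelGC" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC" \
  app.jar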
Hi Team,
We are currently working on a POC based on Spark and Scala.
We have to read 18 million records from a Parquet file and perform 25 user-defined aggregations based on grouping keys.
We have used Spark's high-level DataFrame API for the aggregation. On a cluster of
two nodes we could finish end t
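For context, a minimal sketch of the job looks like this (paths, column names, and the two aggregates shown are placeholders; the actual job applies 25 user-defined aggregations over the grouping keys):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("AggregationPOC").getOrCreate()

// Read the ~18 million input records from Parquet (path is a placeholder)
val records = spark.read.parquet("/data/input/records.parquet")

// Group by the grouping keys and apply the aggregations
// (only two are shown here; the real job defines 25 of them)
val aggregated = records
  .groupBy("key1", "key2")
  .agg(
    sum("amount").as("total_amount"),
    avg("amount").as("avg_amount")
  )

// Write the result back out (path is a placeholder)
aggregated.write.mode("overwrite").parquet("/data/output/aggregates.parquet")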