RE: Spark Optimization

2018-04-26 Thread Pallavi Singh
Thanks for your reply. It is 64 GB per node. We will try using UseParallelGC.

From: CPC [mailto:acha...@gmail.com]
Sent: Thursday, April 26, 2018 11:44 PM
To: vincent gromakowski
Cc: Pallavi Singh; user
Subject: Re: Spark Optimization

I would recommend UseParallelGC since this is a batch job.
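For reference, a minimal sketch of how the suggested collector could be wired in, assuming the job is launched through spark-submit (the application name is a placeholder; driver-side flags are normally passed on the spark-submit command line, since the driver JVM is already running by the time session code executes):

import org.apache.spark.sql.SparkSession

// Executor JVMs are started by the cluster manager, so their GC flags can
// be supplied through Spark configuration.
val spark = SparkSession.builder()
  .appName("parquet-aggregation-poc") // placeholder name
  .config("spark.executor.extraJavaOptions", "-XX:+UseParallelGC")
  .getOrCreate()

// Driver-side equivalent (set at launch, not in code):
//   spark-submit --conf "spark.driver.extraJavaOptions=-XX:+UseParallelGC" ...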

Spark Optimization

2018-04-26 Thread Pallavi Singh
Hi Team,

We are currently working on a POC based on Spark and Scala. We have to read 18 million records from a Parquet file and perform 25 user-defined aggregations based on grouping keys. We have used the Spark high-level DataFrame API for the aggregation. On a cluster of two nodes we could finish end t…
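A minimal sketch of the kind of job described, assuming a Parquet source and grouping on a couple of key columns; the path, column names, and the concrete aggregate expressions are illustrative assumptions, not the poster's actual code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("agg-poc").getOrCreate()

// Read the Parquet input (path is a placeholder).
val df = spark.read.parquet("/data/records.parquet")

// Several aggregations per grouping key via the high-level DataFrame API;
// the real job would list all 25 aggregate expressions here.
val result = df
  .groupBy("key1", "key2")
  .agg(
    sum("amount").as("total_amount"),
    avg("amount").as("avg_amount"),
    countDistinct("customer_id").as("distinct_customers")
  )

result.write.parquet("/data/agg_output.parquet")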