What have you tried so far to increase performance? (Did you try
different combinations of -yn and -ys?)
Can you provide us with your application? What source/sink are you using?
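For example, instead of many small 2-slot TaskManagers it can be worth
trying one large TaskManager per node, roughly like this (the numbers
here are only illustrative and would need to be adjusted to your YARN
container limits):

  flink run -m yarn-cluster -yn 100 -ys 16 -ytm 100000 -yjm 8000 ...

Fewer, larger TaskManagers often change the memory and shuffle behaviour
of a batch job quite a bit, so comparing a few such combinations against
your current settings would be a good first step.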
On 08.08.2018 07:59, Ravi Bhushan Ratnakar wrote:
Hi Everybody,
Currently I am working on a project where I need to write a Flink
Batch application that has to process around 400GB of compressed
sequence files per hour. After processing, it has to write the output
as compressed Parquet to S3.
I have managed to write the application in Flink and can successfully
process a full hour of data and write it in Parquet format to S3. The
problem is that it does not match the performance of the existing
application written with Spark Batch (running in production).
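For reference, the job is structured roughly like the sketch below.
This is heavily simplified: the S3 paths, the Avro schema, the record
decoding and the class names are placeholders, and it assumes Flink's
Hadoop compatibility wrappers for the SequenceFile source and
parquet-avro for the Parquet sink; the real application differs.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;
import org.apache.parquet.hadoop.ParquetOutputFormat;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class HourlyBatchJob {

    // Placeholder schema - the real records have many more fields.
    private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":"
            + "[{\"name\":\"payload\",\"type\":\"string\"}]}";

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Source: the hourly compressed SequenceFiles, read via the Hadoop compatibility layer.
        Job inputJob = Job.getInstance();
        FileInputFormat.addInputPath(inputJob, new Path("s3://input-bucket/hour=07/"));
        DataSet<Tuple2<BytesWritable, BytesWritable>> raw = env.createInput(
                new HadoopInputFormat<BytesWritable, BytesWritable>(
                        new SequenceFileInputFormat<BytesWritable, BytesWritable>(),
                        BytesWritable.class, BytesWritable.class, inputJob));

        // Processing: decode every record into an Avro GenericRecord (placeholder logic).
        DataSet<Tuple2<Void, GenericRecord>> records = raw.map(new ToAvroRecord());

        // Sink: compressed Parquet on S3, written through parquet-avro's Hadoop output format.
        Job outputJob = Job.getInstance();
        AvroParquetOutputFormat.setSchema(outputJob, new Schema.Parser().parse(SCHEMA_JSON));
        ParquetOutputFormat.setCompression(outputJob, CompressionCodecName.SNAPPY);
        FileOutputFormat.setOutputPath(outputJob, new Path("s3://output-bucket/hour=07/"));
        records.output(new HadoopOutputFormat<Void, GenericRecord>(
                new AvroParquetOutputFormat<GenericRecord>(), outputJob));

        env.execute("hourly-sequencefile-to-parquet");
    }

    // Avro's Schema is not serializable, so each task re-parses it in open().
    private static class ToAvroRecord
            extends RichMapFunction<Tuple2<BytesWritable, BytesWritable>, Tuple2<Void, GenericRecord>> {

        private transient Schema schema;

        @Override
        public void open(Configuration parameters) {
            schema = new Schema.Parser().parse(SCHEMA_JSON);
        }

        @Override
        public Tuple2<Void, GenericRecord> map(Tuple2<BytesWritable, BytesWritable> in) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("payload", new String(in.f1.copyBytes()));
            return new Tuple2<Void, GenericRecord>(null, record);
        }
    }
}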
Current Spark Batch
Cluster size - AWS EMR - 1 master + 100 worker nodes of m4.4xlarge
(16 vCPU, 64 GB RAM), each instance with a 160 GB disk volume
Input data - Around 400GB
Time Taken to process - Around 36 mins
------------------------------------------------------------
Flink Batch
Cluster size - AWS EMR - 1 master + 100 worker nodes of r4.4xlarge
(16 vCPU, 122 GB RAM), each instance with a 630 GB disk volume
Transient Job - flink run -m yarn-cluster -yn 792 -ys 2 -ytm 14000 -yjm 114736
Input data - Around 400GB
Time Taken to process - Around 1 hour
I have given the JobManager all of a node's memory just to make sure
that it has a dedicated node and does not run into any resource issues.
We are already running the Flink Batch job with double the RAM compared
to the Spark Batch job, yet we are not able to get the same performance.
Kindly suggest what we can do to achieve the same performance as we are
getting from Spark Batch.
Thanks,
Ravi