scaling flink

Bill Sparks Fri, 05 Jun 2015 08:16:51 -0700

Hi.

I'm running some comparisons between Flink, MRv2, and Spark (1.3), using the new 
Intel HiBench suite. I've started with the stock wordcount example and I'm 
seeing some numbers that are not where I expected them to be.


So the question I have is: what are the configuration parameters that can 
affect performance? Is there a performance/tuning guide?

Hardware-wise, we have 48 Haswell nodes, each with 32 physical/64 HT cores and 
128 GB of memory, FDR-connected. I'm parsing 2 TB of text, using the following 
parameters.

./bin/flink run -m yarn-cluster \
-yD fs.overwrite-files=true \
-yD fs.output.always-create-directory=true \
-yq \
-yn $((666)) \
-yD taskmanager.numberOfTaskSlots=$((1)) \
-yD parallelization.degree.default=$((666)) \
-ytm $((4*1024)) \
-yjm $((4*1024)) \
./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar \
hdfs:///user/jsparks/HiBench/Wordcount/Input \
hdfs:///user/jsparks/HiBench/Wordcount/Output
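
For reference, the resources requested by the command above can be compared with the cluster described earlier. This is only a rough sketch: it assumes the hardware description means 48 nodes with 32 physical cores and 128 GB each, and the constant and function names are made up for illustration. It ignores whatever share of memory and cores YARN actually makes available to containers.

```python
# Cluster figures: assumptions read from the hardware description
# (48 nodes, 32 physical cores and 128 GB of memory per node).
NODES = 48
PHYSICAL_CORES_PER_NODE = 32
MEMORY_GB_PER_NODE = 128

# Values taken from the flink run command.
TASK_MANAGERS = 666            # -yn
SLOTS_PER_TASK_MANAGER = 1     # taskmanager.numberOfTaskSlots
TASK_MANAGER_MB = 4 * 1024     # -ytm
JOB_MANAGER_MB = 4 * 1024      # -yjm


def footprint():
    """Return requested slots and memory next to the assumed cluster totals."""
    total_slots = TASK_MANAGERS * SLOTS_PER_TASK_MANAGER
    requested_gb = (TASK_MANAGERS * TASK_MANAGER_MB + JOB_MANAGER_MB) / 1024
    return {
        "slots": total_slots,
        "physical_cores": NODES * PHYSICAL_CORES_PER_NODE,
        "requested_memory_gb": requested_gb,
        "cluster_memory_gb": NODES * MEMORY_GB_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in footprint().items():
        print(f"{name}: {value}")
```

Under these assumptions the job requests fewer task slots than the cluster has physical cores, and less than half of the total memory.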

Any pointers would be greatly appreciated.


Type                 Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
HadoopWordcount      2015-06-03  10:45:11  2052360935068    763.106      2689483420           2689483420
JavaSparkWordcount   2015-06-03  10:55:24  2052360935068    411.246      4990591847           4990591847
ScalaSparkWordcount  2015-06-03  11:06:24  2052360935068    342.777      5987452294           5987452294

Type                 Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
flinkWordCount       2015-06-04  16:27:27  2052360935068    647.383      3170242244           66046713
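
The throughput columns in the tables above appear to follow from the input size and duration. The sketch below shows that arithmetic under the assumption that throughput is input bytes divided by seconds, and that the Flink row was divided by 48 nodes while the Hadoop and Spark rows were reported with a node count of 1. The function names are hypothetical and are not part of HiBench.

```python
INPUT_BYTES = 2052360935068  # Input_data_size, identical for every run


def throughput(input_bytes, duration_s):
    """Aggregate throughput in bytes per second."""
    return input_bytes / duration_s


def throughput_per_node(total_throughput, nodes):
    """Per-node throughput, truncated to whole bytes per second."""
    return int(total_throughput / nodes)


if __name__ == "__main__":
    # Durations copied from the tables above.
    durations = {
        "HadoopWordcount": 763.106,
        "JavaSparkWordcount": 411.246,
        "ScalaSparkWordcount": 342.777,
        "flinkWordCount": 647.383,
    }
    for name, seconds in durations.items():
        print(name, throughput(INPUT_BYTES, seconds))
    # Assumed node count of 48 for the Flink row.
    print(throughput_per_node(3170242244, 48))
```

If that reading is right, the Throughput/node values of the two tables are not directly comparable; the Throughput(bytes/s) column is the like-for-like figure.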


--
Jonathan (Bill) Sparks
Software Architecture
Cray Inc.
