Hi.

I'm running some comparisons between Flink, MRv2, and Spark (1.3), using the new
Intel HiBench suite. I've started with the stock wordcount example, and the
numbers I'm seeing are not where I expected them to be.

So my question is: what are the configuration parameters that can affect
performance? Is there a performance/tuning guide?

Hardware-wise, we have 48 Haswell nodes, each with 32 physical (64 HT) cores and
128 GB of memory, connected via FDR. I'm parsing 2 TB of text, using the
following parameters.

./bin/flink run -m yarn-cluster \
-yD fs.overwrite-files=true \
-yD fs.output.always-create-directory=true \
-yq \
-yn $((666)) \
-yD taskmanager.numberOfTaskSlots=$((1)) \
-yD parallelization.degree.default=$((666)) \
-ytm $((4*1024)) \
-yjm $((4*1024)) \
./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar \
hdfs:///user/jsparks/HiBench/Wordcount/Input \
hdfs:///user/jsparks/HiBench/Wordcount/Output
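
For reference, here is a rough sketch of what this command requests compared with what the cluster offers. It assumes the 48 in the hardware description is the node count, that each node has 32 physical cores and 128 GB, and that each of the 666 containers runs one TaskManager with a single slot. The variable names are only illustrative, and the 4 GB JobManager container is not counted.

```shell
#!/usr/bin/env bash
# Assumptions (from the hardware description): 48 nodes,
# each with 32 physical cores and 128 GB of memory.
NODES=48
CORES_PER_NODE=32
MEM_PER_NODE_MB=$((128 * 1024))

# Values taken from the flink run command above.
TASK_MANAGERS=666          # -yn
SLOTS_PER_TM=1             # taskmanager.numberOfTaskSlots
TM_MEM_MB=$((4 * 1024))    # -ytm

# Totals requested by the job vs. totals available on the cluster.
TOTAL_SLOTS=$((TASK_MANAGERS * SLOTS_PER_TM))
TOTAL_CORES=$((NODES * CORES_PER_NODE))
TOTAL_TM_MEM_MB=$((TASK_MANAGERS * TM_MEM_MB))
TOTAL_MEM_MB=$((NODES * MEM_PER_NODE_MB))

# Integer percentages (rounded down).
SLOT_PCT=$((TOTAL_SLOTS * 100 / TOTAL_CORES))
MEM_PCT=$((TOTAL_TM_MEM_MB * 100 / TOTAL_MEM_MB))

echo "slots:  $TOTAL_SLOTS of $TOTAL_CORES physical cores (${SLOT_PCT}%)"
echo "memory: $TOTAL_TM_MEM_MB of $TOTAL_MEM_MB MB (${MEM_PCT}%)"
```

Under those assumptions the job asks for 666 slots against 1536 physical cores and about 2.6 TB of the 6 TB of cluster memory, so roughly 43% of each.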

Any pointers would be greatly appreciated.


Type                 Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
HadoopWordcount      2015-06-03  10:45:11  2052360935068    763.106      2689483420           2689483420
JavaSparkWordcount   2015-06-03  10:55:24  2052360935068    411.246      4990591847           4990591847
ScalaSparkWordcount  2015-06-03  11:06:24  2052360935068    342.777      5987452294           5987452294

Type                 Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
flinkWordCount       2015-06-04  16:27:27  2052360935068    647.383      3170242244           66046713
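
A note on the last column: assuming the 48 in the hardware description is the node count, per-node throughput should be the total throughput divided by 48. The Flink row matches that. The Hadoop and Spark rows show the same figure in both columns, so those reports do not appear to have been normalized by node count. A small sketch that recomputes the column on a common basis (the `per_node` helper is just illustrative):

```shell
#!/usr/bin/env bash
# Assumption: 48 is the number of nodes used for every run.
NODES=48

# Total throughput in bytes/s, copied from the reports above.
HADOOP=2689483420
JAVA_SPARK=4990591847
SCALA_SPARK=5987452294
FLINK=3170242244

# Integer division (rounds down); bash arithmetic is 64-bit,
# so these values fit without overflow.
per_node() { echo $(( $1 / NODES )); }

echo "HadoopWordcount      $(per_node "$HADOOP")"
echo "JavaSparkWordcount   $(per_node "$JAVA_SPARK")"
echo "ScalaSparkWordcount  $(per_node "$SCALA_SPARK")"
echo "flinkWordCount       $(per_node "$FLINK")"
```

The comparison that matters is still Duration(s) and total Throughput(bytes/s), since all four runs read the same 2052360935068 bytes.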


--
Jonathan (Bill) Sparks
Software Architecture
Cray Inc.
