I am doing a short Proof of Concept for using Flink and Kafka in our product.  
On my laptop I can process 10M inputs in about 90 min.  On 2 different EC2 
instances (m4.xlarge and m5.xlarge, both with 4 cores, 16 GB RAM, and SSD 
storage) I see the process hit a wall around 50 min into the test, short of 
7M events processed.  This is running ZooKeeper, a Kafka broker, and Flink all on the same 
server in all cases.  My goal is to measure single node vs. multi-node and test 
horizontal scalability, but I would like to figure out why it hits a wall 
first.  I have the task manager configured with 6 slots, and the job runs with 
a parallelism of 5.  The laptop has 8 threads, and the EC2 instances have 4 threads. 
On smaller data sets and at the beginning of each test the EC2 instances outpace 
the laptop.  I will try again with an m5.2xlarge, which has 8 threads and 32 GB 
RAM, to see if that works better for this workload.  Any pointers or ways to get 
metrics that would help diagnose this would be appreciated.
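
For a rough sense of scale, here is a minimal sketch of the average throughput 
implied by the numbers above.  The function and variable names are just for 
illustration, and since the EC2 runs stopped short of 7M events, the EC2 figure 
is only an upper bound on the average rate before the wall.

```python
def events_per_second(events: int, minutes: float) -> float:
    """Average throughput over a run, in events per second."""
    return events / (minutes * 60)


# Laptop: the full 10M inputs in about 90 min.
laptop = events_per_second(10_000_000, 90)

# EC2: stalled around 50 min, short of 7M events, so this is an upper bound.
ec2_upper = events_per_second(7_000_000, 50)

print(f"laptop average: ~{laptop:.0f} events/s")
print(f"EC2 average before the wall: at most ~{ec2_upper:.0f} events/s")
```

These are averages over the whole run, so they say nothing about how the rate 
changes over time, which is the part I am trying to understand.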

Michael
