Re: Flink/Kafka POC performance issue

TechnoMage Tue, 17 Apr 2018 08:50:59 -0700

Memory use is steady throughout the job, but the CPU utilization drops off a 
cliff.  I assume this is because it becomes I/O bound shuffling managed state.


Are there any metrics on managed state that can help in evaluating what to do 
next?
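
One way to test the I/O-bound assumption from outside Flink is to watch the kernel's iowait counter while the job runs. Below is a minimal sketch, assuming a Linux host that exposes /proc/stat. The function names are made up for illustration, and this is not a Flink metric or API. It only shows whether the CPUs are sitting idle waiting on disk.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two samples.

    Counter order in /proc/stat: user, nice, system, idle, iowait, irq,
    softirq, steal, guest, guest_nice. Guest time is already counted in
    user/nice, so only the first eight counters are summed.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])
    return deltas[4] / total if total > 0 else 0.0


def read_cpu_counters(path="/proc/stat"):
    """Read the aggregate CPU counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def sample_iowait(interval=5.0):
    """Sample /proc/stat twice and return the iowait fraction in between."""
    before = read_cpu_counters()
    time.sleep(interval)
    after = read_cpu_counters()
    return iowait_fraction(before, after)
```

Calling sample_iowait() periodically during the run and comparing the values before and after the slowdown would show whether iowait climbs as CPU utilization drops. A high value is consistent with the job being disk-bound, but does not by itself identify managed state as the cause.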

Michael

> On Apr 17, 2018, at 7:11 AM, Michael Latta <mla...@technomage.com> wrote:
> 
> Thanks for the suggestion. The task manager is configured for 8GB of heap 
> and gets to about 8.3GB total. The other Java processes (job manager and 
> Kafka) add a few more. I will check it again, but the instances have 16GB, 
> the same as my laptop, which completes the test in <90 min. 
> 
> Michael
> 
> On Apr 16, 2018, at 10:53 PM, Niclas Hedhman <nic...@hedhman.org> wrote:
> 
>> 
>> Have you checked memory usage? It could be as simple as a memory leak, or 
>> aggregating more than you think (it is not always obvious how much is kept 
>> in memory, or for how long). If possible, connect FlightRecorder or a 
>> similar tool and keep an eye on memory. Additionally, I don't have AWS 
>> experience to speak of, but if AWS swaps RAM to disk like regular Linux, 
>> then swapping might be triggered if your JVM heap is bigger than the 
>> available RAM can accommodate.
>> 
>> On Tue, Apr 17, 2018 at 9:26 AM, TechnoMage <mla...@technomage.com> wrote:
>> I am doing a short Proof of Concept for using Flink and Kafka in our 
>> product.  On my laptop I can process 10M inputs in about 90 min.  On 2 
>> different EC2 instances (m4.xlarge and m5.xlarge, both with 4 cores, 16GB 
>> RAM, and SSD storage) I see the process hit a wall around 50 min into the 
>> test, short of 7M events processed.  This is running ZooKeeper, the Kafka 
>> broker, and Flink all on the same server in all cases.  My goal is to 
>> measure single node vs. multi-node and test horizontal scalability, but I 
>> would like to figure out why it hits a wall first.  I have the task manager 
>> configured with 6 slots and the job has a parallelism of 5.  The laptop has 
>> 8 threads, and the EC2 instances have 4 threads.  On smaller data sets and 
>> at the beginning of each test the EC2 instances outpace the laptop.  I will 
>> try again with an m5.2xlarge, which has 8 threads and 32GB RAM, to see if 
>> that works better for this workload.  Any pointers or ways to get metrics 
>> that would help diagnose this would be appreciated.
>> 
>> Michael
>> 
>> 
>> 
>> 
>> -- 
>> Niclas Hedhman, Software Developer
>> http://polygene.apache.org - New Energy for Java
