Memory use is steady throughout the job, but CPU utilization drops off a cliff. I assume this is because the job becomes I/O-bound while shuffling managed state.
Are there any metrics on managed state that can help in evaluating what to do next?

Michael

> On Apr 17, 2018, at 7:11 AM, Michael Latta <mla...@technomage.com> wrote:
>
> Thanks for the suggestion. The task manager is configured for 8GB of heap,
> and gets to about 8.3GB total. Other Java processes (job manager and Kafka)
> add a few more. I will check it again, but the instances have 16GB, the same
> as my laptop, which completes the test in <90 min.
>
> Michael
>
> On Apr 16, 2018, at 10:53 PM, Niclas Hedhman <nic...@hedhman.org> wrote:
>
>> Have you checked memory usage? It could be as simple as either a memory
>> leak or aggregating more than you think (it is sometimes not obvious how
>> much is kept around in memory for longer than one first thinks). If
>> possible, connect FlightRecorder or a similar tool and keep an eye on
>> memory. Additionally, I don't have AWS experience to speak of, but IF AWS
>> swaps RAM to disk like regular Linux, then that might be triggered if your
>> JVM heap is bigger than can be handled within the available RAM.
>>
>> On Tue, Apr 17, 2018 at 9:26 AM, TechnoMage <mla...@technomage.com> wrote:
>> I am doing a short Proof of Concept for using Flink and Kafka in our
>> product. On my laptop I can process 10M inputs in about 90 min. On 2
>> different EC2 instances (m4.xlarge and m5.xlarge, both with 4 cores, 16GB
>> RAM, and SSD storage) I see the process hit a wall around 50 min into the
>> test, short of 7M events processed. This is running ZooKeeper, a Kafka
>> broker, and Flink all on the same server in all cases. My goal is to
>> measure single node vs. multi-node and test horizontal scalability, but I
>> would like to figure out why it hits a wall first. I have the task manager
>> configured with 6 slots and the job has a parallelism of 5. The laptop has
>> 8 threads, and the EC2 instances have 4 threads. On smaller data sets and
>> in the beginning of each test the EC2 instances outpace the laptop. I will
>> try again with an m5.2xlarge, which has 8 threads and 32GB RAM, to see if
>> that works better for this workload. Any pointers or ways to get metrics
>> that would help diagnose this would be appreciated.
>>
>> Michael
>>
>> --
>> Niclas Hedhman, Software Developer
>> http://polygene.apache.org - New Energy for Java