Also, I note some messages in the log about my java class not being a valid 
POJO because it is missing accessors for a field.  Would this impact 
performance significantly?

Michael

> On Apr 17, 2018, at 12:54 PM, TechnoMage <mla...@technomage.com> wrote:
> 
> No checkpoints are active.
> I will try that back end.
> Yes, using JSONObject subclass for most of the intermediate state, with JSON 
> strings in and out of Kafka.  I will look at the config page for how to 
> enable that.
> 
> Thank you,
> Michael
> 
>> On Apr 17, 2018, at 12:51 PM, Stephan Ewen <se...@apache.org 
>> <mailto:se...@apache.org>> wrote:
>> 
>> A few ideas how to start debugging this:
>> 
>>   - Try deactivating checkpoints. Without that, no work goes into persisting 
>> rocksdb data to the checkpoint store.
>>   - Try to swap RocksDB for the FsStateBackend - that reduces serialization 
>> cost for moving data between heap and offheap (rocksdb).
>>   - Do you have some expensive types (JSON, etc)? Try activating object 
>> reuse (which avoids some extra defensive copies)
>> 
>> On Tue, Apr 17, 2018 at 5:50 PM, TechnoMage <mla...@technomage.com 
>> <mailto:mla...@technomage.com>> wrote:
>> Memory use is steady throughout the job, but the CPU utilization drops off a 
>> cliff.  I assume this is because it becomes I/O bound shuffling managed 
>> state.
>> 
>> Are there any metrics on managed state that can help in evaluating what to 
>> do next?
>> 
>> Michael
>> 
>> 
>>> On Apr 17, 2018, at 7:11 AM, Michael Latta <mla...@technomage.com 
>>> <mailto:mla...@technomage.com>> wrote:
>>> 
>>> Thanks for the suggestion. The task manager is configured for 8GB of heap, 
>>> and gets to about 8.3 total. Other java processes (job manager and Kafka). 
>>> Add a few more. I will check it again but the instances have 16GB same as 
>>> my laptop that completes the test in <90 min. 
>>> 
>>> Michael
>>> 
>>> Sent from my iPad
>>> 
>>> On Apr 16, 2018, at 10:53 PM, Niclas Hedhman <nic...@hedhman.org 
>>> <mailto:nic...@hedhman.org>> wrote:
>>> 
>>>> 
>>>> Have you checked memory usage? It could be as simple as either having 
>>>> memory leaks, or aggregating more than you think (sometimes not obvious 
>>>> how much is kept around in memory for longer than one first thinks). If 
>>>> possible, connect FlightRecorder or similar tool and keep an eye on 
>>>> memory. Additionally, I don't have AWS experience to talk of, but IF AWS 
>>>> swaps RAM to disk like regular Linux, then that might be triggered if your 
>>>> JVM heap is bigger than can be handled within the available RAM.
>>>> 
>>>> On Tue, Apr 17, 2018 at 9:26 AM, TechnoMage <mla...@technomage.com 
>>>> <mailto:mla...@technomage.com>> wrote:
>>>> I am doing a short Proof of Concept for using Flink and Kafka in our 
>>>> product.  On my laptop I can process 10M inputs in about 90 min.  On 2 
>>>> different EC2 instances (m4.xlarge and m5.xlarge both 4core 16GB ram and 
>>>> ssd storage) I see the process hit a wall around 50min into the test and 
>>>> short of 7M events processed.  This is running zookeeper, kafka broker, 
>>>> flink all on the same server in all cases.  My goal is to measure single 
>>>> node vs. multi-node and test horizontal scalability, but I would like to 
>>>> figure out why hit hits a wall first.  I have the task maanger configured 
>>>> with 6 slots and the job has 5 parallelism.  The laptop has 8 threads, and 
>>>> the EC2 instances have 4 threads. On smaller data sets and in the begining 
>>>> of each test the EC2 instances outpace the laptop.  I will try again with 
>>>> an m5.2xlarge which has 8 threads and 32GB ram to see if that works better 
>>>> for this workload.  Any pointers or ways to get metrics that would help 
>>>> diagnose this would be appreciated.
>>>> 
>>>> Michael
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Niclas Hedhman, Software Developer
>>>> http://polygene.apache.org <http://polygene.apache.org/> - New Energy for 
>>>> Java
>> 
>> 
> 

Reply via email to