Re: Multi task Container Starvation

2017-03-09 Thread Ankit Malhotra
Hi Jagadish, I failed to mention an important detail, which is that we had change-logging on for the store. What is interesting is that most of our time is spent in the “send” method. I see from the code that we only send changelogs when we “putAllDirtyEntries()” from the object cache. Is ther

Re: Multi task Container Starvation

2017-03-09 Thread Jagadish Venkatraman
1. What's the on-disk size of the store? (In one of our earlier experiments, we've observed writes slow down if the state size is larger than 10G per partition.) 2. Can you benchmark how long writing to RocksDB takes on your SSD? You can look at https://github.com/apache/samza/blob/master/samza-test

Re: Multi task Container Starvation

2017-03-09 Thread Ankit Malhotra
Also, what baffles me a little bit is that PUTs are taking 10x as long as GETs. We’re running on SSDs and here is our config: stores.stage-store.write.batch.size=10 stores.stage-store.object.cache.size=20 stores.stage-store.rocksdb.num.write.buffers=3 stores.stage-store.rocksdb.compaction.styl
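
To illustrate why a small write batch size might show up as time spent flushing and sending changelog entries, here is a rough toy model of a write-back object cache. It is only a sketch under stated assumptions, not Samza's actual implementation: the class `ToyWriteBackCache`, its fields, and its flush rules (flush when the number of dirty keys reaches the batch size, or when a dirty key is about to be evicted) are hypothetical. The values 10 and 20 are taken from the `write.batch.size` and `object.cache.size` settings quoted in the thread.

```python
from collections import OrderedDict


class ToyWriteBackCache:
    """Toy write-back cache in front of a store. Hypothetical, not Samza code."""

    def __init__(self, cache_size, write_batch_size):
        self.cache_size = cache_size
        self.write_batch_size = write_batch_size
        self.entries = OrderedDict()  # key -> value, oldest entry first
        self.dirty = set()            # keys written since the last flush
        self.flushes = 0              # number of batch flushes to the store
        self.changelog_sends = 0      # number of entries sent to the changelog

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        self.dirty.add(key)
        # Assumed rule 1: flush once enough distinct keys are dirty.
        if len(self.dirty) >= self.write_batch_size:
            self.flush_dirty()
        # Assumed rule 2: never evict a dirty entry without flushing first.
        if len(self.entries) > self.cache_size:
            oldest = next(iter(self.entries))
            if oldest in self.dirty:
                self.flush_dirty()
            del self.entries[oldest]

    def flush_dirty(self):
        if not self.dirty:
            return
        self.flushes += 1
        self.changelog_sends += len(self.dirty)  # one send per dirty key
        self.dirty.clear()


def simulate(cache_size, write_batch_size, keys):
    cache = ToyWriteBackCache(cache_size, write_batch_size)
    for k in keys:
        cache.put(k, k)
    return cache


if __name__ == "__main__":
    small = simulate(20, 10, range(1000))     # settings quoted in the thread
    large = simulate(1000, 500, range(1000))  # hypothetical larger settings
    print(small.flushes, large.flushes)
```

In this model, 1000 writes to distinct keys produce a flush every 10 writes with the quoted settings, while the larger hypothetical settings flush far less often. Repeated writes to a few hot keys are coalesced in the cache and generate no changelog traffic until a flush happens. Whether this matches the real store's behavior closely enough to explain the PUT latency is an assumption that would need to be checked against the metrics.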

Re: Multi task Container Starvation

2017-03-09 Thread Ankit Malhotra
Replies inline On 3/9/17, 11:24 AM, "Jagadish Venkatraman" wrote: I understand you are receiving messages from *all* partitions (but fewer messages from some partitions). Some questions: 1. Is it possible that you may have saturated the capacity of the entire contai

Re: Multi task Container Starvation

2017-03-09 Thread Jagadish Venkatraman
I understand you are receiving messages from *all* partitions (but fewer messages from some partitions). Some questions: 1. Is it possible that you may have saturated the capacity of the entire container? 2. What is the time you spend inside *process* and *window* for the affected container? (How

Re: Multi task Container Starvation

2017-03-09 Thread Ankit Malhotra
Replies inline. -- Ankit > On Mar 9, 2017, at 12:34 AM, Jagadish Venkatraman wrote: > > We can certainly help you debug this more. Some questions: > > 1. Are you processing messages (at all) from the "suffering" containers? > (You can verify that by observing metrics/logging etc.) Processi

Re: Multi task Container Starvation

2017-03-08 Thread Jagadish Venkatraman
We can certainly help you debug this more. Some questions: 1. Are you processing messages (at all) from the "suffering" containers? (You can verify that by observing metrics/logging etc.) 2. If you are indeed processing messages, is it possible that the impacted containers are not able to keep up with th

Multi task Container Starvation

2017-03-08 Thread Ankit Malhotra
Hi, While joining 2 streams across 2 partitions, we see that some containers start suffering, in that the lag (messages behind the high watermark) for one of the tasks starts skyrocketing while the other one stays at ~0. We are using default values for buffer sizes, fetch threshold, are using