Hi Jagadish,
I failed to mention an important detail: we had change-logging turned on for
the store. What is interesting is that most of our time is spent in the
"send" method. I see from the code that we only send changelog messages when
putAllDirtyEntries() is invoked from the object cache.
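(For reference, the store's changelog is wired up along these lines in the job
config; the changelog topic name below is just illustrative.)

stores.stage-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.stage-store.changelog=kafka.stage-store-changelog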
Is ther
1. What's the on-disk size of the store? (In one of our earlier experiments,
we've observed writes slow down when the state size is larger than 10G per
partition.)
2. Can you benchmark how long writing to RocksDb takes on your SSD? You can
look at
https://github.com/apache/samza/blob/master/samza-test
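Something along these lines should be enough for a rough put-throughput number
outside of Samza (standalone sketch against the RocksDB JNI API, assuming
rocksdbjni is on the classpath; the path, value size, and count below are just
placeholders):

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class RocksDbPutBenchmark {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    int numPuts = 1_000_000;       // placeholder message count
    byte[] value = new byte[512];  // placeholder value size

    Options options = new Options().setCreateIfMissing(true);
    RocksDB db = RocksDB.open(options, "/tmp/rocksdb-put-bench"); // placeholder path

    long start = System.nanoTime();
    for (int i = 0; i < numPuts; i++) {
      db.put(String.valueOf(i).getBytes(), value);
    }
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println(numPuts + " puts in " + elapsedMs + " ms ("
        + (numPuts * 1000L / Math.max(elapsedMs, 1)) + " puts/sec)");

    db.close();
    options.close();
  }
}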
Also, what baffles me a little bit is that PUTs are taking 10x as long as GETs.
We're running on SSDs, and here is our config:
stores.stage-store.write.batch.size=10
stores.stage-store.object.cache.size=20
stores.stage-store.rocksdb.num.write.buffers=3
stores.stage-store.rocksdb.compaction.styl
Replies inline
On 3/9/17, 11:24 AM, "Jagadish Venkatraman" wrote:
I understand you are receiving messages from *all* partitions (but fewer
messages from some partitions).
Some questions:
1. Is it possible that you may have saturated the capacity of the entire
container?
2. What is the time you spend inside *process* and *window* for the
affected container? (How
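If it is easier, you can also get a rough per-task number by timing the calls
directly, e.g. (sketch only; it just logs wall-clock time per call, and the
class/logger names are illustrative):

import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;
import org.apache.samza.task.WindowableTask;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TimedTask implements StreamTask, WindowableTask {
  private static final Logger LOG = LoggerFactory.getLogger(TimedTask.class);

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
      TaskCoordinator coordinator) {
    long start = System.nanoTime();
    // ... existing join/store logic goes here ...
    LOG.info("process took {} us for {}", (System.nanoTime() - start) / 1000,
        envelope.getSystemStreamPartition());
  }

  @Override
  public void window(MessageCollector collector, TaskCoordinator coordinator) {
    long start = System.nanoTime();
    // ... existing window logic goes here ...
    LOG.info("window took {} us", (System.nanoTime() - start) / 1000);
  }
}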
Replies inline.
--
Ankit
> On Mar 9, 2017, at 12:34 AM, Jagadish Venkatraman
> wrote:
>
> We can certainly help you debug this more. Some questions:
>
> 1. Are you processing messages (at all) from the "suffering" containers?
> (You can verify that by observing metrics/ logging etc.)
Processi
> 2. If you are indeed processing messages, is it possible the impacted
> containers are not able to keep up with th
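A crude way to check per-task progress is to count envelopes per input
partition and log the counts from window(), e.g. (sketch; class and logger
names are illustrative):

import java.util.HashMap;
import java.util.Map;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.SystemStreamPartition;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;
import org.apache.samza.task.WindowableTask;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ProgressLoggingTask implements StreamTask, WindowableTask {
  private static final Logger LOG = LoggerFactory.getLogger(ProgressLoggingTask.class);
  private final Map<SystemStreamPartition, Long> processed = new HashMap<>();

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
      TaskCoordinator coordinator) {
    // count every envelope we actually see, per input partition
    processed.merge(envelope.getSystemStreamPartition(), 1L, Long::sum);
    // ... existing join/store logic goes here ...
  }

  @Override
  public void window(MessageCollector collector, TaskCoordinator coordinator) {
    LOG.info("messages processed since startup: {}", processed);
  }
}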
Hi,
While consuming from 2 partitions to join 2 streams, we see that some
containers start suffering: the lag (messages behind the high watermark) for
one of the tasks starts skyrocketing while the other stays at ~0.
We are using default values for buffer sizes and the fetch threshold, and are using
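For context, the task is conceptually doing something like this (heavily
simplified sketch of a key-based two-stream join; the class, output, and stream
names are illustrative rather than our actual code):

import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

public class TwoStreamJoinTask implements StreamTask, InitableTask {
  private static final SystemStream OUTPUT = new SystemStream("kafka", "joined-output"); // illustrative
  private static final String LEFT = "left-stream";   // illustrative input stream names
  private static final String RIGHT = "right-stream";

  private KeyValueStore<String, String> store;

  @Override
  @SuppressWarnings("unchecked")
  public void init(Config config, TaskContext context) {
    // the RocksDB-backed store with the object cache and changelog configured for the job
    store = (KeyValueStore<String, String>) context.getStore("stage-store");
  }

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
      TaskCoordinator coordinator) {
    String key = (String) envelope.getKey();
    String message = (String) envelope.getMessage();
    String stream = envelope.getSystemStreamPartition().getStream();
    String otherStream = LEFT.equals(stream) ? RIGHT : LEFT;

    // look up the other side of the join in the store (GET)
    String match = store.get(otherStream + ":" + key);
    if (match != null) {
      collector.send(new OutgoingMessageEnvelope(OUTPUT, key, message + "|" + match));
    }
    // buffer this side for future matches (PUT goes through the object cache,
    // and dirty entries are eventually written to RocksDB and the changelog)
    store.put(stream + ":" + key, message);
  }
}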