I set job.container.single.thread.mode = true

And actually, I think we did catch up with that setting. I have since also 
completed the merge of 0.14.1, and we are now able to keep up with the input.
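
For reference, the two settings distinguished in this thread would look roughly 
like this in a job's properties file. This is only a sketch: the property names 
are the ones quoted in the messages below, the values are illustrative, and the 
comments are an interpretation of the configuration reference quoted below, not 
tested behavior.

```properties
# Fall back to the legacy single-threaded event loop
# (the setting reported above as letting the job catch up).
job.container.single.thread.mode=true

# Not the same thing: as far as the thread indicates, this only sizes the
# container thread pool. Shown commented out; the value 1 is illustrative.
# job.container.thread.pool.size=1
```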

Thanks again for the pointers and the fast response!

-----Original Message-----
From: Prateek Maheshwari [mailto:prateek...@gmail.com] 
Sent: Friday, June 8, 2018 15:00
To: dev@samza.apache.org
Subject: Re: Urgent : Help with latency / backlog / topic lag

Just to clarify, when you say you tried single threaded mode, do you mean that 
you set job.container.thread.pool.size = 1, or that you set 
job.container.single.thread.mode = true?

On Fri, Jun 8, 2018 at 2:53 PM, Thunder Stumpges <tstump...@ntent.com> wrote:
> Thanks for the quick reply. That sounds very much like what I'm 
> seeing. I'm merging in 0.14.1 to our branch now. I did try single 
> threaded mode and unfortunately that didn't seem to make a significant 
> difference. Perhaps I do need some multithreading? I'm seeing a task 
> latency of 0.2 ms per message but still only achieving ~700/sec.
>
>
> -----Original Message-----
> From: Prateek Maheshwari [mailto:prateek...@gmail.com]
> Sent: Friday, June 8, 2018 13:54
> To: dev@samza.apache.org
> Subject: Re: Urgent : Help with latency / backlog / topic lag
>
> Hi Thunder,
>
>> What we believe may be happening is that most of the topics have no
>> backlog, but one topic has all the backlog (this is because one of the topics 
>> accounts for ~60% of the total message rate).  Could there be something 
>> inducing extra latency on processing the one topic with a backlog just having 
>> a bunch of other topics with NO backlog?
>
> This seems very similar to this issue:
> https://issues.apache.org/jira/browse/SAMZA-1599
> This was fixed in https://github.com/apache/samza/pull/436, and the fix 
> should be available in the 0.14.1 version.
> Would it be possible to try upgrading to 0.14.1? It should be backwards 
> compatible with 0.14.0.
>
> For something you can try without upgrading: try setting 
> "job.container.single.thread.mode" to true. From the configuration 
> reference
> <https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html>:
> "If set to true, samza will fallback to legacy single-threaded event loop.
> Default is false, which enables the multithreading execution."
>
> Let us know if this doesn't help.
>
> Thanks,
> Prateek
>
> On Fri, Jun 8, 2018 at 1:35 PM, Thunder Stumpges <tstump...@ntent.com>
> wrote:
>
>> We have a new samza job which we just put into production. This job 
>> processes many topics (~30) but the total rate is not that high 
>> (~1200/sec in aggregate). I am unable to get above ~700/sec and have a 
>> growing backlog.
>>
>> We are running samza 0.12 (I have an update to 0.14 that is not 
>> tested or pushed yet).  When we load tested with a single topic, we 
>> could easily do several thousand per second. The latency of a single 
>> message is about 0.5ms as recorded by our timer metric on our 'process' call.
>>
>> What we believe may be happening is that most of the topics have no 
>> backlog, but one topic has all the backlog (this is because one of 
>> the topics accounts for ~60% of the total message rate).  Could there 
>> be something inducing extra latency on processing the one topic with 
>> a backlog just having a bunch of other topics with NO backlog?
>>
>> Some things I have tried:
>>
>>
>>   1.  Increasing thread pool (10->20->30), no change
>>   2.  Going from 1 container to 2, no help (the two containers run at 
>> half the speed and total is the same)
>>   3.  Increasing task.max.concurrency from 1 -> 2 -> 3  (this had 
>> some minor help going from 1 to 2, but not enough)
>>   4.  Increasing fetch.threshold.bytes (currently at 100,000 and we 
>> have pretty small messages)
>>
>> Some observed metrics:
>>
>>
>>   *   "Pending Messages" are > 0  (15+ on some partitions)
>>   *   "Messages in flight" is almost always 0
>>   *   Polls rate is ~50/sec
>>   *   Message chooser "Choose Obj" is ~680-700/sec like our processing rate
>>   *   Message chooser "choose null" is ~50/sec
>>
>> I'm somewhat at a loss because based on the actual processing latency 
>> we should easily be able to do 2000+ with just a small handful of threads.
>>
>> Thanks in advance. This is in production and I really need a solution.
>> Thunder
>>
>>
