RE: Urgent : Help with latency / backlog / topic lag

Thunder Stumpges Fri, 08 Jun 2018 14:54:01 -0700

Thanks for the quick reply. That sounds very much like what I'm seeing. I'm
merging 0.14.1 into our branch now. I did try single-threaded mode, and
unfortunately that didn't seem to make a significant difference. Perhaps I do
need some multithreading? I'm seeing a task latency of 0.2 ms per message but
still only achieving ~700/sec.
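
As a rough sanity check of the figures quoted in this thread, the sketch below
works out what the reported per-message latencies (0.5 ms in the original
message, 0.2 ms here) would allow on a single thread, and how much of each
second the 'process' call would account for at the observed ~700/sec. It
assumes one thread doing nothing but processing; the function names are
illustrative only and are not part of Samza.

```python
# Back-of-the-envelope check using only numbers quoted in this thread.
# Assumption: a single thread that spends all of its time inside process().
# The helper names below are hypothetical, not Samza APIs.


def max_rate_per_thread(latency_ms: float) -> float:
    """Upper bound on messages/sec for one thread at the given latency."""
    return 1000.0 / latency_ms


def busy_fraction(rate_per_sec: float, latency_ms: float) -> float:
    """Fraction of each second spent inside process() at the given rate."""
    return rate_per_sec * latency_ms / 1000.0


if __name__ == "__main__":
    observed_rate = 700.0  # msgs/sec, as reported in the thread
    for latency_ms in (0.5, 0.2):
        ceiling = max_rate_per_thread(latency_ms)
        busy = busy_fraction(observed_rate, latency_ms)
        print(
            f"{latency_ms} ms/msg: ceiling ~{ceiling:.0f}/sec, "
            f"process() busy ~{busy:.0%} at {observed_rate:.0f}/sec"
        )
```

At 0.5 ms per message the single-thread ceiling is about 2000/sec, and at
700/sec the 'process' call would account for only about 35% of one thread's
time (about 14% at 0.2 ms). Under these assumptions, most of the time is being
spent somewhere other than in the task's own processing.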



-----Original Message-----
From: Prateek Maheshwari [mailto:[email protected]] 
Sent: Friday, June 8, 2018 13:54
To: [email protected]
Subject: Re: Urgent : Help with latency / backlog / topic lag

Hi Thunder,

> What we believe may be happening is that most of the topics have no
> backlog, but one topic has all the backlog (this is because one of the topics
> accounts for ~60% of the total message rate).  Could there be something
> inducing extra latency on processing the one topic with a backlog just having a
> bunch of other topics with NO backlog?

This seems very similar to this issue:
https://issues.apache.org/jira/browse/SAMZA-1599
This was fixed in https://github.com/apache/samza/pull/436, and the fix should 
be available in the 0.14.1 version.
Would it be possible to try upgrading to 0.14.1? It should be backwards 
compatible with 0.14.0.

For something you can try without upgrading: try setting 
"job.container.single.thread.mode" to true. From the configuration reference
<https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html>:
"If set to true, samza will fallback to legacy single-threaded event loop.
Default is false, which enables the multithreading execution."

Let us know if this doesn't help.

Thanks,
Prateek

On Fri, Jun 8, 2018 at 1:35 PM, Thunder Stumpges <[email protected]>
wrote:

> We have a new samza job which we just put into production. This job 
> processes many topics (~30) but the total rate is not that high 
> (~1200/sec in aggregate). I am unable to get above ~700/sec and have a 
> growing backlog.
>
> We are running samza 0.12 (I have an update to 0.14 that is not tested 
> or pushed yet).  When we load tested with a single topic, we could 
> easily do several thousand per second. The latency of a single message 
> is about 0.5ms as recorded by our timer metric on our 'process' call.
>
> What we believe may be happening is that most of the topics have no 
> backlog, but one topic has all the backlog (this is because one of the 
> topics accounts for ~60% of the total message rate).  Could there be 
> something inducing extra latency on processing the one topic with a 
> backlog just having a bunch of other topics with NO backlog?
>
> Some things I have tried:
>
>
>   1.  Increasing thread pool (10->20->30), no change
>   2.  Going from 1 container to 2, no help (the two containers run at 
> half the speed and total is the same)
>   3.  Increasing task.max.concurrency from 1 -> 2 -> 3  (this had some 
> minor help going from 1 to 2, but not enough)
>   4.  Increasing fetch.threshold.bytes (currently at 100,000 and we 
> have pretty small messages)
>
> Some observed metrics:
>
>
>   *   "Pending Messages" are > 0  (15+ on some partitions)
>   *   "Messages in flight" is almost always 0
>   *   Polls rate is ~50/sec
>   *   Message chooser "Choose Obj" is ~680-700/sec like our processing rate
>   *   Message chooser "choose null" is ~50/sec
>
> I'm somewhat at a loss because based on the actual processing latency 
> we should easily be able to do 2000+ with just a small handful of threads.
>
> Thanks in advance. This is in production and I really need a solution.
> Thunder
>
>
