Just to clarify, when you say you tried single threaded mode, do you mean that you set job.container.thread.pool.size = 1, or that you set job.container.single.thread.mode = true?
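
In case it helps to compare, here is roughly how the two options would look in a job's properties file. This is only a sketch: the values are illustrative, I'm assuming the rest of the job config stays as it is, and you would set one or the other rather than both.

```properties
# Option A (my assumed reading of "single threaded"): keep the default
# multithreaded run loop, but limit the container thread pool to one worker.
job.container.thread.pool.size=1

# Option B: fall back to the legacy single-threaded event loop, as described
# in the configuration reference quoted below.
job.container.single.thread.mode=true
```

As far as I understand it, these exercise different code paths in the container, so a result with one does not necessarily tell us much about the other.
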
On Fri, Jun 8, 2018 at 2:53 PM, Thunder Stumpges <tstump...@ntent.com> wrote:

> Thanks for the quick reply. That sounds very much like what I'm seeing. I'm
> merging in 0.14.1 to our branch now. I did try single threaded mode and
> unfortunately that didn't seem to make a significant difference. Perhaps I do
> need some multithreading? I'm seeing a task latency of 0.2ms per message but
> still only achieve ~700/sec.
>
>
> -----Original Message-----
> From: Prateek Maheshwari [mailto:prateek...@gmail.com]
> Sent: Friday, June 8, 2018 13:54
> To: dev@samza.apache.org
> Subject: Re: Urgent : Help with latency / backlog / topic lag
>
> Hi Thunder,
>
>> What we believe may be happening is that most of the topics have no
>> backlog, but one topic has all the backlog (this is because one of the
>> topics accounts for ~60% of the total message rate). Could there be
>> something inducing extra latency on processing the one topic with a
>> backlog just having a bunch of other topics with NO backlog?
>
> This seems very similar to this issue:
> https://issues.apache.org/jira/browse/SAMZA-1599
> This was fixed in https://github.com/apache/samza/pull/436, and the fix
> should be available in the 0.14.1 version.
> Would it be possible to try upgrading to 0.14.1? It should be backwards
> compatible with 0.14.0.
>
> For something you can try without upgrading: try setting
> "job.container.single.thread.mode" to true. From the configuration reference
> <https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html>:
> "If set to true, samza will fallback to legacy single-threaded event loop.
> Default is false, which enables the multithreading execution."
>
> Let us know if this doesn't help.
>
> Thanks,
> Prateek
>
> On Fri, Jun 8, 2018 at 1:35 PM, Thunder Stumpges <tstump...@ntent.com>
> wrote:
>
>> We have a new samza job which we just put into production. This job
>> processes many topics (~30) but the total rate is not that high
>> (~1200/sec in aggregate). I am unable to get above ~700/sec and have a
>> growing backlog.
>>
>> We are running samza 0.12 (I have an update to 0.14 that is not tested
>> or pushed yet). When we load tested with a single topic, we could
>> easily do several thousand per second. The latency of a single message
>> is about 0.5ms as recorded by our timer metric on our 'process' call.
>>
>> What we believe may be happening is that most of the topics have no
>> backlog, but one topic has all the backlog (this is because one of the
>> topics accounts for ~60% of the total message rate). Could there be
>> something inducing extra latency on processing the one topic with a
>> backlog just having a bunch of other topics with NO backlog?
>>
>> Some things I have tried:
>>
>> 1. Increasing thread pool (10->20->30), no change
>> 2. Going from 1 container to 2, no help (the two containers run at
>>    half the speed and total is the same)
>> 3. Increasing task.max.concurrency from 1 -> 2 -> 3 (this had some
>>    minor help going from 1 to 2, but not enough)
>> 4. Increasing fetch.threshold.bytes (currently at 100,000 and we
>>    have pretty small messages)
>>
>> Some observed metrics:
>>
>> * "Pending Messages" are > 0 (15+ on some partitions)
>> * "Messages in flight" is almost always 0
>> * Polls rate is ~50/sec
>> * Message chooser "Choos Obj" is ~680-700/sec like our processing rate
>> * Message chooser "choose null" is ~50/sec
>>
>> I'm somewhat at a loss because based on the actual processing latency
>> we should easily be able to do 2000+ with just a small handful of threads.
>>
>> Thanks in advance, this is in production and I really need a solution.
>> Thunder