Hey Leah, thanks for the response.

We are running Kafka 2.5.1, and if the topology will still be useful after
the next few sentences, I will share it with you (it's messy!).
The lag happens on a few partitions of a few internal topics, and exactly
which topics and partitions are affected seems to be somewhat random.
The business logic is prone to having "hot" partitions, since the
identifier being used comes in at very different rates during different
times of the day.
We are using RocksDB, and I would like to know which metrics you think
could help us (I haven't exposed the metrics externally in a useful way
yet :/).
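For context, a minimal sketch of how I could pull them out of the app
(the application id, bootstrap servers and buildTopology() are
placeholders, and as far as I understand the KIP-471 RocksDB metrics are
only recorded when metrics.recording.level is set to DEBUG):

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class RocksDbMetricsDump {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "our-app-id");        // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // As far as I understand, the RocksDB metrics (KIP-471) are only
        // recorded when the recording level is DEBUG.
        props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");

        KafkaStreams streams = new KafkaStreams(buildTopology(), props);
        streams.start();

        // Dump the state-store metrics; in practice we would push these to
        // our own reporter on a schedule instead of printing them.
        for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if ("stream-state-metrics".equals(name.group())) {
                System.out.println(name.name() + " " + name.tags() + " = "
                        + entry.getValue().metricValue());
            }
        }
    }

    private static Topology buildTopology() {
        // placeholder for our real (messy!) topology
        return new StreamsBuilder().build();
    }
}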

Since the affected topics and partitions keep changing, and a restart
usually fixes the problem almost immediately, I find it hard to believe it
has anything to do with the topology or business logic - but I might be
missing something (since, after a restart, the lag disappears with no real
effort).
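
For reference, a rough sketch of how the per-partition lag can be checked
with the admin client (the Streams consumer group id is the
application.id; "our-app-id" and "localhost:9092" are placeholders):

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagCheck {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // The Streams consumer group id equals the application.id
            // ("our-app-id" is a placeholder).
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("our-app-id")
                    .partitionsToOffsetAndMetadata()
                    .get();

            // Fetch the log-end offset for every partition the group has
            // committed to.
            Map<TopicPartition, OffsetSpec> latest = committed.keySet().stream()
                    .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResultInfo> endOffsets =
                    admin.listOffsets(latest).all().get();

            // Lag per partition = log-end offset - committed offset.
            committed.forEach((tp, offset) -> {
                long lag = endOffsets.get(tp).offset() - offset.offset();
                System.out.println(tp + " lag=" + lag);
            });
        }
    }
}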

Thanks




On Tue, Dec 8, 2020 at 9:35 PM Leah Thomas <ltho...@confluent.io> wrote:

> Hi Nitay,
>
> What version of Kafka are you running? If you could also share the
> topology you're using, that would be great. Do you have a sense of
> whether the lag is happening on all partitions or just a few? Also, if
> you're using RocksDB, there are some RocksDB metrics in newer versions
> of Kafka that could be helpful for diagnosing the issue.
>
> Cheers,
> Leah
>
> On Mon, Dec 7, 2020 at 8:59 AM Nitay Kufert <nita...@ironsrc.com> wrote:
>
> > Hey,
> > We are running a Kafka Streams-based app in production where the
> > input, intermediate and global topics have 36 partitions.
> > We have 17 sub-topologies (2 of them are for global stores, so they
> > won't generate tasks).
> > More tech details:
> > 6 machines with 16 CPUs each, each running 30 stream threads, so
> > 6 * 30 = 180 stream threads
> > 15 sub-topologies * 36 partitions = 540 tasks
> > 3 tasks per thread
> >
> > Every once in a while, during our rush hours, some of the internal
> > topics start to lag on specific partitions - the lag usually keeps
> > increasing until I restart the application, and then it disappears
> > very quickly.
> >
> > It seems like there is some problem with the work allocation, since
> > the machines are not loaded at all and have enough threads (more than
> > double the number of CPUs).
> >
> > Any idea what's going on there?
> >
>


-- 

Nitay Kufert
Backend Team Leader
