Hey,

We are running a Kafka Streams based app in production where the input, intermediate and global topics have 36 partitions each. The topology has 17 sub-topologies (2 of them are for global stores, so they won't generate tasks).

More tech details:
6 machines with 16 CPUs each, 30 stream threads per machine, so:
6 * 30 = 180 stream threads
15 * 36 = 540 tasks
540 / 180 = 3 tasks per thread
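
As a sanity check on the numbers above, here is a minimal Python sketch of the arithmetic. The function name `task_layout` and its parameter names are made up for illustration and are not part of any Kafka Streams API. It also assumes tasks are spread evenly across all threads, which may not hold in practice.

```python
def task_layout(machines, threads_per_machine, sub_topologies,
                global_sub_topologies, partitions):
    """Hypothetical helper: estimate the thread and task counts of a deployment."""
    # Sub-topologies that only serve global stores do not generate tasks.
    task_generating = sub_topologies - global_sub_topologies
    total_threads = machines * threads_per_machine
    # One task per partition for every task-generating sub-topology.
    total_tasks = task_generating * partitions
    return {
        "stream_threads": total_threads,
        "tasks": total_tasks,
        # Average only; assumes a perfectly even assignment.
        "tasks_per_thread": total_tasks / total_threads,
    }


# Values taken from the setup described above.
layout = task_layout(machines=6, threads_per_machine=30, sub_topologies=17,
                     global_sub_topologies=2, partitions=36)
print(layout)
```

The result is only an average: each task is still processed by a single thread, so the figure says nothing about how busy any individual thread is.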

Every once in a while, during our rush hours, some of the internal topics start to lag on specific partitions. The lag usually keeps increasing until I restart the application, after which it disappears very quickly.

It looks like a problem with work allocation, since the machines are not loaded at all and have plenty of threads (30 per machine, almost double the 16 CPUs).

Any idea what's going on there?

--
Nitay Kufert
Backend Team Leader, ironSource