I'm not sure off the top of my head, but since the method is on the consumer, my intuition is that it would pause all the partitions the consumer is reading from. I think the best thing to do is write a little test-harness app to verify the behavior.
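For reference, the Java client's `pause(Collection<TopicPartition>)` takes an explicit partition set, so "pause everything" is spelled `consumer.pause(consumer.assignment())`, and a subset works just as well. A harness along the lines below could check that semantics; the class here is an in-memory stand-in (no broker involved), so treat it as a sketch of the behavior, not the real client:

```python
class FakeConsumer:
    """In-memory stand-in mimicking KafkaConsumer's pause/resume semantics."""

    def __init__(self, assignment):
        self._assignment = set(assignment)      # e.g. {("jobs", 0), ("jobs", 1)}
        self._paused = set()
        self._queues = {tp: [] for tp in self._assignment}

    def assignment(self):
        return set(self._assignment)

    def pause(self, partitions):
        # Like the real client, pause() takes an explicit partition set.
        self._paused |= set(partitions) & self._assignment

    def resume(self, partitions):
        self._paused -= set(partitions)

    def paused(self):
        return set(self._paused)

    def feed(self, tp, record):
        # Test hook: enqueue a record on a partition.
        self._queues[tp].append(record)

    def poll(self):
        # Paused partitions contribute nothing; the rest drain their queues.
        out = {}
        for tp, queue in self._queues.items():
            if tp not in self._paused and queue:
                out[tp] = list(queue)
                queue.clear()
        return out
```

Pausing `consumer.assignment()` makes every subsequent `poll()` come back empty until `resume()` is called, however many partitions the consumer owns, which is the over-partitioning case Ali asks about below.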
On Mon, May 11, 2020 at 7:31 AM Ali Nazemian <alinazem...@gmail.com> wrote:

> Hi Jason,
>
> Thank you for the message. It seems quite interesting. Something I am not
> sure about with "pause" and "resume" is that they work based on partition
> allocation. What will happen if more partitions are assigned to a single
> consumer? For example, in the case where we have over-partitioned a Kafka
> topic to reserve some capacity for scaling up in case of burst traffic.
>
> Regards,
> Ali
>
> On Sat, May 9, 2020 at 11:52 PM Jason Turim <ja...@signalvine.com> wrote:
>
> > Hi Ali,
> >
> > You may want to look at using the consumer pause/resume API. It's a
> > mechanism that allows you to poll without retrieving new messages.
> >
> > I employed this strategy to handle highly variable workloads by
> > processing them in a background thread. First, pause the consumer when
> > messages are received. Then start a background thread to process the
> > data; meanwhile the primary thread continues to poll (it won't retrieve
> > data while the consumer is paused). Finally, when the background
> > processing is complete, call resume on the consumer to retrieve the
> > next batch of messages.
> >
> > More info in the "Detecting Consumer Failures" section here:
> >
> > https://kafka.apache.org/22/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
> >
> > hth,
> > jt
> >
> > On Fri, May 8, 2020, 10:50 PM Chris Toomey <ctoo...@gmail.com> wrote:
> >
> > > What exactly is your understanding of what's happening when you say
> > > "the pipeline will be blocked by the group coordinator for up to
> > > max.poll.interval.ms"? Please explain that.
> > >
> > > There's no universal recipe for "long-running jobs"; there are just
> > > particular issues you might be encountering and suggested solutions
> > > to those issues.
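The pause / process-in-background / resume loop Jason describes above can be sketched as follows. The stub consumer and `process_batch` callback are stand-ins so the loop runs without a broker; with the real client you would pass `consumer.assignment()` to `pause()` and `resume()` the same way, and the empty polls issued while paused are what keep the consumer inside `max.poll.interval.ms`:

```python
import threading

class StubConsumer:
    """Just enough of a consumer to drive the loop below without a broker."""
    def __init__(self):
        self._paused = False
        self.polls_while_paused = 0
    def assignment(self):
        return {("jobs", 0)}
    def pause(self, partitions):
        self._paused = True
    def resume(self, partitions):
        self._paused = False
    def poll(self):
        if self._paused:
            self.polls_while_paused += 1
        return {}  # while paused, no records come back

def handle_batch(consumer, records, process_batch):
    """One iteration of the pattern: pause, process in background, keep polling."""
    consumer.pause(consumer.assignment())       # stop fetching new records
    done = threading.Event()

    def worker():
        process_batch(records)                  # the long-running job
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    while not done.is_set():
        consumer.poll()                         # empty while paused, but counts
        done.wait(0.01)                         #   as progress for the group
    consumer.resume(consumer.assignment())      # ready for the next batch
```

The key property is that `poll()` keeps being called on the primary thread for the whole duration of the job, so the group coordinator never sees the consumer as stalled.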
> > > On Fri, May 8, 2020 at 7:03 PM Ali Nazemian <alinazem...@gmail.com> wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > I am not sure where I said anything about "automatic partition
> > > > reassignment", but what I do know is that the side effect of
> > > > increasing "max.poll.interval.ms" is that if the consumer hangs for
> > > > whatever reason, the pipeline will be blocked by the group
> > > > coordinator for up to "max.poll.interval.ms". So I am not sure
> > > > whether this is because of the automatic partition assignment or
> > > > something else. What I am looking for is how I can deal with
> > > > long-running jobs in Apache Kafka.
> > > >
> > > > Thanks,
> > > > Ali
> > > >
> > > > On Sat, May 9, 2020 at 4:25 AM Chris Toomey <ctoo...@gmail.com> wrote:
> > > >
> > > > > I interpreted your post as saying "when our consumer gets stuck,
> > > > > Kafka's automatic partition reassignment kicks in and that's
> > > > > problematic for us." Hence I suggested not using the automatic
> > > > > partition assignment, which per my interpretation would address
> > > > > your issue.
> > > > >
> > > > > Chris
> > > > >
> > > > > On Fri, May 8, 2020 at 2:19 AM Ali Nazemian <alinazem...@gmail.com> wrote:
> > > > >
> > > > > > Thanks, Chris. So what is causing the consumer to get stuck is
> > > > > > a side effect of the built-in partition assignment in Kafka,
> > > > > > and by overriding that behaviour I should be able to address
> > > > > > the long-running job issue, is that right? Can you please
> > > > > > elaborate more on this?
> > > > > >
> > > > > > Regards,
> > > > > > Ali
> > > > > >
> > > > > > On Fri, May 8, 2020 at 1:09 PM Chris Toomey <ctoo...@gmail.com> wrote:
> > > > > >
> > > > > > > You really have to decide what behavior you want when one of
> > > > > > > your consumers gets "stuck".
> > > > > > > If you don't like the way the group protocol dynamically
> > > > > > > manages topic partition assignments, or can't figure out an
> > > > > > > appropriate set of configuration settings that achieve your
> > > > > > > goal, you can always elect not to use the group protocol and
> > > > > > > instead manage topic partition assignment yourself. As I just
> > > > > > > replied to another post, there's a nice writeup of this under
> > > > > > > "Manual Partition Assignment" in
> > > > > > > https://kafka.apache.org/24/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
> > > > > > >
> > > > > > > Chris
> > > > > > >
> > > > > > > On Thu, May 7, 2020 at 12:37 AM Ali Nazemian <alinazem...@gmail.com> wrote:
> > > > > > >
> > > > > > > > To help you understand my case in more detail, the error I
> > > > > > > > see constantly is the consumer losing its heartbeat, after
> > > > > > > > which the group apparently gets rebalanced, based on the
> > > > > > > > log I can see on the Kafka side:
> > > > > > > >
> > > > > > > > [GroupCoordinator 11]: Member
> > > > > > > > consumer-3-f46e14b4-5998-4083-b7ec-bed4e3f374eb in group
> > > > > > > > foo has failed, removing it from the group
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Ali
> > > > > > > >
> > > > > > > > On Thu, May 7, 2020 at 2:38 PM Ali Nazemian <alinazem...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > With the emergence of Apache Kafka for event-driven
> > > > > > > > > architecture, one thing that has become important is how
> > > > > > > > > to tune the Apache Kafka consumer to manage long-running
> > > > > > > > > jobs.
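As an illustration of the manual-assignment route Chris points to: with `assign()` instead of `subscribe()` there is no group rebalance to trigger, so each worker just needs a deterministic, disjoint slice of the partitions. The static round-robin split below is one assumed scheme, not the javadoc's example:

```python
def partitions_for_worker(worker_index, num_workers, all_partitions):
    """Static round-robin split: disjoint per worker, covering all partitions."""
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index out of range")
    return [p for i, p in enumerate(sorted(all_partitions))
            if i % num_workers == worker_index]

# With the real Java client, each worker would then pin its slice via
# consumer.assign(...) with TopicPartition objects for its partition numbers,
# so a stuck worker delays only its own partitions -- no rebalance occurs.
```

The trade-off, as the javadoc notes, is that failure detection and partition redistribution become your responsibility instead of the coordinator's.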
> > > > > > > > > The main issue arises when we set a relatively large
> > > > > > > > > value for "max.poll.interval.ms". Setting this value
> > > > > > > > > will, of course, resolve the issue of repetitive
> > > > > > > > > rebalances, but it creates another operational issue. I
> > > > > > > > > am looking for some sort of golden strategy to deal with
> > > > > > > > > long-running jobs with Apache Kafka.
> > > > > > > > >
> > > > > > > > > If the consumer hangs for whatever reason, there is no
> > > > > > > > > easy way of passing that stage. It can easily block the
> > > > > > > > > pipeline, and you cannot do much about it. Therefore, it
> > > > > > > > > came to my mind that I am probably missing something
> > > > > > > > > here. What are the expectations? Is it not valid to use
> > > > > > > > > Apache Kafka for long-lived jobs? Are there any other
> > > > > > > > > parameters that need to be set, and is the issue of a
> > > > > > > > > consumer being stuck caused by misconfiguration?
> > > > > > > > >
> > > > > > > > > I can see a lot of the same issues have been raised
> > > > > > > > > regarding "the consumer is stuck", and usually the
> > > > > > > > > answer has been "yeah, that's because you have a
> > > > > > > > > long-running job, etc.". I have seen different
> > > > > > > > > suggestions:
> > > > > > > > >
> > > > > > > > > - Avoid using long-running jobs. Read the message,
> > > > > > > > > submit it to another thread, and let the consumer move
> > > > > > > > > on. Obviously this can cause data loss, and it would be
> > > > > > > > > a difficult problem to handle. It might be better to
> > > > > > > > > avoid using Kafka in the first place for these types of
> > > > > > > > > requests.
> > > > > > > > > - Avoid using Apache Kafka for long-running requests.
> > > > > > > > >
> > > > > > > > > - Workaround-based approaches: if the consumer is
> > > > > > > > > blocked, try to use another consumer group and set the
> > > > > > > > > offset to the current value for the new consumer group,
> > > > > > > > > etc.
> > > > > > > > >
> > > > > > > > > There might be other suggestions I have missed here, but
> > > > > > > > > that is not the point of this email. What I am looking
> > > > > > > > > for is the best practice for dealing with long-running
> > > > > > > > > jobs with Apache Kafka. I cannot easily avoid using
> > > > > > > > > Kafka because it plays a critical part in our
> > > > > > > > > application and data pipeline. On the other side, we
> > > > > > > > > have had so many challenges keeping the long-running
> > > > > > > > > jobs stable operationally. So I would appreciate it if
> > > > > > > > > someone could help me understand what approach can be
> > > > > > > > > taken to deal with these jobs with Apache Kafka as a
> > > > > > > > > message broker.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Ali

-- 
Jason Turim (he, him & his)
Vice President of Software Engineering
SignalVine Inc <http://www.signalvine.com>
(m) 415-407-6501
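One way to make the tuning question in the original post concrete: between two `poll()` calls the consumer must get through an entire batch, so `max.poll.interval.ms` has to exceed `max.poll.records` times the worst-case per-record processing time. Shrinking `max.poll.records` is therefore often a gentler lever than inflating the interval. A sketch of that arithmetic (the numbers and safety factor are illustrative assumptions, not Kafka defaults):

```python
def required_poll_interval_ms(max_poll_records, worst_case_ms_per_record,
                              safety_factor=2.0):
    """Lower bound to configure for max.poll.interval.ms, with headroom."""
    return int(max_poll_records * worst_case_ms_per_record * safety_factor)

# A 500-record batch at 100 ms/record needs a 100 s poll interval (with 2x
# headroom), while capping the batch at 10 records brings that down to 2 s.
```

So a consumer that must occasionally block for minutes per record is better served by a small batch size plus the pause/resume pattern above than by a poll interval large enough to mask a genuinely hung consumer.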