I tried again after increasing max.poll.interval.ms to a very high value (12 hours), and replication has been running fine since then. What I observed earlier was that replication got stuck and stopped each time max.poll.interval.ms elapsed. With the 12-hour value it has worked fine for a few days. It's not a full solution, but it's better than before.
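
As a rough sketch of that change: the fragment below shows the setting in a consumer properties file. The file name `consumer.properties` is only a placeholder (an assumption, not taken from my setup), and the value is simply 12 hours expressed in milliseconds. With the legacy MirrorMaker, a file like this is the one passed via `--consumer.config`.

```properties
# consumer.properties (placeholder file name, an assumption for illustration)
# 12 hours = 12 * 60 * 60 * 1000 ms = 43200000 ms
max.poll.interval.ms=43200000
```

This only widens the window before the consumer is considered failed; it does not explain why the poll loop stalls in the first place.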

On Wed, Feb 8, 2023, 23:48 Greg Harris <greg.har...@aiven.io.invalid> wrote:

> Arpit,
>
> I am not very familiar with MirrorMaker unfortunately so I won't be able to
> give you any specific advice.
> I also don't see any MirrorMaker-specific changes that would be relevant,
> except for some minor arguments changes and the deprecation landing in 3.0.
>
> > Its very random. It replicates for couple of hours fine and than stops for
> > a day.
>
> Hopefully logging will help you to understand why the replication flow
> starts and stops.
> Do you have any very long timeouts which would correspond to the day-long
> downtime?
> Looking at MirrorMaker options, theres a `abort.on.send.failure`
> configuration that may be able to cause MirrorMaker to fail fast in some
> cases to allow you to debug it, and possibly auto-restart it.
>
> > Could you tell me how can I enable more logging ?
>
> I believe you can configure the logging by changing the KAFKA_LOG4J_OPTS
> environment variable before running the mirror maker script.
> For example, you could copy and modify the existing tools config:
> https://github.com/apache/kafka/blob/trunk/config/tools-log4j.properties
> and provide the copy to the MirrorMaker tool with:
>
> export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/path/to/your/tools-log4j.properties"
>
> One additional thing you might be able to do if you catch it when it's
> stalled is to take a heap dump or stacktrace of the containing JVM.
> This will let you see what the process is doing, and maybe see if there are
> stuck threads, excess memory, or other things preventing the replication
> from progressing.
>
> Good luck with your investigation!
> Greg
>
>
> On Wed, Feb 8, 2023 at 2:20 PM Arpit Jain <jain.arp...@gmail.com> wrote:
>
> > Hi Greg,
> >
> > Thanks for getting back to me. Please find more details below
> >
> > 1. Are you using MirrorMaker, or MirrorMaker 2.0?
> > Mirror maker
> > 2. What version of MM or MM2 are you using, and with what Kafka broker
> > version?
> > 3.2.3
> > 3. How is your replication flow configured?
> > We have upstream brokers (3 node kafka cluster) and we have one kafka
> > consumers for each lower environments and it is producing message for lower
> > environment kafka cluster
> > 4. What is the frequency and duration of these interruptions?
> > Its very random. It replicates for couple of hours fine and than stops for
> > a day.
> > 5. When did the interruptions start?
> > Not sure about that. It could be after we moved to 3.2.3
> > 6. Has anything changed in your environment recently, such as new
> > partitions or an upgrade?
> > No
> > 7. Are you seeing any ERROR logs or other unique logs from the replication
> > flow?
> > Only the warning to increase max.poll.interval or decrease poll.records
> > 8. Have you tried enabling more detailed logs and watched the progress of
> > the replication flow around the time it stops replicating?
> > Could you tell me how can I enable more logging ?
> >
> > Thanks,
> > Arpit
> >
> > On Wed, Feb 8, 2023, 18:16 Greg Harris <greg.har...@aiven.io.invalid>
> > wrote:
> >
> > > Arpit,
> > >
> > > Unfortunately from that description nothing specific is coming to mind.
> > > The max.poll.interval indicates that the consumer is losing contact with
> > > the Kafka cluster, but that may be caused by the replication application
> > > hanging somewhere else.
> > >
> > > Some clarifying questions, and things you can look into:
> > > 1. Are you using MirrorMaker, or MirrorMaker 2.0?
> > > 2. What version of MM or MM2 are you using, and with what Kafka broker
> > > version?
> > > 3. How is your replication flow configured?
> > > 4. What is the frequency and duration of these interruptions?
> > > 5. When did the interruptions start?
> > > 6. Has anything changed in your environment recently, such as new
> > > partitions or an upgrade?
> > > 7. Are you seeing any ERROR logs or other unique logs from the replication
> > > flow?
> > > 8. Have you tried enabling more detailed logs and watched the progress of
> > > the replication flow around the time it stops replicating?
> > >
> > > Thanks,
> > > Greg Harris
> > >
> > >
> > > On Tue, Feb 7, 2023 at 6:27 AM Arpit Jain <jain.arp...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Hope this is the right forum to ask for Kafka mirror maker issues.
> > > > We are facing an issue where the mirror maker replicates the trades and
> > > > then doesn't work for long time and again replicates.
> > > > Also seeing the warning message to increase the poll interval or decrease
> > > > the maximum batch size (max.poll.records).
> > > >
> > > > I have tried reducing max.poll.records to 250 but still same issues.
> > > >
> > > > Could anyone suggest what could be wrong?
> > > >
> > > > Thanks