We had trouble with batch-expired produce errors on high-volume topic partitions (not really that high, maybe 400 msgs/sec). We solved these by increasing `request.timeout.ms` and also increasing `batch.size` (which reduced the total number of batches waiting in MirrorMaker).
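
For anyone who wants to try the same thing, here is a rough sketch of what the overrides could look like in the MirrorMaker producer.properties. The values are placeholders I picked for illustration, not the ones we actually run, so treat them as a starting point for your own tuning:

```properties
# Illustrative values only (assumptions, not our production settings).

# Producer default is 30000 (30s). A larger value gives queued batches
# more time before they are expired.
request.timeout.ms=60000

# Producer default is 16384 bytes. Larger batches mean fewer batches
# waiting per partition at the same message rate.
batch.size=65536
```
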
More context here: https://phabricator.wikimedia.org/T189464#4102048

On Mon, Apr 9, 2018 at 1:09 PM, Jeff Field <jvfi...@blizzard.com> wrote:

> We've been stable all weekend with the following settings:
>
> ExecStart=/usr/bin/kafka-mirror-maker --abort.on.send.failure true
> --new.consumer --num.streams 6 --offset.commit.interval.ms 60000
> --consumer.config /etc/kafka/mirrormaker/telem_mm/consumer.properties
> --producer.config /etc/kafka/mirrormaker/telem_mm/producer.properties
> --whitelist
>
> Consumer properties:
> bootstrap
> session.timeout.ms=55000
> heartbeat.interval.ms=15000
> request.timeout.ms=60000
>
> Producer properties:
> Bootstrap
>
> Any other combination of compression/buffer memory/linger/etc. on the 0.9
> producer producing to 0.11/1.0 wasn't reliable - it might work for an hour
> and then die, or it might never work. Once I landed on stable producer
> settings (which were just defaults), the consumer started having time outs
> due to heartbeating (because again, 0.9) so I had to increase the
> heartbeat, session and request timeouts to stabilize the consumer group.
>
> Fortunately, our target cluster for most of our mirrormakers is the last
> one we will upgrade to 1.x, at which point we can just upgrade the
> mirrormakers to 1.x as well.
>
> On 4/6/18, 1:09 PM, "Jeff Field" <jvfi...@blizzard.com> wrote:
>
> > I'm hitting the same problem, even with the new consumer, on
> > MirrorMaker 0.9 reading from a 0.9 Kafka cluster and producing to a
> > 0.11 Kafka cluster.
> >
> > On 3/30/18, 3:56 PM, "Andrew Otto" <o...@wikimedia.org> wrote:
> >
> > > I’m currently stuck on MirrorMaker version 0.9, and I’m not sure
> > > when the new consumer client became the default. Does your 0.10
> > > version have a --new.consumer option listed in the help message?
> > > If so, then the new consumer client is not the default. I haven’t
> > > seen the problem you are describing (I’m still having plenty of
> > > others though) since I’ve switched to using the new consumer.
> > >
> > > Another thought, what is the value of your
> > > partition.assignment.strategy? I’ve found round robin (default in
> > > later versions of MirrorMaker) to be a lot more consistent than
> > > whatever the default is in 0.9. Not sure what the default in 0.10
> > > is.
> > >
> > > On Fri, Mar 30, 2018 at 11:40 AM, Siva A <siva9940261...@gmail.com>
> > > wrote:
> > >
> > > > Any other update on this?
> > > >
> > > > On Mon, Mar 26, 2018, 7:42 PM Andrew Otto <o...@wikimedia.org>
> > > > wrote:
> > > >
> > > > > I’ve had similar problems, but I don’t have an explanation for
> > > > > ya :/
> > > > >
> > > > > On Sun, Mar 25, 2018 at 12:19 PM, Siva A
> > > > > <siva9940261...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We have 3 nodes Kafka cluster(0.10.0.1) and its mirroring the
> > > > > > data from another 3 node cluster of same Kafka version.
> > > > > > Both the clusters are Kerberized and we are running the
> > > > > > Mirrormaker on the target cluster using the single
> > > > > > principal/keytab with the one way trust on the KDC.
> > > > > >
> > > > > > At times, the mirrormaker stops functioning(Doesn't mirror
> > > > > > the data) but the process is still running. If we restart the
> > > > > > service then it works fine for a day or so.
> > > > > >
> > > > > > I don't see any error on the Kafka logs as well.
> > > > > > Is there anyone seen this kind of issue?
> > > > > >
> > > > > > Thanks
> > > > > > Siva
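
A footnote on the consumer timeouts Jeff quotes (session.timeout.ms=55000, heartbeat.interval.ms=15000, request.timeout.ms=60000): as far as I understand it, they follow the usual ordering, with the heartbeat interval at no more than a third of the session timeout and the request timeout above the session timeout. Below is a small Python sketch that sanity-checks a set of values against that ordering. The function name `check_consumer_timeouts` is made up for illustration and is not part of any Kafka tooling, and the rules encoded are my reading of the consumer config docs for the older clients:

```python
def check_consumer_timeouts(props):
    """Return a list of problems with the timeout ordering (empty if none).

    Hypothetical helper, not a Kafka API. Assumed rules:
      * heartbeat.interval.ms is lower than session.timeout.ms,
        and typically no higher than 1/3 of it
      * request.timeout.ms is larger than session.timeout.ms
    """
    heartbeat = int(props["heartbeat.interval.ms"])
    session = int(props["session.timeout.ms"])
    request = int(props["request.timeout.ms"])

    problems = []
    if heartbeat >= session:
        problems.append("heartbeat.interval.ms must be lower than session.timeout.ms")
    elif heartbeat * 3 > session:
        problems.append("heartbeat.interval.ms is more than 1/3 of session.timeout.ms")
    if request <= session:
        problems.append("request.timeout.ms should be larger than session.timeout.ms")
    return problems


# The consumer values quoted in the thread.
quoted = {
    "session.timeout.ms": "55000",
    "heartbeat.interval.ms": "15000",
    "request.timeout.ms": "60000",
}

if __name__ == "__main__":
    for problem in check_consumer_timeouts(quoted):
        print(problem)
```

Run against the quoted values, the check reports nothing. Lowering request.timeout.ms to at or below the session timeout is the kind of change it is meant to flag.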