We had trouble with batch-expired produce errors on high-volume topic
partitions (not really that high, maybe 400 msgs/sec).  We solved these by
increasing `request.timeout.ms` and also increasing `batch.size` (which
reduced the total number of waiting batches in MirrorMaker).
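
A minimal sketch of what such a change might look like in a MirrorMaker
producer.properties file. The values are illustrative assumptions, not the
ones actually deployed; they only show the direction of the change relative
to the Kafka producer defaults (`request.timeout.ms=30000`,
`batch.size=16384`).

```properties
# Hypothetical values for illustration only; tune against your own workload.

# Give batches longer to be sent and acknowledged before they are expired
# (producer default: 30000 ms).
request.timeout.ms=120000

# Larger batches mean fewer batches waiting in the producer's buffer per
# partition (producer default: 16384 bytes).
batch.size=65536
```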

More context here: https://phabricator.wikimedia.org/T189464#4102048



On Mon, Apr 9, 2018 at 1:09 PM, Jeff Field <jvfi...@blizzard.com> wrote:

> We've been stable all weekend with the following settings:
>
> ExecStart=/usr/bin/kafka-mirror-maker --abort.on.send.failure true
> --new.consumer --num.streams 6 --offset.commit.interval.ms 60000
> --consumer.config /etc/kafka/mirrormaker/telem_mm/consumer.properties
> --producer.config /etc/kafka/mirrormaker/telem_mm/producer.properties
> --whitelist
>
> Consumer properties:
> bootstrap
> session.timeout.ms=55000
> heartbeat.interval.ms=15000
> request.timeout.ms=60000
>
> Producer properties:
> Bootstrap
>
> Any other combination of compression/buffer memory/linger/etc. on the 0.9
> producer producing to 0.11/1.0 wasn't reliable - it might work for an hour
> and then die, or it might never work. Once I landed on stable producer
> settings (which were just defaults), the consumer started having time outs
> due to heartbeating (because again, 0.9) so I had to increase the
> heartbeat, session and request timeouts to stabilize the consumer group.
>
> Fortunately, our target cluster for most of our mirrormakers is the last
> one we will upgrade to 1.x, at which point we can just upgrade the
> mirrormakers to 1.x as well.
>
> On 4/6/18, 1:09 PM, "Jeff Field" <jvfi...@blizzard.com> wrote:
>
>     I'm hitting the same problem, even with the new consumer, on
>     MirrorMaker 0.9 reading from a 0.9 Kafka cluster and producing to a
>     0.11 Kafka cluster.
>
>     On 3/30/18, 3:56 PM, "Andrew Otto" <o...@wikimedia.org> wrote:
>
>         I’m currently stuck on MirrorMaker version 0.9, and I’m not sure
>         when the new consumer client became the default.  Does your 0.10
>         version have a --new.consumer option listed in the help message?
>         If so, then the new consumer client is not the default.  I haven’t
>         seen the problem you are describing (I’m still having plenty of
>         others though) since I’ve switched to using the new consumer.
>
>         Another thought, what is the value of your
>         partition.assignment.strategy?  I’ve found round robin (default in
>         later versions of MirrorMaker) to be a lot more consistent than
>         whatever the default is in 0.9.  Not sure what the default in 0.10
>         is.
>
>
>
>         On Fri, Mar 30, 2018 at 11:40 AM, Siva A <siva9940261...@gmail.com> wrote:
>
>         > Any other update on this?
>         >
>         > On Mon, Mar 26, 2018, 7:42 PM Andrew Otto <o...@wikimedia.org> wrote:
>         >
>         > > I’ve had similar problems, but I don’t have an explanation for ya :/
>         > >
>         > > On Sun, Mar 25, 2018 at 12:19 PM, Siva A <siva9940261...@gmail.com> wrote:
>         > >
>         > > > Hi,
>         > > >
>         > > > We have 3 nodes Kafka cluster(0.10.0.1) and its mirroring the data from
>         > > > another 3 node cluster of same Kafka version.
>         > > > Both the clusters are Kerberized and we are running the Mirrormaker on the
>         > > > target cluster using the single principal/keytab with the one way trust on
>         > > > the KDC.
>         > > >
>         > > > At times, the mirrormaker stops functioning(Doesn't mirror the data) but
>         > > > the process is still running. If we restart the service then it works fine
>         > > > for a day or so.
>         > > >
>         > > > I don't see any error on the Kafka logs as well.
>         > > > Is there anyone seen this kind of issue?
>         > > >
>         > > > Thanks
>         > > > Siva
>         > > >
>         > >
>         >
>
>
>
>
>
