I think your answers are pretty spot-on, Joel. The under-replicated
partition count is the metric we monitor to make sure the cluster is
healthy. It tells us when a broker is down (the count is elevated on
every broker except the one that went down), or when a broker is
struggling (low counts fluctuating across a few hosts).
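For illustration, here is a minimal Java sketch of that check (this is
not our actual tooling; the broker hostnames and JMX port 9999 are
placeholders) that polls the UnderReplicatedPartitions gauge on each
broker over JMX:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Minimal sketch, not production monitoring: poll each broker's
// UnderReplicatedPartitions gauge. Hosts and JMX port are placeholders.
public class UnderReplicatedCheck {
    public static void main(String[] args) throws Exception {
        String[] brokers = {"broker1:9999", "broker2:9999", "broker3:9999"};
        ObjectName gauge = new ObjectName(
            "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions");
        for (String broker : brokers) {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + broker + "/jmxrmi");
            try (JMXConnector conn = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbsc = conn.getMBeanServerConnection();
                // Elevated on every broker but one -> that broker is down;
                // low, fluctuating counts -> a broker may be struggling.
                System.out.println(broker + " under-replicated partitions: "
                    + mbsc.getAttribute(gauge, "Value"));
            }
        }
    }
}

Graphing the per-broker values side by side makes the two failure
signatures described above easy to spot.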
As far as lots of small partitions vs. a few large partitions, we prefer
the former: it means we can spread the load out over the brokers more
evenly.

-Todd

On Tue, Nov 4, 2014 at 10:07 AM, Joel Koshy <jjkosh...@gmail.com> wrote:

> Ops experts can share more details, but here are some comments:
>
> > * Does Kafka 'like' lots of small partitions for replication, or
> > larger ones? i.e., if I'm passing 1 Gbps into a topic, will
> > replication be happier if that's one partition, or many partitions?
>
> Since you also have to account for the NIC utilization by replica
> fetches, it is better to split a heavy topic into many partitions.
>
> > * How can we 'up' the priority of replication over other actions?
>
> If you do the above, this should not be necessary, but you could
> increase the number of replica fetchers (num.replica.fetchers).
>
> > * What is the most effective way to monitor the replication lag? On
> > brokers with hundreds of partitions, the JMX data starts getting very
> > muddled and plentiful. I'm trying to find something we can
> > graph/dashboard to say 'replication is in X state'. When we look at
> > it in aggregate, we assume that 'big numbers are further behind', but
> > then sometimes find negative numbers as well?
>
> The easiest mbean to look at is the under-replicated partition count.
> This is at the broker level, so it is coarse-grained. If it is > 0, you
> can use various tools to do mbean queries to figure out which partition
> is lagging behind. Another thing you can look at is the ISR
> shrink/expand rate. If you see a lot of churn, you may need to tune the
> settings that affect ISR maintenance (replica.lag.time.max.ms,
> replica.lag.max.messages).
>
> --
> Joel
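To make Joel's "mbean queries" suggestion above concrete, here is a
rough sketch, assuming 0.8.2-style metric names (the naming scheme
differs across versions) and a placeholder broker1:9999. It lists
per-partition replica-fetch lag on one broker and prints the ISR churn
meters Joel mentions:

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Rough sketch of the kind of mbean queries Joel describes, not an
// official utility. MBean names assume the 0.8.2 naming scheme and may
// differ on other versions; broker1:9999 is a placeholder.
public class ReplicationLagProbe {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
        try (JMXConnector conn = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = conn.getMBeanServerConnection();

            // Per-partition lag of this broker's replica fetchers.
            Set<ObjectName> lagGauges = mbsc.queryNames(new ObjectName(
                "kafka.server:type=FetcherLagMetrics,name=ConsumerLag,*"),
                null);
            for (ObjectName gauge : lagGauges) {
                System.out.println(gauge.getKeyProperty("topic") + "-"
                    + gauge.getKeyProperty("partition") + " lag: "
                    + mbsc.getAttribute(gauge, "Value"));
            }

            // ISR churn: high shrink/expand rates suggest the ISR
            // maintenance settings need tuning.
            for (String name : new String[]
                    {"IsrShrinksPerSec", "IsrExpandsPerSec"}) {
                ObjectName meter = new ObjectName(
                    "kafka.server:type=ReplicaManager,name=" + name);
                System.out.println(name + " (1m rate): "
                    + mbsc.getAttribute(meter, "OneMinuteRate"));
            }
        }
    }
}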
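And for reference, the broker-side settings Joel names, as they would
appear in server.properties. The lag values below are just the 0.8.x
defaults and the fetcher count is bumped purely for illustration; none
of this is tuning advice:

# More replica fetcher threads = more parallel replication (default: 1).
num.replica.fetchers=4

# ISR maintenance: a follower that falls behind by more than this much
# time or this many messages is dropped from the ISR.
replica.lag.time.max.ms=10000
replica.lag.max.messages=4000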