Recently I had a network partition between one of the nodes in a 3-node
cluster and ZooKeeper.  The affected broker never reconnected to ZooKeeper
(its ID was not registered in ZK), and its metrics indicate that it became
a second active controller.  It still considered itself leader of the
partitions originally assigned to it and did not report any under-replicated
partitions.

Producers (new producer) did stop producing to it and switched over to the
new leaders, but I'm not sure whether that was due to the min ISR setting
(min.insync.replicas) or to a metadata update.

The consumers connected to the bad broker didn't react at all and continued
consuming from it.  Lag alerts went off when the consumers started falling
behind on the partitions the bad broker was originally leader of (because
no new data was being received by that broker).
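
To make the lag symptom concrete, here is a rough sketch of the arithmetic
involved.  It is not Kafka client code: the function name and all offsets
below are made up for illustration.  It assumes lag is measured as the log
end offset minus the consumer's position, and shows why a consumer reading
from the stale broker can look caught up locally while falling behind the
real leader.

```python
def partition_lag(log_end_offsets, consumer_positions):
    """Return lag per partition: log end offset minus consumer position.

    Both arguments are dicts keyed by partition number (hypothetical shape).
    """
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        position = consumer_positions.get(partition, 0)
        # Clamp at zero so a position ahead of a stale log is not negative lag.
        lag[partition] = max(end_offset - position, 0)
    return lag


# Made-up offsets: the stale broker stopped receiving data for partition 0,
# while the new leader kept accepting writes.
stale_broker_end = {0: 1000, 1: 500}
new_leader_end = {0: 1450, 1: 500}
consumer_pos = {0: 1000, 1: 500}

lag_vs_stale = partition_lag(stale_broker_end, consumer_pos)
lag_vs_leader = partition_lag(new_leader_end, consumer_pos)
print("lag against stale broker:", lag_vs_stale)
print("lag against new leader:  ", lag_vs_leader)
```

In this sketch the consumer sees no lag against the stale broker's log, and
only a comparison against the new leader's end offset shows it falling
behind.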

I'm curious to know whether anyone has seen behavior like this in 0.8.2.1
before and, if so, whether 0.10 helps with it.  Ideally I would want the
consumers (new consumer) to react to the fact that this broker had split
from the cluster and was no longer receiving data, so they could move over
to the new partition leaders.

I would love to supply logs, but due to an infrastructure issue many of
them have been lost.

-John
