Hey Jun,

I don't think we should allow failed replicas to be re-created on the good
disks. Say there are 2 disks and each of them is 51% loaded. If either disk
fails and we allow its replicas to be re-created on the other disk, that
disk would need to hold 102% of its capacity, so both disks will fail.
Alternatively, we can disable replica creation on a broker that has a bad
disk. I personally think it is worth the additional complexity in the
broker to store created replicas in ZK so that new replicas can still be
created on the broker even when there is a bad log directory. This approach
won't add complexity in the controller. But I am fine with disabling
replica creation when there is a bad log directory if that is the only
blocking issue for this KIP.
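
To make the arithmetic in the 2-disk example concrete, here is a minimal
sketch. It assumes equal-capacity disks and that the failed disk's replicas
are spread evenly over the surviving disks; the function name
load_after_failover is made up for illustration and is not part of Kafka.

```python
def load_after_failover(loads_pct, failed):
    """Return the load (percent of capacity) of each surviving disk after
    the failed disk's replicas are re-created evenly on the survivors.

    Assumption: all disks have the same capacity.
    """
    survivors = [i for i in range(len(loads_pct)) if i != failed]
    extra = loads_pct[failed] / len(survivors)
    return {i: loads_pct[i] + extra for i in survivors}


# Two disks, each 51% loaded; disk 0 goes bad.
loads = [51.0, 51.0]
after = load_after_failover(loads, failed=0)

# Any survivor above 100% cannot hold the re-created replicas.
overloaded = [i for i, pct in after.items() if pct > 100.0]
print(after, overloaded)
```

With these inputs the surviving disk ends up at 102% of capacity, which is
why re-creating the replicas there would take the second disk down too.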

Whether we store the created flags is independent of whether/how we store
offline replicas. Per our previous discussion, do you think it is OK not to
store offline replicas in ZK and instead propagate the offline replicas
from broker to controller via LeaderAndIsrRequest?

Thanks,
Dong
