Hey Jun, I don't think we should allow failed replicas to be re-created on the good disks. Say there are 2 disks and each of them is 51% loaded. If either disk fails and we allow its replicas to be re-created on the other disk, that disk will fill up as well, so both disks end up failing. Alternatively, we can disable replica creation if there is a bad disk on a broker. I personally think it is worth the additional complexity in the broker to store created replicas in ZK, so that new replicas can still be created on the broker even when there is a bad log directory. This approach won't add complexity in the controller. But I am fine with disabling replica creation when there is a bad log directory, if that is the only blocking issue for this KIP.
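To make the two-disk example concrete, here is a rough sketch of the arithmetic. It is not Kafka code: the function name `can_absorb` is made up for illustration, disk capacity is normalized to 1.0, and it assumes the failed disk's data is spread evenly over the surviving disks.

```python
def can_absorb(disk_usage, failed, capacity=1.0):
    """Return True if the surviving disks can hold the failed disk's data.

    disk_usage: fraction of capacity used on each disk (hypothetical input).
    failed: index of the disk that went bad.
    Assumption: the failed disk's data is split evenly across survivors.
    """
    survivors = [u for i, u in enumerate(disk_usage) if i != failed]
    if not survivors:
        return False  # nothing left to re-create replicas on
    extra = disk_usage[failed] / len(survivors)
    return all(u + extra <= capacity for u in survivors)


# Two disks at 51% each: the survivor would need 102% of its capacity.
print(can_absorb([0.51, 0.51], failed=0))
# Two disks at 40% each: the survivor ends up at 80%, which fits.
print(can_absorb([0.40, 0.40], failed=0))
```

With two disks at 51% the check fails, which is the case I am worried about. It only passes when there is enough headroom on the remaining disks.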
Whether we store created flags is independent of whether/how we store offline replicas. Per our previous discussion, do you think it is OK not to store offline replicas in ZK, and instead propagate the offline replicas from broker to controller via LeaderAndIsrRequest? Thanks, Dong