Hi all, I noticed that in the Pulsar metadata handler we retry an operation when ZooKeeper throws a connection loss exception. However, the operation may still fail after the retry.
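A minimal sketch of how a retried conditional write can fail. It does not use the real ZooKeeper client or Pulsar's metadata store: `VersionedStore`, `updateWithRetry`, and the exception classes are hypothetical stand-ins made up for illustration, and the only assumption is that a write succeeds when the expected version matches and bumps the version by one.

```java
class RetryBadVersionSketch {

    // Hypothetical exceptions, standing in for the real client errors.
    static class ConnectionLossException extends RuntimeException {}
    static class BadVersionException extends RuntimeException {}

    /** Hypothetical in-memory stand-in for a single versioned znode. */
    static class VersionedStore {
        String value = "ledgers=[1]";
        int version = 0;
        boolean dropNextAck = false;

        /** Conditional write: applied only if expectedVersion matches. */
        void put(String newValue, int expectedVersion) {
            if (expectedVersion != version) {
                throw new BadVersionException();
            }
            value = newValue;
            version++;
            if (dropNextAck) {
                dropNextAck = false;
                // The write was applied, but the client never sees the ack.
                throw new ConnectionLossException();
            }
        }
    }

    /** Naive retry: repeat the same conditional write after a connection loss. */
    static String updateWithRetry(VersionedStore store, String newValue, int expectedVersion) {
        try {
            store.put(newValue, expectedVersion);
            return "OK";
        } catch (ConnectionLossException e) {
            try {
                // Same expected version as before, but the store has moved on.
                store.put(newValue, expectedVersion);
                return "OK";
            } catch (BadVersionException bad) {
                return "BAD_VERSION";
            }
        }
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.dropNextAck = true;

        String inMemory = store.value; // the caller's cached copy
        String result = updateWithRetry(store, "ledgers=[1,2]", 0);
        if (result.equals("OK")) {
            inMemory = "ledgers=[1,2]"; // memory is updated only on success
        }
        System.out.println(result + " memory=" + inMemory + " store=" + store.value);
    }
}
```

Running `main` prints `BAD_VERSION memory=ledgers=[1] store=ledgers=[1,2]`: the caller sees a failure even though the store already holds the new value.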
For example, we update the in-memory ledgers map only after the LedgerInfo has been updated successfully in ZooKeeper. Suppose the update executes successfully on the ZooKeeper server, but the client gets a connection loss exception before it sees the response. We then retry because of the connection loss, but the first write has already advanced the version on the server, so the callback receives a BadVersion exception. At that point the in-memory ledgers map no longer matches what is stored in ZooKeeper, which may cause other issues on the broker.

We need to do some work in the metadata store and the managed ledger to keep them consistent. However, that would mean changing most of the metadata store callbacks to handle this case. I would like to hear your ideas. WDYT?

Regards,
Yong