Recently we ran into a situation where the LedgerMetadataListener never detected a metadata change. As a result, the reader held stale metadata and tried to read from bookies that no longer had the ledger, so NoSuchLedgerExistsException was returned to the caller.

1. I wonder whether NoSuchLedgerExistsException is the right error here.
   - The client knows that the ledger exists in the metadata, and it holds a valid handle. So the ledger *exists*.
   - In this case the metadata was stale, so restarting the client took care of the situation. But what if the ledger is in ZK but missing from all bookies? That is a durability or an availability issue, depending on whether the bookies listed in the metadata are still part of the cluster.
   - I think we need more sophisticated error handling here. Comments?

2. Having too many watches puts memory pressure on the client.
   - How about an option to re-read the metadata on demand, without a watch?
   - Schedule a task to re-read the metadata on the first bookie failure with NoSuchEntry/NoSuchLedger.
   - If all three bookies fail, wait for the outstanding metadata read to return before failing the read to the user.
   - If the metadata that comes back differs from the local copy, reattempt the read.
   - If the metadata is unchanged, fail with some new error, DataLossException or something similar.
   - This can add latency if the metadata changes a lot, but it may be better than constant watches. It could be a configuration option.
   - We could even enable both if the reader is super conservative.

Thoughts?

JV

--
Jvrao
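
P.S. To make the flow in item 2 concrete, here is a rough sketch in plain Java. It is not BookKeeper code: the class, the `read` method, `MissingOnBookieException`, and the two callbacks are hypothetical stand-ins, the metadata is reduced to a list of bookie names, and `DataLossException` is just the placeholder name from above. What happens when the reattempt also fails is not covered in the proposal, so the sketch assumes a single reattempt followed by the same error.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;

final class OnDemandMetadataRead {
    /** Hypothetical: stands in for a bookie answering NoSuchEntry/NoSuchLedger. */
    static class MissingOnBookieException extends RuntimeException {}

    /** Hypothetical name for the "some new error" in the proposal. */
    static class DataLossException extends RuntimeException {
        DataLossException(String msg) { super(msg); }
    }

    /**
     * @param localEnsemble  bookies from the (possibly stale) local metadata
     * @param fetchMetadata  one-shot, watch-free read of the ensemble from the metadata store
     * @param readFromBookie reads the entry from one bookie, or throws MissingOnBookieException
     */
    static String read(List<String> localEnsemble,
                       Supplier<CompletableFuture<List<String>>> fetchMetadata,
                       Function<String, String> readFromBookie) {
        CompletableFuture<List<String>> refresh = null;
        for (String bookie : localEnsemble) {
            try {
                return readFromBookie.apply(bookie);
            } catch (MissingOnBookieException e) {
                // First bookie failure: kick off the metadata re-read, exactly once.
                if (refresh == null) {
                    refresh = fetchMetadata.get();
                }
            }
        }
        if (refresh == null) {
            // Only reachable with an empty local ensemble.
            refresh = fetchMetadata.get();
        }

        // Every bookie failed: wait for the outstanding metadata read before failing.
        List<String> fresh = refresh.join();
        if (fresh.equals(localEnsemble)) {
            throw new DataLossException("metadata unchanged, entry missing on all bookies");
        }

        // Metadata changed under us: reattempt against the fresh ensemble.
        for (String bookie : fresh) {
            try {
                return readFromBookie.apply(bookie);
            } catch (MissingOnBookieException e) {
                // Try the next bookie.
            }
        }
        // Assumption: one reattempt only, then the same error.
        throw new DataLossException("entry missing on all bookies after metadata refresh");
    }
}
```

A caller would pass its current ensemble, a supplier that does a single read from the metadata store, and a function that reads from one bookie. With healthy bookies the supplier is never invoked, so the cost of the metadata read is only paid on the failure path.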