Recently we ran into a situation where the LedgerMetadataListener never
detected a metadata change. As a result, the reader had stale metadata
and tried to read from bookies that no longer had the ledger, so
NoSuchLedgerExistsException was returned to the caller.

1. I wonder if NoSuchLedgerExistsException is the right error here?

   - The client knows that the ledger exists in the metadata, and it has a
   valid handle. So the ledger *exists*.
   - In this case the metadata was stale, so restarting the client took care
   of the situation. But what if the ledger is in ZK but missing from all
   bookies? That can be a durability or an availability issue, depending on
   whether the bookies in the metadata are still part of the cluster.
   - I think we need more sophisticated error handling here.
   Comments?
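
To make the distinction concrete, here is a rough sketch of how a client
could classify the failure instead of always reporting
NoSuchLedgerExistsException. Everything in it is hypothetical:
MissingLedgerClassifier, Diagnosis and classify() are illustration names,
not BookKeeper APIs, and the rule itself is just one reading of the
durability/availability split above (bookies still in the cluster but
without the ledger means lost data; bookies gone from the cluster means the
data is unreachable).

```java
import java.util.Set;

// Hypothetical sketch only; none of these names exist in BookKeeper.
final class MissingLedgerClassifier {

    enum Diagnosis { STALE_METADATA, AVAILABILITY, DURABILITY }

    /**
     * @param metadataChanged true if a fresh metadata read differs from the local copy
     * @param ensemble        bookies listed in the (fresh) ledger metadata
     * @param clusterBookies  bookies currently registered in the cluster
     */
    static Diagnosis classify(boolean metadataChanged,
                              Set<String> ensemble,
                              Set<String> clusterBookies) {
        if (metadataChanged) {
            // The local copy was out of date; a retry with fresh metadata may succeed.
            return Diagnosis.STALE_METADATA;
        }
        if (clusterBookies.containsAll(ensemble)) {
            // Assumption: every bookie in the metadata is still a cluster member,
            // yet none of them has the ledger, so the data is presumed lost.
            return Diagnosis.DURABILITY;
        }
        // Assumption: at least one bookie in the metadata has left the cluster,
        // so the data may still exist but cannot be reached right now.
        return Diagnosis.AVAILABILITY;
    }
}
```

Each diagnosis could then map to its own error code, so the caller can tell
"retry later" apart from "data is gone".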

2. Having too many watches puts memory pressure on the client.

   - How about having an option to re-read the metadata on demand, without
   a watch?
      - Schedule a task to re-read the metadata on the first bookie failure
      with NoSuchEntry/NoSuchLedger.
      - If all three bookies fail, wait for the outstanding metadata read
      to return before failing the request to the user.
      - If the metadata has been read and differs from the local copy,
      reattempt the read.
      - If the metadata is unchanged, fail with some new error, e.g.
      DataLossException?
   - This can add latency if the metadata is changing a lot, but it may be
   better than constant watches. It could be a configuration option.
   - We could even enable both if the reader is super conservative.
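
The on-demand flow could look roughly like the sketch below. It is a
simplified model, not BookKeeper code: OnDemandMetadataRead, Metadata,
BookieResult, Outcome and read() are made-up names, the bookie read and the
metadata store are passed in as plain functions, and metadata copies are
compared by an assumed version number. What happens when the retry with
fresh metadata also fails is not covered above; the sketch simply reports
data loss in that case too.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch only; none of these names exist in BookKeeper.
final class OnDemandMetadataRead {

    enum BookieResult { OK, NO_SUCH_ENTRY, NO_SUCH_LEDGER }

    enum Outcome { READ_OK, RETRIED_OK, DATA_LOSS }

    /** Stand-in for ledger metadata: a version and the bookies holding the data. */
    static final class Metadata {
        final long version;
        final List<String> ensemble;

        Metadata(long version, List<String> ensemble) {
            this.version = version;
            this.ensemble = ensemble;
        }
    }

    static Outcome read(Metadata local,
                        Supplier<Metadata> metadataStore,
                        Function<String, BookieResult> readFromBookie) {
        CompletableFuture<Metadata> reread = null;
        for (String bookie : local.ensemble) {
            if (readFromBookie.apply(bookie) == BookieResult.OK) {
                return Outcome.READ_OK;
            }
            if (reread == null) {
                // First bookie failure: start re-reading the metadata in the background.
                reread = CompletableFuture.supplyAsync(metadataStore);
            }
        }
        if (reread == null) {
            // Empty ensemble: nothing was tried, so start the metadata read now.
            reread = CompletableFuture.supplyAsync(metadataStore);
        }
        // All bookies failed: wait for the outstanding metadata read before failing.
        Metadata fresh = reread.join();
        if (fresh.version == local.version) {
            // Metadata unchanged, so the bookies really do not have the data.
            return Outcome.DATA_LOSS;
        }
        // Metadata changed: reattempt the read against the new ensemble.
        for (String bookie : fresh.ensemble) {
            if (readFromBookie.apply(bookie) == BookieResult.OK) {
                return Outcome.RETRIED_OK;
            }
        }
        // Assumption: a failed retry is reported the same way as unchanged metadata.
        return Outcome.DATA_LOSS;
    }
}
```

In the happy path no metadata read is issued at all, so the extra latency
is only paid by reads that have already hit a bookie failure.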


Thoughts?
JV


-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi
