Hi Yong, 

Thanks for sharing your findings. Would it make sense to also report this as a
GitHub issue, with some detailed log messages, so that others who run into the
same problem can find the eventual fix and track its status?

> a BadVersion exception. At this point, the in-memory ledgers list is
> different from the one on the ZooKeeper server, and that may cause other
> issues on the broker.

This sounds like a severe issue that could lead to data loss. Is that correct? 
What are the implications of this?
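
The sequence described in the quoted text can be sketched roughly as below. This
is a toy model under stated assumptions, not Pulsar or ZooKeeper code:
FakeVersionedStore, updateWithRetry and writeWasApplied are hypothetical names
invented for illustration, and the read-back check at the end is only one
possible direction, not something proposed in this thread.

```java
public class BadVersionAfterRetrySketch {

    /** Thrown when the expected version does not match the stored version. */
    static class BadVersionException extends Exception {}

    /** Thrown when the client loses the connection before it sees the response. */
    static class ConnectionLossException extends Exception {}

    /** Hypothetical stand-in for a versioned znode; not a real ZooKeeper API. */
    static class FakeVersionedStore {
        private String value = "ledgers=[1]";
        private int version = 0;
        private boolean dropNextResponse = false;

        void dropNextResponse() { dropNextResponse = true; }
        String value() { return value; }
        int version() { return version; }

        /** Compare-and-set: applies the write only if expectedVersion matches. */
        int put(String newValue, int expectedVersion)
                throws BadVersionException, ConnectionLossException {
            if (expectedVersion != version) {
                throw new BadVersionException();
            }
            value = newValue;
            version++;
            if (dropNextResponse) {
                dropNextResponse = false;
                // The write was applied, but the client never sees the result.
                throw new ConnectionLossException();
            }
            return version;
        }
    }

    /** Retries once on connection loss, reusing the same expected version. */
    static String updateWithRetry(FakeVersionedStore store, String newValue, int expectedVersion) {
        for (int attempt = 1; attempt <= 2; attempt++) {
            try {
                store.put(newValue, expectedVersion);
                return "OK";
            } catch (ConnectionLossException e) {
                // Outcome unknown on the client side: loop around and retry.
            } catch (BadVersionException e) {
                return "BadVersion";
            }
        }
        return "ConnectionLoss";
    }

    /** Assumed reconciliation step: read back and compare after BadVersion. */
    static boolean writeWasApplied(FakeVersionedStore store, String newValue, int expectedVersion) {
        return store.version() == expectedVersion + 1 && newValue.equals(store.value());
    }

    public static void main(String[] args) {
        FakeVersionedStore store = new FakeVersionedStore();
        String inMemory = store.value();
        String updated = "ledgers=[1, 2]";

        store.dropNextResponse(); // first attempt: applied on server, lost on client
        String result = updateWithRetry(store, updated, 0);
        if ("OK".equals(result)) {
            inMemory = updated; // memory is only updated on a successful callback
        }

        System.out.println("callback result:   " + result);
        System.out.println("in memory:         " + inMemory);
        System.out.println("in store:          " + store.value());
        System.out.println("write was applied: " + writeWasApplied(store, updated, 0));
    }
}
```

In this model the callback sees BadVersion even though the first attempt was
applied, so the in-memory value stays stale while the store has moved on. The
read-back check tells the two cases apart here only because nothing else writes
to the store; with concurrent writers it would need more care.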

> We need to do some work on the metastore and managed ledger to keep them
> consistent with each other. But that would require changing most of the
> metastore callbacks to handle it.

This sounds reasonable. Would you be able to share more details about this 
solution?

-Lari

On 2022/09/06 09:33:26 Yong Zhang wrote:
> Hi all,
> 
> I saw that in the Pulsar metadata handler we retry the operation when
> ZooKeeper throws a connection loss exception. But the operation may still
> fail after the retry.
> 
> For example, we update the ledgers map in memory after successfully
> updating the LedgerInfo in ZooKeeper. If the ZooKeeper update operation
> executes successfully on the server but throws a connection loss exception
> on the client, and we retry on that exception, then the callback may
> receive a BadVersion exception. At this point, the in-memory ledgers list is
> different from the one on the ZooKeeper server, and that may cause other
> issues on the broker.
> 
> We need to do some work on the metastore and managed ledger to keep them
> consistent with each other. But that would require changing most of the
> metastore callbacks to handle it.
> 
> I'd like to hear your ideas. WDYT?
> 
> Regards,
> Yong
> 
