Thanks for your explanation. I will go through this process to see how to avoid replicating a ledger even after the copy process has started.
Sent from my iPhone

------------------ Original ------------------
From: Matteo Merli <mme...@apache.org>
Date: Sat, Aug 28, 2021 10:57 PM
To: Bookkeeper-Dev <dev@bookkeeper.apache.org>
Subject: Re: Skip replicating ledger after bookkeeper server starts up again

This is already happening, to a certain extent.

When the auto-recovery workers are starting to replicate a ledger that
got marked as under-replicated, they will first check every fragment of
the ledger and will perform a test to decide whether it really needs to
get replicated.

There's a configurable threshold of a percentage of entries to try to
read from each bookie in the ensemble. If all the replicas can be read,
then the ledger is skipped.

By default, we only read the first and last entry of each fragment. If
these are good on all the bookies, then we skip.

The gotcha here is that this check is done before starting to replicate
a ledger. If the copy process gets started, it will then continue until
done.

A potential improvement here would be to allow even the copy of a single
ledger to be interrupted.

Matteo

--
Matteo Merli
<mme...@apache.org>

On Sat, Aug 28, 2021 at 6:47 AM zhangao <gaozhangmin...@qq.com.invalid> wrote:
>
> Maybe the idea of removing a ledger from the under-replicated list when a bookie starts up has some problems. What is important is that replicating ledgers puts a heavy load on the cluster. I want to reduce the impact.
>
> Suppose there are 5 copies and 4 bookies are lost. When one lost bookie comes online again, I don't simply mark the ledger as replicated. When the replication worker gets the failed bookies, it will ignore the one that is online again. So this will avoid replicating one copy of the data.
>
> ------------------ Original Message ------------------
> From: "dev" <eolive...@gmail.com>
> Sent: Saturday, August 28, 2021, 6:07 PM
> To: "Bookkeeper-Dev" <dev@bookkeeper.apache.org>
> Subject: Re: Skip replicating ledger after bookkeeper server starts up again
>
> Candy Rain,
> Thanks for sharing your proposal
>
> On Fri, Aug 27, 2021, 04:30 Candy Rain <gaozhangmin...@qq.com.invalid> wrote:
>
> > Describe
> >
> > When bookies in the cluster are down, the auto-recovery gets
> > triggered, but as these come back online, the rereplication worker should
> > ideally skip the rereplication of the ledgers that are marked as
> > underreplicated.
>
> The replicator should be able to listen on ZooKeeper for changes in the
> availability of a bookie.
> Probably there is some room for improvement here.
>
> I don't think it is a good idea to eagerly set the ledger as no longer
> underreplicated as soon as one of the bookies in the ensemble comes back
> online.
> The fact that a ledger is underreplicated is to be verified against the
> requested number of replicas, as it is not enough that one bookie is running
>
> Regards
> Enrico
>
> > But the ledgers are rereplicated instead
> >
> > Expected behavior
> >
> > Ideally as and when the bookies come up the ledgers marked as
> > underreplicated would be read by the rereplication worker and from the
> > metadata the worker should skip these as the bookies are available.
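
For readers following the thread, the pre-replication check described in Matteo Merli's reply can be sketched roughly as below. This is only an illustration under stated assumptions, not BookKeeper's code or API: `Fragment`, `entries_to_probe`, `needs_replication` and the `can_read` callback are hypothetical names, the evenly spaced sampling is a guess at how a percentage threshold might be applied, and the sketch assumes every bookie of a fragment stores every entry (no striping).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical stand-in for a ledger fragment: an entry range and its bookies."""
    first_entry: int
    last_entry: int
    bookies: Tuple[str, ...]


def entries_to_probe(fragment: Fragment, percent: float = 0.0) -> List[int]:
    """Pick the entry ids to test-read for one fragment.

    The first and last entries are always probed (the default described in the
    thread). A non-zero percent adds evenly spaced entries; that sampling
    scheme is an assumption made for this sketch.
    """
    total = fragment.last_entry - fragment.first_entry + 1
    probes = {fragment.first_entry, fragment.last_entry}
    extra = int(total * percent / 100)
    if extra > 0:
        step = max(1, total // extra)
        probes.update(range(fragment.first_entry, fragment.last_entry + 1, step))
    return sorted(probes)


def needs_replication(
    fragments: Sequence[Fragment],
    can_read: Callable[[str, int], bool],
    percent: float = 0.0,
) -> bool:
    """Return True as soon as any probed entry cannot be read from any bookie.

    can_read(bookie, entry_id) is a hypothetical callback standing in for a
    real read attempt. If every probe succeeds, the ledger can be skipped.
    """
    for fragment in fragments:
        for bookie in fragment.bookies:
            for entry_id in entries_to_probe(fragment, percent):
                if not can_read(bookie, entry_id):
                    return True
    return False


if __name__ == "__main__":
    ledger = [
        Fragment(0, 99, ("b1", "b2", "b3")),
        Fragment(100, 149, ("b1", "b2", "b4")),
    ]
    # All bookies are back online and readable: the ledger would be skipped.
    print(needs_replication(ledger, lambda bookie, entry: True))
    # Bookie b4 is still unreachable: the ledger would be replicated.
    print(needs_replication(ledger, lambda bookie, entry: bookie != "b4"))
```

As in the thread, the check runs once, before copying starts. With the default of probing only the first and last entries, a damaged entry in the middle of a fragment goes unnoticed, which is the trade-off a higher percentage is meant to reduce.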