Thanks a lot for your explanation.

------------------ Original ------------------
From: Jack Vanlightly <jvanligh...@splunk.com.INVALID>
Date: Tue, Sep 14, 2021 9:11 PM
To: dev <dev@bookkeeper.apache.org>
Subject: Re: AutoRecovery failed replicate ledger , because, it would read lac 
from failed bookie



1. Any ledgers that are open, and whose current ensemble includes that
downed bookie, will be unrecoverable until it comes back online. If other
ledgers hosted on that bookie are recoverable, then either those
ledgers are closed, or they are open (or in-recovery) but their last ensemble
does not include this bookie. The LAC read only gets sent to the bookies of
the current ensemble (a ledger may have multiple ensembles if
ensemble changes occurred).
2. Look for "coverageSet.checkCovered()".
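
To make the ack quorum rule concrete, here is a minimal sketch of the kind of check a coverage set performs. It is a simplified model based only on the description in this thread, not the actual BookKeeper code: the class name LacCoverageSketch, the Response enum, and the isCovered method are all hypothetical. The assumption is that for every write quorum in the ensemble, the read counts as failed if ack quorum or more bookies gave an answer other than OK, NoSuchEntry, or NoSuchLedger.

```java
import java.util.Arrays;
import java.util.List;

public class LacCoverageSketch {
    /** Hypothetical response codes; these are not BookKeeper's real BKException codes. */
    enum Response { OK, NO_SUCH_ENTRY, NO_SUCH_LEDGER, UNAVAILABLE, NO_RESPONSE }

    /** A response is "known" if it tells us definitively what the bookie holds. */
    static boolean isKnown(Response r) {
        return r == Response.OK || r == Response.NO_SUCH_ENTRY || r == Response.NO_SUCH_LEDGER;
    }

    /**
     * Returns false if any write quorum contains ackQuorum or more bookies
     * whose state is unknown, since those bookies alone could have
     * acknowledged an entry that the reader never sees.
     */
    static boolean isCovered(List<Response> ensemble, int writeQuorum, int ackQuorum) {
        int n = ensemble.size();
        for (int start = 0; start < n; start++) {
            int unknown = 0;
            // Assumed round-robin layout: a write quorum is writeQuorum consecutive bookies.
            for (int j = 0; j < writeQuorum; j++) {
                if (!isKnown(ensemble.get((start + j) % n))) {
                    unknown++;
                }
            }
            if (unknown >= ackQuorum) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Ensemble of 3 with one bookie down.
        List<Response> oneDown = Arrays.asList(Response.OK, Response.OK, Response.UNAVAILABLE);
        System.out.println(isCovered(oneDown, 3, 1)); // false: ack quorum 1, one bookie unknown
        System.out.println(isCovered(oneDown, 3, 2)); // true: ack quorum 2 tolerates one unknown
    }
}
```

With an ack quorum of 1, a single unreachable bookie in the current ensemble is already enough to fail the check in this model, which matches the behaviour reported in this thread.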

Jack

On Tue, Sep 14, 2021 at 3:00 PM zhangao <gaozhangmin...@qq.com.invalid>
wrote:

> My ack quorum is 1. Please let me explain my confusion:
> 1. When one bookie is down, as you said, why can some ledgers be replicated
> successfully while others cannot?
> 2. In the code below from PendingReadLacOp, I don't see any code related
> to ack quorum when reading the LAC.
>
>
> public void initiate() {
>     for (int i = 0; i < currentEnsemble.size(); i++) {
>         bookieClient.readLac(currentEnsemble.get(i), lh.ledgerId, this, i);
>     }
> }
>
>
> ------------------ Original Message ------------------
> From: "dev" <jvanligh...@splunk.com.INVALID>;
> Sent: Tuesday, September 14, 2021, 8:49 PM
> To: "dev" <dev@bookkeeper.apache.org>;
>
> Subject: Re: AutoRecovery failed replicate ledger , because, it would read
> lac from failed bookie
>
>
>
> An LAC read will fail in this way if Ack Quorum or more bookies respond
> with anything other than OK, NoSuchEntry, or NoSuchLedger.
>
> What is your ack quorum? If it is just 1 (not a good setting), then a
> single bookie being down will make the LAC read fail this way. If your ack
> quorum is 2, then 2 bookies being down will cause it, etc.
>
> Jack
>
> On Tue, Sep 14, 2021 at 1:17 PM zhangao <gaozhangmin...@qq.com.invalid>
> wrote:
>
> > As the title says, when a bookie is lost, a ledger whose state is open
> > cannot be replicated, because the LAC is read from the failed bookie.
> > The LAC read from the failed bookie fails because the bookie cannot be
> > connected to.
> >
> > How does BookKeeper AutoRecovery deal with open ledgers on a failed bookie?
> >
> > I don't know if it's a bug or not.
> >
> > The error log:
> >
> > 12:29:57.072 [main-EventThread] INFO
> > org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve
> > x.x.x.x:3181, bookie is unknown
> >
> > org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException:
> > Bookie handle is not available
> >
> > 12:29:57.072 [main-EventThread] ERROR
> > org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
> > x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
> >
> > org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
> > Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not
> > running
> >
> > 12:29:57.078 [BookKeeperClientWorker-OrderedExecutor-29-0] INFO
> > org.apache.bookkeeper.client.PendingReadLacOp - While readLac ledger: 96789
> > did not hear success responses from all of ensemble
> >
> > 12:29:57.078 [ReplicationWorker] INFO
> > org.apache.bookkeeper.replication.ReplicationWorker - BKReadException while
> > rereplicating ledger 96789. Enough Bookies might not have available So, no
> > harm to continue
