[ 
https://issues.apache.org/jira/browse/HBASE-26843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-26843.
-------------------------------
    Fix Version/s:     (was: 3.0.0-beta-2)
       Resolution: Cannot Reproduce

We have done some fixes around this area. Resolve as can not reproduce.

> When encounters EOF exception, replication log will not be dequeued
> -------------------------------------------------------------------
>
>                 Key: HBASE-26843
>                 URL: https://issues.apache.org/jira/browse/HBASE-26843
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.4.10
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>
> We met this problem on our production clusters, and in the pre-commit tests, 
> it also occured. 
> We can see this test logs,
>  
> {code:java}
> 2022-03-14T15:31:16,225 INFO  [Time-limited test] wal.AbstractFSWAL(840): 
> Rolled WAL 
> /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4219/yetus-jdk8-hadoop3-check/src/hbase-server/target/test-data/b3808e70-5b17-10bb-3bd3-fdd825fd8fab/WALs/testSizeOfLogQueue/testSizeOfLogQueue.1647271876165
>  with entries=1, filesize=210 B; new WAL 
> /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4219/yetus-jdk8-hadoop3-check/src/hbase-server/target/test-data/b3808e70-5b17-10bb-3bd3-fdd825fd8fab/WALs/testSizeOfLogQueue/testSizeOfLogQueue.1647271876191
> 2022-03-14T15:31:16,225 DEBUG [Time-limited test] wal.AbstractFSWAL(927): 
> Create new AsyncFSWAL writer with pipeline: 
> [DatanodeInfoWithStorage[127.0.0.1:34961,DS-77496ab4-d8f2-4bee-af67-6f1c3e08de20,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:38455,DS-50a46a7f-6202-4177-a91f-7da535dbf07f,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:40111,DS-d41b66a9-f9d5-4a57-9969-ac93cae6c9ac,DISK]]
> 2022-03-14T15:31:16,338 WARN  [Close-WAL-Writer-0] 
> asyncfs.FanOutOneBlockAsyncDFSOutputHelper(593): complete file 
> /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4219/yetus-jdk8-hadoop3-check/src/hbase-server/target/test-data/b3808e70-5b17-10bb-3bd3-fdd825fd8fab/WALs/testSizeOfLogQueue/testSizeOfLogQueue.1647271876165
>  not finished, retry = 0
> 2022-03-14T15:31:16,357 DEBUG [Time-limited test] wal.ProtobufLogReader(456): 
> Encountered a malformed edit, seeking back to last good position in file, 
> from 211 to 210
> java.io.EOFException: Partial PB while reading WAL, probably an unexpected 
> EOF, ignoring. current offset=211
>       at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:399)
>  ~[classes/:?]
>       at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:105) 
> ~[classes/:?]
>       at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:93) 
> ~[classes/:?]
>       at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:264)
>  ~[classes/:?]
>       at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:178)
>  ~[classes/:?]
>       at 
> org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:103)
>  ~[classes/:?]
>       at 
> org.apache.hadoop.hbase.replication.regionserver.TestBasicWALEntryStream.testSizeOfLogQueue(TestBasicWALEntryStream.java:698)
>  ~[test-classes/:?]
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_282]
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_282]
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_282]
>       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
> ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
> ~[junit-4.13.2.jar:4.13.2]
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38) 
> ~[hbase-common-3.0.0-alpha-3-SNAPSHOT-tests.jar:3.0.0-alpha-3-SNAPSHOT]
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  ~[junit-4.13.2.jar:4.13.2]
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  ~[junit-4.13.2.jar:4.13.2]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_282]
>       at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
> 2022-03-14T15:31:16,511 DEBUG [Time-limited test] wal.AbstractFSWAL(1050): 
> Moved 2 WAL file(s) to 
> /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4219/yetus-jdk8-hadoop3-check/src/hbase-server/target/test-data/b3808e70-5b17-10bb-3bd3-fdd825fd8fab/oldWALs
> 2022-03-14T15:31:16,511 INFO  [Time-limited test] wal.AbstractFSWAL(1053): 
> Closed WAL: AsyncFSWAL testSizeOfLogQueue:(num 1647271876191)
>  {code}
> And this is the failure info,
>  
>  
> {code:java}
> Failedprecommit checks / yetus jdk8 Hadoop3 checks / 
> org.apache.hadoop.hbase.replication.regionserver.TestBasicWALEntryStreamFSHLog.testSizeOfLogQueueFailing
>  for the past 1 build (Since #1 )运行时间:0.41 secError Messageexpected:<1> but 
> was:<2>Stacktracejava.lang.AssertionError: expected:<1> but was:<2>
>       at org.junit.Assert.fail(Assert.java:89)
>       at org.junit.Assert.failNotEquals(Assert.java:835)
>       at org.junit.Assert.assertEquals(Assert.java:647)
>       at org.junit.Assert.assertEquals(Assert.java:633)
>       at 
> org.apache.hadoop.hbase.replication.regionserver.TestBasicWALEntryStream.testSizeOfLogQueue(TestBasicWALEntryStream.java:701)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>       at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>       at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>       at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  {code}
> The link is 
> [https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4219/1/testReport/org.apache.hadoop.hbase.replication.regionserver/TestBasicWALEntryStreamFSHLog/precommit_checks___yetus_jdk8_Hadoop3_checks___testSizeOfLogQueue/]
>  
>  
> I can not figure out what the problem is yet, but we can change the UT to 
> find more info.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to