Thanks for your response.

If I try to read the WALs with the following command:

    hbase org.apache.hadoop.hbase.wal.WALPrettyPrinter /hbase/oldWALs/rzv-db13-hd.xxxx%2C16020%2C1684871532555.1696811057371

I don't get any error. The file seems to be read correctly. In fact, at the end of the read, something like the following is printed:

    cell total size sum: 136
    edit heap size: 312
    position: 15007544

Thanks,

On Monday, 16 September 2024 at 14:51:02 CEST, 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

Have you tried to read these WAL files with WALPrettyPrinter? What error does WALPrettyPrinter report while reading these files?

Hamado Dene <hamadod...@yahoo.com.invalid> wrote on Monday, 16 September 2024 at 16:15:
>
> Checking the WALs on HDFS, there are very old WALs, from a year ago. Does
> anyone have any idea how to handle this issue in production?
>
> -rw-r--r-- 2 hbase hadoop 20684288 2023-10-09 08:26 /hbase/oldWALs/rzv-db14-hd.xxxx%2C16020%2C1674973593505.1696810047993
> -rw-r--r-- 2 hbase hadoop 15007744 2023-10-09 08:26 /hbase/oldWALs/rzv-db13-hd.xxxx%2C16020%2C1684871532555.1696811057371
> -rw-r--r-- 2 hbase hadoop 15872 2023-10-09 08:26 /hbase/oldWALs/rzv-db12-hd.xxxx%2C16020%2C1674973371058.1696813278286
> -rw-r--r-- 2 hbase hadoop 42594304 2023-10-09 08:27 /hbase/oldWALs/rzv-db09-hd.xxxx%2C16020%2C1674973354605.1696810476448
> -rw-r--r-- 2 hbase hadoop 13622784 2023-10-09 08:26 /hbase/oldWALs/rzv-db10-hd.xxxx%2C16020%2C1674973984596.1696810895708
>
> On Thursday, 12 September 2024 at 09:30:46 CEST, Hamado Dene <hamadod...@yahoo.com> wrote:
>
> Hi community,
> Could anyone kindly assist me in resolving this issue I'm facing?
> Thank you in advance!
> Hamado Dene
>
> On Wednesday, 11 September 2024 at 16:26:55 CEST, Hamado Dene <hamadod...@yahoo.com> wrote:
>
> Hi HBase Community,
> We are currently facing an issue in our production environment with HBase
> replication, and I would greatly appreciate any guidance or suggestions the
> community may have.
>
> We are running HBase version 2.5.8, and in the logs we consistently
> encounter the following warning:
>
> 2024-09-11T15:51:11,468 WARN [RS_CLAIM_REPLICATION_QUEUE-regionserver/rzv-db09-hd:16020-0.replicationSource,replicav3-rzv-db13-hd.xxxx,16020,1684871532555-rzv-db09-hd.xxxx,16020,1696832789107-rzv-db09-hd.xxxx,16020,1696833033289-rzv-db13-hd.xxxx,16020,1722636062425-rzv-db13-hd.xxxx,16020,1722636803794-rzv-db12-hd.xxxx,16020,1722636800268.replicationSource.wal-reader.rzv-db13-hd.xxxx%2C16020%2C1684871532555,replicav3-rzv-db13-hd.xxxx,16020,1684871532555-rzv-db09-hd.xxxx,16020,1696832789107-rzv-db09-hd.xxxx,16020,1696833033289-rzv-db13-hd.xxxx,16020,1722636062425-rzv-db13-hd.xxxx,16020,1722636803794-rzv-db12-hd.xxxx,16020,1722636800268] regionserver.ReplicationSourceWALReader: Failed to read stream of replication entries
> java.io.EOFException: Cannot seek after EOF
>     at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1682) ~[hadoop-hdfs-client-2.10.2.jar:?]
>     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:66) ~[hadoop-common-2.10.2.jar:?]
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.seekOnFs(ProtobufLogReader.java:527) ~[hbase-server-2.5.8.jar:2.5.8]
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.seek(ReaderBase.java:130) ~[hbase-server-2.5.8.jar:2.5.8]
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.seek(WALEntryStream.java:408) ~[hbase-server-2.5.8.jar:2.5.8]
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:339) ~[hbase-server-2.5.8.jar:2.5.8]
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:308) ~[hbase-server-2.5.8.jar:2.5.8]
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:298) ~[hbase-server-2.5.8.jar:2.5.8]
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:172) ~[hbase-server-2.5.8.jar:2.5.8]
>     at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:102) ~[hbase-server-2.5.8.jar:2.5.8]
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.tryAdvanceStreamAndCreateWALBatch(ReplicationSourceWALReader.java:258) ~[hbase-server-2.5.8.jar:2.5.8]
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:145) ~[hbase-server-2.5.8.jar:2.5.8]
>
> This error appears to stem from the replication WAL reader, and the "Cannot
> seek after EOF" message suggests a failure to read the replication entries.
> We suspect this may be affecting the replication flow between our region
> servers.
>
> Has anyone encountered this problem before, or does anyone have insights into
> potential causes and solutions?
>
> Thank you in advance for your assistance!
>
> Hamado Dene
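
For anyone comparing the numbers in this thread: the file length in the HDFS listing and the last `position:` printed by WALPrettyPrinter can be checked against each other with a few lines of Python. This is only a sketch. The two sample strings are copied from the messages above; the helper names (`parse_ls_size`, `last_position`, `seek_would_fail`) are hypothetical, and the assumption that "Cannot seek after EOF" corresponds to a requested offset larger than the file length reflects my understanding of the HDFS client, not something confirmed in this thread.

```python
import re

# Sample values copied from the thread. The parsing rules below are
# assumptions about the usual `hdfs dfs -ls` column layout.
LS_LINE = (
    "-rw-r--r-- 2 hbase hadoop 15007744 2023-10-09 08:26 "
    "/hbase/oldWALs/rzv-db13-hd.xxxx%2C16020%2C1684871532555.1696811057371"
)
PRINTER_TAIL = """cell total size sum: 136
edit heap size: 312
position: 15007544"""


def parse_ls_size(line):
    """Return (size_in_bytes, path) from one listing line."""
    # Columns: permissions, replication, owner, group, size, date, time, path
    fields = line.split()
    return int(fields[4]), fields[7]


def last_position(printer_output):
    """Return the last 'position: N' value found in the printer output."""
    matches = re.findall(r"^position:\s*(\d+)\s*$", printer_output, flags=re.MULTILINE)
    if not matches:
        raise ValueError("no 'position:' line found")
    return int(matches[-1])


def seek_would_fail(offset, file_length):
    """Assumed condition for 'Cannot seek after EOF': offset beyond file length."""
    return offset > file_length


size, path = parse_ls_size(LS_LINE)
pos = last_position(PRINTER_TAIL)
print(path)
print(f"file length: {size}, last printed position: {pos}, gap: {size - pos}")
```

The offset that the replication source actually tries to seek to is not shown anywhere in this thread, so `seek_would_fail` would have to be fed a value obtained separately from the replication queue state.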