Re: Replication Error in HBase Production Environment

Hamado Dene Mon, 16 Sep 2024 06:57:32 -0700

Thanks for your response.
If I try to read the WALs with the following command:
hbase org.apache.hadoop.hbase.wal.WALPrettyPrinter /hbase/oldWALs/rzv-db13-hd.xxxx%2C16020%2C1684871532555.1696811057371
I don't get any error; the file seems to be read correctly. At the end of the
output, something like the following is printed:


cell total size sum: 136
edit heap size: 312
position: 15007544
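
For comparison with the file size in the listing quoted below, here is a minimal sketch of the bounds check behind "Cannot seek after EOF". It assumes the HDFS client rejects a seek whose target offset is greater than the file length; the function name `seek_would_fail` is hypothetical. The numbers are the size of this WAL in the oldWALs listing and the final position printed by WALPrettyPrinter. The offset stored for the replication queue does not appear in this thread, so it is not checked here.

```python
def seek_would_fail(target_pos: int, file_length: int) -> bool:
    """Return True when a seek to target_pos would pass the end of the file.

    Assumption: this mirrors the bounds check in DFSInputStream.seek,
    which rejects offsets greater than the file length.
    """
    return target_pos > file_length


# Values from this thread for
# rzv-db13-hd.xxxx%2C16020%2C1684871532555.1696811057371
file_length = 15007744    # size reported by the HDFS listing
last_position = 15007544  # final "position" printed by WALPrettyPrinter

print(seek_would_fail(last_position, file_length))  # False
print(file_length - last_position)                  # 200
```

Under that assumption, a full read from the start of the file stays within bounds, which matches WALPrettyPrinter finishing without an error.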


Thanks, 

    On Monday, 16 September 2024 at 14:51:02 CEST, 张铎(Duo Zhang) 
<palomino...@gmail.com> wrote:
 
 Have you tried to read these WAL files by WALPrettyPrinter? What is
the error from WALPrettyPrinter while reading these files?

Hamado Dene <hamadod...@yahoo.com.invalid> wrote on Mon, 16 Sep 2024 at 16:15:
>
> Checking the WALs on HDFS, there are very old WALs, from a year ago... Does 
> anyone have any idea how to handle this issue in production?
>
> -rw-r--r--  2 hbase hadoop  20684288 2023-10-09 08:26 /hbase/oldWALs/rzv-db14-hd.xxxx%2C16020%2C1674973593505.1696810047993
> -rw-r--r--  2 hbase hadoop  15007744 2023-10-09 08:26 /hbase/oldWALs/rzv-db13-hd.xxxx%2C16020%2C1684871532555.1696811057371
> -rw-r--r--  2 hbase hadoop     15872 2023-10-09 08:26 /hbase/oldWALs/rzv-db12-hd.xxxx%2C16020%2C1674973371058.1696813278286
> -rw-r--r--  2 hbase hadoop  42594304 2023-10-09 08:27 /hbase/oldWALs/rzv-db09-hd.xxxx%2C16020%2C1674973354605.1696810476448
> -rw-r--r--  2 hbase hadoop  13622784 2023-10-09 08:26 /hbase/oldWALs/rzv-db10-hd.xxxx%2C16020%2C1674973984596.1696810895708
>    On Thursday, 12 September 2024 at 09:30:46 CEST, Hamado Dene 
><hamadod...@yahoo.com> wrote:
>
>  Hi community,
> Could anyone kindly assist me in resolving this issue I'm facing?
> Thank you in advance!
> Hamado Dene
>    On Wednesday, 11 September 2024 at 16:26:55 CEST, Hamado Dene 
><hamadod...@yahoo.com> wrote:
>
>  Hi HBase Community,
> We are currently facing an issue in our production environment with HBase 
> replication, and I would greatly appreciate any guidance or suggestions the 
> community may have.
>
> We are running HBase version 2.5.8, and in the logs, we consistently 
> encounter the following warning:
>
>
>
> 2024-09-11T15:51:11,468 WARN  [RS_CLAIM_REPLICATION_QUEUE-regionserver/rzv-db09-hd:16020-0.replicationSource,replicav3-rzv-db13-hd.xxxx,16020,1684871532555-rzv-db09-hd.xxxx,16020,1696832789107-rzv-db09-hd.xxxx,16020,1696833033289-rzv-db13-hd.xxxx,16020,1722636062425-rzv-db13-hd.xxxx,16020,1722636803794-rzv-db12-hd.xxxx,16020,1722636800268.replicationSource.wal-reader.rzv-db13-hd.xxxx%2C16020%2C1684871532555,replicav3-rzv-db13-hd.xxxx,16020,1684871532555-rzv-db09-hd.xxxx,16020,1696832789107-rzv-db09-hd.xxxx,16020,1696833033289-rzv-db13-hd.xxxx,16020,1722636062425-rzv-db13-hd.xxxx,16020,1722636803794-rzv-db12-hd.xxxx,16020,1722636800268] regionserver.ReplicationSourceWALReader: Failed to read stream of replication entries
> java.io.EOFException: Cannot seek after EOF
>         at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1682) ~[hadoop-hdfs-client-2.10.2.jar:?]
>         at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:66) ~[hadoop-common-2.10.2.jar:?]
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.seekOnFs(ProtobufLogReader.java:527) ~[hbase-server-2.5.8.jar:2.5.8]
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.seek(ReaderBase.java:130) ~[hbase-server-2.5.8.jar:2.5.8]
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.seek(WALEntryStream.java:408) ~[hbase-server-2.5.8.jar:2.5.8]
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:339) ~[hbase-server-2.5.8.jar:2.5.8]
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:308) ~[hbase-server-2.5.8.jar:2.5.8]
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:298) ~[hbase-server-2.5.8.jar:2.5.8]
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:172) ~[hbase-server-2.5.8.jar:2.5.8]
>         at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:102) ~[hbase-server-2.5.8.jar:2.5.8]
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.tryAdvanceStreamAndCreateWALBatch(ReplicationSourceWALReader.java:258) ~[hbase-server-2.5.8.jar:2.5.8]
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:145) ~[hbase-server-2.5.8.jar:2.5.8]
>
>
> This error appears to stem from the replication WAL reader, and the "Cannot 
> seek after EOF" message suggests a failure to read the replication entries. 
> We suspect this may be affecting the replication flow between our region 
> servers.
>
> Has anyone encountered this problem before, or does anyone have insights into 
> potential causes and solutions?
>
>
> Thank you in advance for your assistance!
>
> Hamado Dene
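
The WAL paths quoted in this thread can be decoded to show which region server instance wrote each file and when. The sketch below assumes the file name is the URL-encoded server name (host, port, start code) followed by a dot and an epoch-millisecond timestamp; the helper name `parse_wal_name` is hypothetical and not part of HBase.

```python
from datetime import datetime, timezone
from urllib.parse import unquote


def parse_wal_name(path: str) -> dict:
    """Split an oldWALs path into server name parts and the WAL timestamp.

    Assumption: the file name is the URL-encoded server name
    (host,port,startcode) followed by '.' and an epoch-millisecond timestamp.
    """
    name = path.rsplit("/", 1)[-1]
    prefix, _, millis = unquote(name).rpartition(".")
    host, port, startcode = prefix.split(",")
    return {
        "host": host,
        "port": int(port),
        "startcode": int(startcode),
        "wal_time": datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc),
    }


info = parse_wal_name(
    "/hbase/oldWALs/rzv-db13-hd.xxxx%2C16020%2C1684871532555.1696811057371"
)
print(info["host"], info["wal_time"].date())  # rzv-db13-hd.xxxx 2023-10-09
```

For this file the decoded start code, 1684871532555, is the same one that appears for rzv-db13-hd.xxxx in the replication source thread name in the warning, and the decoded date agrees with the 2023-10-09 modification date in the listing.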
