[
https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512915#comment-16512915
]
Josh Elser edited comment on HBASE-20723 at 6/14/18 7:53 PM:
-------------------------------------------------------------
[[email protected]], have you tried writing a simple minicluster test to
reproduce the issue? My understanding is that we should be able to easily
trigger this in a contrived local case. My thinking is that a high-level test
should make the issue crystal-clear for folks.
* Create one RS minicluster
* Configure store files on mini HDFS
* Configure WALs on local filesystem
* Write some edits to a table (small, to avoid possible flush)
* Restart RS
* Observe edits missing from our table
Seems like this went unnoticed due to differences in recovery semantics from
fshlog to multiwal? Is this a guess or are we sure of this?
> WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
> -------------------------------------------------------------------------
>
> Key: HBASE-20723
> URL: https://issues.apache.org/jira/browse/HBASE-20723
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 1.1.2
> Reporter: Rohan Pednekar
> Assignee: Ted Yu
> Priority: Major
> Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt,
> 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, logs.zip
>
>
> This is an Azure HDInsight HBase cluster with HDP 2.6 and HBase
> 1.1.2.2.6.3.2-14
> By default the underlying data is going to wasb://xxxxx@yyyyy/hbase
> I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at
> /mnt.
> hbase.wal.dir=hdfs://mycluster/walontest
> hbase.wal.dir.perms=700
> hbase.rootdir.perms=700
> hbase.rootdir=wasb://XYZ@hbaseperf.core.net/hbase
> Procedure to reproduce this issue:
> 1. create a table in hbase shell
> 2. insert a row in hbase shell
> 3. reboot the VM which hosts that region
> 4. scan the table in hbase shell and it is empty
> Looking at the region server logs:
> {code:java}
> 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1]
> wal.WALSplitter: This region's directory doesn't exist:
> hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648.
> It is very likely that it was already split so it's safe to discard those
> edits.
> {code}
> The log split/replay ignored the actual WAL because WALSplitter looks for the
> region directory under the hbase.wal.dir we specified rather than under
> hbase.rootdir.
> Looking at the source code,
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java
> it uses the rootDir, which is walDir, as the tableDir root path.
> So if we use HBASE-17437 and the WAL dir and the HBase rootdir are on different
> paths, or even on different filesystems, then #5, which uses walDir as the
> tableDir, is clearly wrong.
> CC: [~zyork], [[email protected]] Attached the logs for quick review.
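The lookup described in the report can be illustrated with a small standalone sketch. This is not the WALSplitter code: the class RegionDirSketch and its methods regionDir and wouldReplay are hypothetical names made up for illustration, and the in-memory set stands in for a real filesystem existence check. The directory layout (base/data/namespace/table/encodedRegion) and the paths are taken from the log line and the configuration quoted above.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the lookup described in the report; not the WALSplitter code. */
final class RegionDirSketch {

    /** Builds <base>/data/<namespace>/<table>/<encodedRegion>, the layout seen in the log line. */
    static String regionDir(String base, String namespace, String table, String encodedRegion) {
        String trimmed = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
        return trimmed + "/data/" + namespace + "/" + table + "/" + encodedRegion;
    }

    /**
     * Models the decision reported in the log: edits are replayed only when the
     * region directory is found under the given base; otherwise they are discarded.
     * The set of existing directories is an assumption standing in for the filesystem.
     */
    static boolean wouldReplay(Set<String> existingDirs, String base,
                               String namespace, String table, String encodedRegion) {
        return existingDirs.contains(regionDir(base, namespace, table, encodedRegion));
    }

    public static void main(String[] args) {
        String rootDir = "wasb://xxxxx@yyyyy/hbase";     // hbase.rootdir from the report
        String walDir = "hdfs://mycluster/walontest";    // hbase.wal.dir from the report
        String region = "b7fd7db5694eb71190955292b3ff7648";

        // Region directories live under hbase.rootdir only.
        Set<String> existing = new HashSet<>();
        existing.add(regionDir(rootDir, "default", "tb1", region));

        // Resolving against walDir misses the region, so its edits would be dropped.
        System.out.println("lookup under walDir : "
            + wouldReplay(existing, walDir, "default", "tb1", region));
        // Resolving against rootDir finds the region, so its edits would be replayed.
        System.out.println("lookup under rootDir: "
            + wouldReplay(existing, rootDir, "default", "tb1", region));
    }
}
```

In this model the only difference between losing and keeping the edits is which base directory is passed in, which is also what a minicluster test with WALs and store files on separate filesystems would be expected to exercise.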