[
https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511644#comment-16511644
]
Sean Busbey edited comment on HBASE-20723 at 6/13/18 8:49 PM:
--------------------------------------------------------------
{quote}
or the hflush in DFSOutputStream using by WAL's ProtobufLogWriter AFA I
understand is that it's writing blocks/packets to HDFS but not a complete WAL
file, where those sent blocks/packets is a group of writes that have not been
combined into a single file before WAL is being closed(). (let me know if I'm
wrong)
{quote}
It's a group of writes that hflush promises us is in memory at all block-replica
DataNodes. It's true that the DataNodes might not have persisted to disk yet,
but unless all the nodes in the pipeline die, the stream is supposed to be
recoverable up to the point of the flush. This is one of the foundational
building blocks of HBase being a consistent datastore.
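A plain-Java analogy (not the HDFS client API itself) for the distinction above: flush() pushes buffered bytes out of the process so other readers can see them, even though nothing has necessarily been forced to the disk platter yet, much as hflush() gets edits into DataNode memory without guaranteeing they hit disk. The file names below are illustrative.

```java
import java.io.*;
import java.nio.file.*;

public class FlushDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("wal-demo", ".log");
        try (BufferedOutputStream out =
                 new BufferedOutputStream(new FileOutputStream(p.toFile()))) {
            out.write("edit-1\n".getBytes("UTF-8"));
            // flush(): bytes leave the process buffer and become visible to
            // other readers, but may still sit in the OS page cache --
            // analogous to hflush() reaching DataNode memory, not disk.
            out.flush();
            // A concurrent reader now sees the edit even though it has not
            // necessarily been persisted durably.
            String seen = new String(Files.readAllBytes(p), "UTF-8");
            System.out.println(seen.trim()); // prints edit-1
        }
        Files.delete(p);
    }
}
```

The durable counterpart in HDFS is hsync(), just as FileChannel.force() is the durable counterpart to flush() in plain Java.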
{quote}
So, I found this problem when testing HBase on S3 with a 3-nodes cluster and
setting WAL on HDFS, wrote a hbase-client to sequentially write N (100k)
records (which key and value are both number #1 to #N), terminate the assigned
region server by `kill -9 $pid` and restart it. those writing region(s) will be
reassigned to another region server in few seconds, the client program
completes w/o errors but when verifying the records, few records were missing.
{quote}
This sounds like a data loss bug. Is it easily reproducible? Does it show up
using, e.g., ITBLL with the region-server-killing chaos monkey?
Only 3 nodes in the cluster means that if we have block replication set to 3
then we can't avoid having a local block. It's not ideal, but it shouldn't
cause data loss if we aren't losing the other two. Can you confirm block
replication is set to >= 3 in HDFS?
Is the client making sure it got a success on the write before moving on to the
next entry?
Can we get more details on specific versions?
> WALSplitter uses the rootDir, which is walDir, as the tableDir root path.
> -------------------------------------------------------------------------
>
> Key: HBASE-20723
> URL: https://issues.apache.org/jira/browse/HBASE-20723
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 1.1.2
> Reporter: Rohan Pednekar
> Priority: Major
> Attachments: logs.zip
>
>
> This is an Azure HDInsight HBase cluster with HDP 2.6 and HBase
> 1.1.2.2.6.3.2-14.
> By default the underlying data is going to wasb://xxxxx@yyyyy/hbase
> I tried to move WAL folders to HDFS, which is the SSD mounted on each VM at
> /mnt.
> hbase.wal.dir=hdfs://mycluster/walontest
> hbase.wal.dir.perms=700
> hbase.rootdir.perms=700
> hbase.rootdir=wasb://[email protected]/hbase
> Procedure to reproduce this issue:
> 1. create a table in hbase shell
> 2. insert a row in hbase shell
> 3. reboot the VM which hosts that region
> 4. scan the table in hbase shell and it is empty
> Looking at the region server logs:
> {code:java}
> 2018-06-12 22:08:40,455 INFO [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1]
> wal.WALSplitter: This region's directory doesn't exist:
> hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648.
> It is very likely that it was already split so it's safe to discard those
> edits.
> {code}
> The log split/replay ignored the actual WAL edits because WALSplitter looks
> for the region directory under the hbase.wal.dir we specified rather than
> under the hbase.rootdir.
> Looking at the source code,
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java
> it uses the rootDir, which is walDir, as the tableDir root path.
> So if we use HBASE-17437 so that the WAL dir and hbase.rootdir are on
> different paths, or even on different filesystems, then the use of walDir as
> tableDir in #5 is apparently wrong.
> CC: [~zyork], [[email protected]] Attached the logs for quick review.
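The mix-up the description points at can be sketched in plain Java (the helper, literal paths, and layout below are illustrative, not the actual WALSplitter code): resolving the region directory against the WAL base dir instead of the root dir yields a path where no region data lives, so the splitter wrongly concludes the region was already split.

```java
import java.nio.file.*;

public class RegionDirDemo {
    // Illustrative helper: resolve a region's directory under a base dir,
    // mirroring the data/<namespace>/<table>/<encoded-region> layout.
    static Path regionDir(Path baseDir, String table, String encodedRegion) {
        return baseDir.resolve("data").resolve("default")
                      .resolve(table).resolve(encodedRegion);
    }

    public static void main(String[] args) {
        Path rootDir = Paths.get("/hbase");     // stands in for hbase.rootdir (wasb://...)
        Path walDir  = Paths.get("/walontest"); // stands in for hbase.wal.dir (HDFS)
        String region = "b7fd7db5694eb71190955292b3ff7648";

        // Buggy behavior: resolving against the WAL dir ...
        Path wrong = regionDir(walDir, "tb1", region);
        // ... misses the real region directory, which lives under rootdir.
        Path right = regionDir(rootDir, "tb1", region);

        System.out.println(wrong); // /walontest/data/default/tb1/<region>
        System.out.println(right); // /hbase/data/default/tb1/<region>
    }
}
```

When the two directories coincide (the pre-HBASE-17437 default), the bug is invisible; it only bites once the WAL dir is split off from the root dir.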
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)