On Wed, Jul 3, 2013 at 8:12 AM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
> On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas <sur...@hortonworks.com> wrote:
> > Dave,
> >
> > Thanks for the detailed email. Sorry I did not read all the details you had
> > sent earlier completely (on my phone). As you said, this is not related to
> > data loss related to HBase log and hsync. I think you are right; the rename
> > operation itself might not have hit the disk. I think we should either
> > ensure metadata operation is synced on the datanode or handle it being
> > reported as blockBeingWritten. Let me spend sometime to debug this issue.
>
> In theory, ext3 is journaled, so all metadata operations should be
> durable in the case of a power outage. It is only data operations
> that should be possible to lose. It is the same for ext4. (Assuming
> you are not using nonstandard mount options.)

ext3 journal may not hit the disk right. From what I read, if you do not
specifically call sync, even the metadata operations do not hit disk.

See - https://www.kernel.org/doc/Documentation/filesystems/ext3.txt

commit=nrsec    (*)     Ext3 can be told to sync all its data and metadata
                        every 'nrsec' seconds. The default value is 5 seconds.
                        This means that if you lose your power, you will lose
                        as much as the latest 5 seconds of work (your
                        filesystem will not be damaged though, thanks to the
                        journaling). This default value (or any low value)
                        will hurt performance, but it's good for data-safety.
                        Setting it to 0 will have the same effect as leaving
                        it at the default (5 seconds).
                        Setting it to very large values will improve
                        performance.
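
To illustrate what "ensure metadata operation is synced" could mean in practice, here is a rough sketch in Python. It is not the datanode code; the helper name durable_rename and the file names are made up for this example. It assumes Linux, where a directory can be opened read-only and passed to fsync so that the rename does not have to wait for the next periodic journal commit.

```python
import os
import tempfile


def durable_rename(src, dst):
    """Rename src to dst, then fsync the parent directories involved.

    Hypothetical helper for illustration only. The rename changes
    directory entries, so it is the directories (not the file) that
    are synced here. Assumes Linux semantics for fsync on a directory.
    """
    os.rename(src, dst)
    parents = {
        os.path.dirname(os.path.abspath(src)),
        os.path.dirname(os.path.abspath(dst)),
    }
    for parent in parents:
        fd = os.open(parent, os.O_RDONLY)
        try:
            os.fsync(fd)  # ask the kernel to flush the directory update now
        finally:
            os.close(fd)


if __name__ == "__main__":
    # Made-up file names standing in for a block file being finalized.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "block.tmp")
        dst = os.path.join(tmp, "block.final")
        with open(src, "wb") as f:
            f.write(b"data")
            f.flush()
            os.fsync(f.fileno())  # flush the file contents first
        durable_rename(src, dst)
        print(os.path.exists(dst))
```

The file contents are synced before the rename and the directory after it, so after a power loss the new name should not appear without its data. Whether this cost is acceptable on every block finalization is a separate question.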