Re: data loss after cluster wide power loss

2013-07-08 Thread Colin McCabe
Thanks. Suresh and Kihwal are right-- renames are journalled, but not necessarily durable (stored to disk). I was getting mixed up with HDFS semantics, in which we actually do make the journal durable before returning success to the client. It might be a good idea for HDFS to fsync the file desc
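The general fix discussed here (make the file durable, rename it, then fsync the parent directory so the rename itself reaches disk) can be sketched in Java. This is not HDFS code. The class and method names (`DurableRename`, `writeDurably`, `fsyncDirectory`) are hypothetical. The sketch assumes Linux, where a directory can be opened read-only as a `FileChannel` and synced with `force`, and where `ATOMIC_MOVE` maps to rename(2) and replaces an existing target.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: replace a file so both its bytes and its name survive power loss. */
public final class DurableRename {
    private DurableRename() {}

    /**
     * fsync a directory so that entries created or renamed in it are on disk.
     * Works on Linux; on some platforms (e.g. Windows) opening a directory
     * as a FileChannel throws IOException.
     */
    public static void fsyncDirectory(Path dir) throws IOException {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        }
    }

    public static void writeDurably(Path target, byte[] data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = dir.resolve(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // 1. Make the new file's contents durable before it gets its final name.
            ch.force(true);
        }
        // 2. Atomic rename. On Linux this is rename(2), which replaces an existing
        //    target; the change may be journalled but not yet on disk.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        // 3. fsync the parent directory so the rename itself is durable.
        fsyncDirectory(dir);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("durable-rename");
        Path target = dir.resolve("current.txt");
        writeDurably(target, "v1".getBytes(StandardCharsets.UTF_8));
        writeDurably(target, "v2".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8)); // prints v2
    }
}
```

Skipping step 3 reproduces the failure mode in this thread. The new file's bytes are safe, but after power loss the directory may still point at the old name.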

Re: data loss after cluster wide power loss

2013-07-06 Thread Dave Latham
Thanks for the detailed information Michael, Colin, Suresh, Kihwal. It looks like we're on CentOS 5.7 (kernel 2.6.18-274.el5) so if the fix was included in 5.4 then it sounds like we should have it. I don't believe we're using LVM. It sounds like HDFS could improve handling of this scenario by f

Re: data loss after cluster wide power loss

2013-07-03 Thread Kihwal Lee
For the ext3 bug Colin mentioned, see https://bugzilla.redhat.com/show_bug.cgi?id=592961. This was fixed in 2.6.32 and backported to RHEL 5.4 (and CentOS). This has more to do with file data and affects NN more. Since NN preallocates blocks for edits, almost all data writes are done without modifyin
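The effect of preallocation described above can be illustrated with a small Java sketch. It is not the NameNode's edit-log implementation. `PreallocatedLog` and the 1 MiB chunk size are hypothetical. The idea is to grow the file in zero-filled chunks, paying for a full metadata sync only when it grows, and to append records inside already-allocated space, where file size and block allocation do not change and a data-only `force(false)` (comparable to fdatasync) should suffice.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only log that preallocates space in fixed-size chunks. */
public final class PreallocatedLog implements AutoCloseable {
    private static final int CHUNK = 1 << 20; // assumed 1 MiB preallocation unit

    private final FileChannel ch;
    private long writePos = 0;      // next byte offset for record data
    private long allocatedUpTo = 0; // bytes physically written (zero-filled) so far

    /** Opens a fresh log at path, truncating any existing content. */
    public PreallocatedLog(Path path) throws IOException {
        ch = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
    }

    /** Appends a record and syncs its data. */
    public void append(byte[] record) throws IOException {
        ensureAllocated(writePos + record.length);
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            writePos += ch.write(buf, writePos);
        }
        // Overwriting already-allocated bytes leaves the file size unchanged,
        // so a data-only sync is requested here.
        ch.force(false);
    }

    /** Grows the file by zero-filled chunks until at least `needed` bytes exist. */
    private void ensureAllocated(long needed) throws IOException {
        if (needed <= allocatedUpTo) {
            return;
        }
        ByteBuffer zeros = ByteBuffer.allocate(CHUNK); // zero-initialized
        while (allocatedUpTo < needed) {
            zeros.clear();
            while (zeros.hasRemaining()) {
                allocatedUpTo += ch.write(zeros, allocatedUpTo);
            }
        }
        // Growing the file changes its size and block allocation: sync metadata too.
        ch.force(true);
    }

    @Override
    public void close() throws IOException {
        ch.close();
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("edits", ".log");
        try (PreallocatedLog log = new PreallocatedLog(p)) {
            log.append("op1".getBytes(StandardCharsets.UTF_8));
            log.append("op2".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(Files.size(p)); // prints 1048576 (one preallocated chunk)
    }
}
```

Writing real zeros, rather than extending the file sparsely, means the blocks are already allocated before any record lands in them. That is why steady-state appends avoid metadata changes.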

Re: data loss after cluster wide power loss

2013-07-03 Thread Suresh Srinivas
On Wed, Jul 3, 2013 at 8:12 AM, Colin McCabe wrote:
> On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas wrote:
> > Dave,
> >
> > Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not related to data

Re: data loss after cluster wide power loss

2013-07-03 Thread Colin McCabe
On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas wrote:
> Dave,
>
> Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not related to data loss related to HBase log and hsync. I think you are right; the rename

Re: data loss after cluster wide power loss

2013-07-02 Thread Dave Latham
lf Of Dave > Latham > Sent: 01 July 2013 16:08 > To: hdfs-u...@hadoop.apache.org > Cc: hdfs-dev@hadoop.apache.org > Subject: Re: data loss after cluster wide power loss > > Much appreciated, Suresh. Let me know if I can provide any more > information or if you'd like m

RE: data loss after cluster wide power loss

2013-07-02 Thread Uma Maheswara Rao G
Message- From: ddlat...@gmail.com [mailto:ddlat...@gmail.com] On Behalf Of Dave Latham Sent: 01 July 2013 16:08 To: hdfs-u...@hadoop.apache.org Cc: hdfs-dev@hadoop.apache.org Subject: Re: data loss after cluster wide power loss Much appreciated, Suresh. Let me know if I can provide any more

Re: data loss after cluster wide power loss

2013-07-01 Thread Dave Latham
Much appreciated, Suresh. Let me know if I can provide any more information or if you'd like me to open a JIRA.
Dave
On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas wrote:
> Dave,
>
> Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my p

Re: data loss after cluster wide power loss

2013-07-01 Thread Suresh Srinivas
Dave, Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not related to data loss related to HBase log and hsync. I think you are right; the rename operation itself might not have hit the disk. I think we should e