Thanks Todd. We had also finally started suspecting that angle. We plan to capture the file details before the reboot and after the reboot; with that analysis I can confirm whether or not it is the same issue.
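A minimal sketch of how such a before/after listing could be captured is below. The data directory path and the current/blocksBeingWritten sub-directory names are assumptions here, not values from this cluster, and should be adjusted to the actual dfs.data.dir layout; the point is just to record the name, length and modification time of each block and meta file so the two listings can be diffed.

import java.io.File;
import java.util.Date;

// Sketch only, not the DataNode's own tooling: list block/meta files with
// their sizes and mtimes so the output can be compared before and after a reboot.
public class BlockFileSnapshot {
    public static void main(String[] args) {
        // Assumed default path; pass the real dfs.data.dir as an argument.
        String dataDir = args.length > 0 ? args[0] : "/data/dfs/dn";
        // Both finalized blocks and blocks being written are of interest.
        // Note: current/ may contain subdir*/ levels; this sketch walks one level only.
        for (String sub : new String[] { "current", "blocksBeingWritten" }) {
            File[] files = new File(dataDir, sub).listFiles();
            if (files == null) {
                System.out.println("# missing or unreadable: " + dataDir + "/" + sub);
                continue;
            }
            for (File f : files) {
                if (f.isFile()) {
                    System.out.println(sub + "\t" + f.getName() + "\t"
                        + f.length() + "\t" + new Date(f.lastModified()));
                }
            }
        }
    }
}

Since the genstamp appears in the meta file name, a diff of the two listings would also show genstamp changes, not just length changes.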
One more thing to note: the gap between the reboot time and the last replica finalization is ~1 hour in some cases. Since the machine was rebooted because of kernel.hung_task_timeout_secs, the thread in the OS may also not have gotten a chance to sync the data.

Great one, HDFS-1539. I have merged all the bug fixes, but since that one is filed as an improvement it did not come up in my list :(

Also found some OS-level options for doing filesystem operations synchronously, e.g. the dirsync mount option: "All directory updates within the filesystem should be done synchronously. This affects the following system calls: creat, link, unlink, symlink, mkdir, rmdir, mknod and rename." We mainly suspect that the rename was lost after the reboot, since the meta file and block file renames happen when the block is finalized from BBW to current (at least, block size is not considered here). A sample mount entry for dirsync is sketched at the end of this thread.

Anyway, thanks a lot for your great and valuable time with us here. After checking the OS logs above, I will do a run with HDFS-1539 (the corresponding dfs.datanode.synconclose setting is also sketched at the end of this thread).

Regards,
Uma

________________________________________
From: Todd Lipcon [t...@cloudera.com]
Sent: Thursday, November 24, 2011 5:07 AM
To: common-...@hadoop.apache.org
Cc: hdfs-dev@hadoop.apache.org
Subject: Re: Blocks are getting corrupted under very high load

On Wed, Nov 23, 2011 at 1:23 AM, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:
> Yes, Todd, the block after restart is smaller and the genstamp is also lower.
> A complete machine reboot happened here. The boards are configured so that,
> if a task gets no CPU cycles for 480 seconds, the machine reboots itself:
> kernel.hung_task_timeout_secs = 480.

So it sounds like the following happened:

- While writing the file, the pipeline got reduced down to one node due to timeouts from the other two.
- Soon thereafter (before more replicas were made), that last replica's machine kernel-panicked without syncing the data.
- On reboot, the filesystem lost some edits from its ext3 journal, and the block got moved back into the RBW directory with truncated data.
- HDFS did "the right thing" - at least what the algorithms say it should do - because it had gotten a commitment for a later replica.

If you have a build which includes HDFS-1539, you could consider setting dfs.datanode.synconclose to true, which would have prevented this problem.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
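For reference, mounting the DataNode's data partition with dirsync, as discussed above, would make directory updates (including the rename on finalize) synchronous. A sample entry is below; the device and mount point are made-up placeholders, not values from this cluster.

# /etc/fstab (illustrative only)
/dev/sdb1    /data/dfs    ext3    defaults,dirsync    0  2

# or, applied to an already-mounted filesystem:
mount -o remount,dirsync /data/dfs

Note that dirsync only covers directory metadata such as renames; the file contents themselves still need an fsync to be durable.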
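Todd's suggestion of dfs.datanode.synconclose would look roughly like this in hdfs-site.xml, on a build that includes HDFS-1539 (the property has no effect on builds without it):

<!-- hdfs-site.xml: ask the DataNode to fsync block files when they are
     closed, so finalized replicas survive a sudden reboot. -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>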
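As a rough illustration of why both settings matter, the pattern amounts to "force the data to disk before finalizing, and make the finalize rename itself durable". This is only a sketch under those assumptions, not the DataNode's actual code path:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Sketch only: write bytes, fsync them, then rename into place.
public class SyncThenFinalize {
    public static void finalizeDurably(File tmp, File fin, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(tmp);
        try {
            out.write(data);
            out.getFD().sync();   // data survives a crash from this point on
        } finally {
            out.close();
        }
        // The rename is a directory update; without a dirsync mount (or an
        // explicit sync of the parent directory) it can still be lost if the
        // machine goes down right after this call.
        if (!tmp.renameTo(fin)) {
            throw new IOException("rename failed: " + tmp + " -> " + fin);
        }
    }
}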