________________________________________
From: Uma Maheswara Rao G
Sent: Thursday, November 24, 2011 7:51 AM
To: hdfs-dev@hadoop.apache.org; common-...@hadoop.apache.org
Subject: RE: Blocks are getting corrupted under very high load
We could replicate the issue with some test code (without Hadoop). The issue looks to be the same one you pointed out.

> Thanks Todd.
> Finally we also started suspecting in that angle. Planned to take the file
> details before the reboot and after the reboot. With the above analysis I
> can confirm whether it is the same issue or not.

We cannot get the logs from before the reboot, because those logs are being lost as well :). That is also a proof in itself.

> One more thing to notice is that the difference between the reboot time and
> the last replica finalization is ~1 hr in some cases. Since the machine is
> rebooted due to kernel.hung_task_timeout_secs, in the OS that particular
> thread might also not have got the chance to sync the data.

Same cause.

> Great find, HDFS-1539. I have merged all the bug fixes, but since this one
> is an improvement, the issue might not have come to my list :(.
> Also found some OS-level configs to do the filesystem operations
> synchronously:
>   dirsync
>     All directory updates within the filesystem should be done
>     synchronously. This affects the following system calls: creat, link,
>     unlink, symlink, mkdir, rmdir, mknod and rename.
> We suspected mainly that the rename operation is lost after the reboot,
> since the metafile and block file rename happens when finalizing the block
> from BlocksBeingWritten to current (at least, the block size is not
> considered here).

After testing with dirsync, we found a major performance hit.

> Anyway, thanks a lot for your great & valuable time with us here. After
> checking the above OS logs, I will have a run with HDFS-1539.

That also comes with a performance hit. Presently we are planning to tune the client app to use fewer threads, to reduce the load on the OS, and also the data xceiver count at the DN (the count is currently 4096, as the HBase team suggests). Obviously the problem should be rectified.

> Regards,
> Uma

________________________________________
From: Todd Lipcon [t...@cloudera.com]
Sent: Thursday, November 24, 2011 5:07 AM
To: common-...@hadoop.apache.org
Cc: hdfs-dev@hadoop.apache.org
Subject: Re: Blocks are getting corrupted under very high load

On Wed, Nov 23, 2011 at 1:23 AM, Uma Maheswara Rao G
<mahesw...@huawei.com> wrote:
> Yes, Todd, the block after restart is smaller and its genstamp is also
> lower. Here a complete machine reboot happened. The boards are configured
> so that if a task gets no CPU cycles for 480 secs, the machine reboots
> itself: kernel.hung_task_timeout_secs = 480 sec.

So it sounds like the following happened:
- while writing the file, the pipeline got reduced down to 1 node due to
  timeouts from the other two
- soon thereafter (before more replicas were made), that last replica's
  node kernel-panicked without syncing the data
- on reboot, the filesystem lost some edits from its ext3 journal, and the
  block got moved back into the RBW directory, with truncated data
- HDFS did "the right thing" -- at least what the algorithms say it should
  do -- because it had gotten a commitment for a later replica

If you have a build which includes HDFS-1539, you could consider setting
dfs.datanode.synconclose to true, which would have prevented this problem.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
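To make the failure mode discussed above concrete, here is a minimal, hypothetical Java sketch. It is not actual DataNode code: the class name DurableFinalize, the method names, and the simplified blk/meta file naming are all illustrative. It shows the two durability measures from the thread: syncing block data to disk on close (the behavior dfs.datanode.synconclose from HDFS-1539 enables), and fsyncing the parent directory after the finalize rename (the per-rename equivalent of what the dirsync mount option forces filesystem-wide).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

public class DurableFinalize {

    /**
     * Write data durably: flush, then sync the file descriptor so the
     * bytes reach stable storage before this method returns. Without the
     * sync, a kernel panic can lose data still sitting in the page cache,
     * which matches the truncated replica seen after the reboot.
     */
    static void writeAndSync(File f, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(data);
            out.flush();
            out.getFD().sync(); // fsync(2) on the file's descriptor
        }
    }

    /**
     * Rename the block file and its meta file from blocksBeingWritten/ to
     * current/, then fsync the destination directory. Without the directory
     * sync, ext3 may lose the rename from its journal on a hard reboot,
     * leaving the old (possibly truncated) replica behind -- the suspected
     * cause in this thread.
     */
    static void finalizeBlock(File bbwDir, File currentDir, String blockName)
            throws IOException {
        for (String name : new String[] { blockName, blockName + ".meta" }) {
            File src = new File(bbwDir, name);
            File dst = new File(currentDir, name);
            if (!src.renameTo(dst)) {
                throw new IOException("rename failed: " + src + " -> " + dst);
            }
        }
        // Fsync the directory itself so the rename is made durable.
        // Opening a directory read-only and calling force() works on
        // Linux; other platforms may throw IOException here.
        try (FileChannel dir = FileChannel.open(currentDir.toPath(),
                                                StandardOpenOption.READ)) {
            dir.force(true);
        }
    }
}
```

Either measure trades write latency for durability, which is consistent with the performance hit Uma reports seeing in the dirsync test above.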