________________________________________
From: Uma Maheswara Rao G
Sent: Thursday, November 24, 2011 7:51 AM
To: hdfs-dev@hadoop.apache.org; common-...@hadoop.apache.org
Subject: RE: Blocks are getting corrupted under very high load
We could replicate the issue with some test code (without Hadoop). The issue looks to be the same one you pointed out.

> Thanks Todd.
> Finally we also started suspecting in that angle. Planned to take the file
> details before the reboot and after the reboot. With the above analysis I
> can confirm whether it is the same issue or not.

We cannot get the logs from before the reboot, because those logs are being lost as well :). That is also a proof in itself.

> One more thing to notice is that the difference between the reboot time and
> the last replica finalization is ~1 hr in some cases. Since the machine is
> rebooted due to kernel.hung_task_timeout_secs, in the OS that particular
> thread might also not have got the chance to sync the data.

Same cause.

> Great find, HDFS-1539. I have merged all the bug fixes, but since this one
> is an improvement, the issue might not have come to my list :(.
> Also found some OS-level configs to do the filesystem operations
> synchronously:
>   dirsync
>     All directory updates within the filesystem should be done
>     synchronously. This affects the following system calls: creat, link,
>     unlink, symlink, mkdir, rmdir, mknod and rename.
> We suspected mainly that the rename operation is lost after the reboot,
> since the metafile and block file rename happens when finalizing the block
> from BlocksBeingWritten to current (at least, the block size is not
> considered here).

After testing with dirsync, we found a major performance hit.

> Anyway, thanks a lot for your great & valuable time with us here. After
> checking the above OS logs, I will have a run with HDFS-1539.

That also comes with a performance hit. Presently we are planning to tune the client app to use fewer threads, to reduce the load on the OS, and also the data xceiver count at the DN (the count is currently 4096, as the HBase team suggests). Obviously the problem should be rectified.

> Regards,
> Uma

________________________________________
From: Todd Lipcon [t...@cloudera.com]
Sent: Thursday, November 24, 2011 5:07 AM
To: common-...@hadoop.apache.org
Cc: hdfs-dev@hadoop.apache.org
Subject: Re: Blocks are getting corrupted under very high load

On Wed, Nov 23, 2011 at 1:23 AM, Uma Maheswara Rao G
<mahesw...@huawei.com> wrote:
> Yes, Todd, the block after restart is smaller and its genstamp is also
> lower. Here a complete machine reboot happened. The boards are configured
> so that if a task gets no CPU cycles for 480 secs, the machine reboots
> itself: kernel.hung_task_timeout_secs = 480 sec.

So it sounds like the following happened:
- while writing the file, the pipeline got reduced down to 1 node due to
  timeouts from the other two
- soon thereafter (before more replicas were made), that last replica's
  node kernel-panicked without syncing the data
- on reboot, the filesystem lost some edits from its ext3 journal, and the
  block got moved back into the RBW directory, with truncated data
- HDFS did "the right thing" -- at least what the algorithms say it should
  do -- because it had gotten a commitment for a later replica

If you have a build which includes HDFS-1539, you could consider setting
dfs.datanode.synconclose to true, which would have prevented this problem.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
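To make the failure mode discussed above concrete, here is a minimal, hypothetical Java sketch. It is not actual DataNode code: the class name DurableFinalize, the method names, and the simplified blk/meta file naming are all illustrative. It shows the two durability measures from the thread: syncing block data to disk on close (the behavior dfs.datanode.synconclose from HDFS-1539 enables), and fsyncing the parent directory after the finalize rename (the per-rename equivalent of what the dirsync mount option forces filesystem-wide).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

public class DurableFinalize {

    /**
     * Write data durably: flush, then sync the file descriptor so the
     * bytes reach stable storage before this method returns. Without the
     * sync, a kernel panic can lose data still sitting in the page cache,
     * which matches the truncated replica seen after the reboot.
     */
    static void writeAndSync(File f, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(data);
            out.flush();
            out.getFD().sync(); // fsync(2) on the file's descriptor
        }
    }

    /**
     * Rename the block file and its meta file from blocksBeingWritten/ to
     * current/, then fsync the destination directory. Without the directory
     * sync, ext3 may lose the rename from its journal on a hard reboot,
     * leaving the old (possibly truncated) replica behind -- the suspected
     * cause in this thread.
     */
    static void finalizeBlock(File bbwDir, File currentDir, String blockName)
            throws IOException {
        for (String name : new String[] { blockName, blockName + ".meta" }) {
            File src = new File(bbwDir, name);
            File dst = new File(currentDir, name);
            if (!src.renameTo(dst)) {
                throw new IOException("rename failed: " + src + " -> " + dst);
            }
        }
        // Fsync the directory itself so the rename is made durable.
        // Opening a directory read-only and calling force() works on
        // Linux; other platforms may throw IOException here.
        try (FileChannel dir = FileChannel.open(currentDir.toPath(),
                                                StandardOpenOption.READ)) {
            dir.force(true);
        }
    }
}
```

Either measure trades write latency for durability, which is consistent with the performance hit Uma reports seeing in the dirsync test above.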