Thanks Jonathan. I tried to reproduce this yesterday using a single dfs.name.dir, but I'll give it another go with multiple directories.
Will let you know what I turn up.

-Todd

On Tue, Feb 9, 2010 at 12:05 PM, Allen, Jonathan <jonathan.all...@hp.com> wrote:
> Todd,
>
> Unfortunately my test system is air-gapped from the internet, so I
> haven't been able to transfer my test case across yet, but the basic steps
> are as follows:
>
> 1) start-dfs (also shut down the secondary to make sure that it didn't
> checkpoint away the edit log)
> 2) create lots of small files so that there is a large edit log (I created
> about 4,500 files, resulting in an edit log of just over 1MB)
> 3) stop-dfs
> 4) start-dfs
> 5) wait for the name node to start reading the edit log, but not long enough for
> it to finish reading it (I waited a couple of seconds)
> 6) stop-dfs
> 7) start-dfs
> 8) listing the HDFS directory now shows it in the same state as at step (1)
> rather than the correct state as at step (3).
>
> This was running with the Yahoo distro of 0.20.1.
>
> The dfs.name.dir is configured to use directories on 2 local drives and 1 NFS
> mounted drive.
>
> Thanks,
> Jonathan
>
> Jonathan Allen
> UKGP, NS&R, Defence and Security
> HP Enterprise Services
> Telephone +44 1682 292101
> Email jonathan.allen...@hp.com
> Street address, Unit 29, Alexandra Way, Ashchurch Business Park, Tewkesbury,
> Gloucestershire. GL20 8NB
>
> Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12
> 1HN
> Registered No: 690597 England
> The contents of this message and any attachments to it are confidential and
> may be legally privileged. If you have received this message in error, you
> should delete it from your system immediately and advise the sender.
> To any recipient of this message within HP, unless otherwise stated you
> should consider this message and attachments as "HP CONFIDENTIAL".
>
> -----Original Message-----
> From: Todd Lipcon [mailto:t...@cloudera.com]
> Sent: 09 February 2010 01:11
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: Name Node Corruption When Shutdown Too Soon
>
> Hi Jonathan,
>
> Another question: how have you configured dfs.name.dir? Do you have
> several directories configured?
>
> Thanks
> -Todd
>
> On Mon, Feb 8, 2010 at 4:45 PM, Todd Lipcon <t...@cloudera.com> wrote:
>> Hey Jonathan,
>>
>> As Konstantin mentioned, I've been looking into a couple of issues that
>> could be related. At first glance it doesn't sound like you've run
>> into quite the same thing.
>>
>> What version did you see this on? The steps to reproduce are something like:
>>
>> 1) Start a NN
>> 2) Perform a bunch of edits so there is a large edit log
>> 3) kill -9 the NN
>> 4) start the NN again
>> 5) while it is in the middle of replaying edits, kill -9 it again
>> 6) start the NN, and lose all the previous edits?
>>
>> Or did I misunderstand what happened? If that sounds right, I'll give
>> it a go and see if I can reproduce.
>>
>> Thanks
>> -Todd
>>
>> On Sun, Feb 7, 2010 at 8:45 AM, Allen, Jonathan <jonathan.all...@hp.com>
>> wrote:
>>> I've come across a name node bug and just wanted to check if it's a known
>>> issue before I formally raise it (I've had a quick look through the
>>> database but couldn't see anything obvious).
>>>
>>> If the name node is shut down before it has finished reading through the
>>> edit log, then the edit log gets removed without the image file being
>>> updated. This results in the name node reverting to its previously saved
>>> state (out of sync with the data nodes) and the most recent edits being lost.
>>>
>>> Does anybody recognise this as a known issue or should I raise it?
>>>
>>> Thanks,
>>> Jonathan Allen
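The reported failure mode (edit log gone, image never updated, namespace back to its earlier state) can be illustrated with a toy model. This is a sketch under stated assumptions, not HDFS code: `Storage`, `start_unsafe`, `start_safe` and `replay_scenario` are hypothetical names, the namespace is just a set of created paths, multiple dfs.name.dir directories are not modelled, and the point at which `start_unsafe` discards the edit log is an assumption chosen to reproduce the symptom, not a description of where 0.20.1 actually does it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


class Interrupted(Exception):
    """Simulates the name node being stopped partway through edit-log replay."""


@dataclass
class Storage:
    """Hypothetical on-disk state that survives a restart."""
    image: Set[str] = field(default_factory=set)     # last saved namespace image
    edits: List[str] = field(default_factory=list)   # paths created since that image


def start_unsafe(disk: Storage, stop_after: Optional[int] = None) -> Set[str]:
    """Assumed buggy ordering: the edit log is discarded before the merged image is saved."""
    namespace = set(disk.image)
    pending = disk.edits
    disk.edits = []  # log dropped up front; this is the modelled bug
    for i, path in enumerate(pending):
        if stop_after is not None and i == stop_after:
            raise Interrupted()  # stopped before the new image is written
        namespace.add(path)
    disk.image = set(namespace)
    return namespace


def start_safe(disk: Storage, stop_after: Optional[int] = None) -> Set[str]:
    """Safer ordering: keep the edit log until the merged image has been saved."""
    namespace = set(disk.image)
    for i, path in enumerate(disk.edits):
        if stop_after is not None and i == stop_after:
            raise Interrupted()  # disk is untouched, so nothing is lost
        namespace.add(path)
    disk.image = set(namespace)  # persist the merged image first,
    disk.edits = []              # then drop the edits it already contains
    return namespace


def replay_scenario(start: Callable[..., Set[str]]) -> Set[str]:
    """The reported steps in miniature: many creates, restart, stop mid-replay, restart."""
    disk = Storage()
    disk.edits = [f"/f{i}" for i in range(4500)]  # step 2: lots of small files
    try:
        start(disk, stop_after=100)  # steps 4-6: stopped while replaying edits
    except Interrupted:
        pass
    return start(disk)  # step 7: restart and see what survived


if __name__ == "__main__":
    print(len(replay_scenario(start_unsafe)))  # 0: back to the step (1) state
    print(len(replay_scenario(start_safe)))    # 4500: all edits survive
```

With the unsafe ordering the second start finds an empty edit log and the old image, so all 4,500 files disappear; with the safe ordering an interruption leaves the disk untouched and the next start simply replays the full log again.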