Hi Todd, we use version 0.21. I think we used 'kill -9'. The likely timing is during startup or checkpoint.
regards
macf

On Tue, Jun 28, 2011 at 11:03 PM, Todd Lipcon <t...@cloudera.com> wrote:
> Hi Denny,
>
> Which version of Hadoop are you using, and when are you killing the
> NameNode? Are you using a unix signal (eg kill -9) or killing power to the
> whole machine?
>
> Thanks
> -Todd
>
> On Tue, Jun 28, 2011 at 2:11 AM, Denny Ye <denny...@gmail.com> wrote:
>
> > *Root cause*: Wrong FSImage format after the user killed the HDFS process.
> > The loader may read an invalid block count, perhaps 1 billion or more, and
> > an OutOfMemoryError happens before any EOFException.
> >
> > How can we verify the validity of the FSImage file?
> >
> > --regards
> > Denny Ye
> >
> > On Tue, Jun 28, 2011 at 4:44 PM, mac fang <mac.had...@gmail.com> wrote:
> >
> > > Hi Team,
> > >
> > > What we found when using Hadoop is that the FSImage often gets corrupted
> > > when we start/stop the Hadoop cluster. We think the reason is around the
> > > write to the output stream: the NameNode may be killed during
> > > saveNamespace, so the FSImage file is never completely written. Currently
> > > I see a previous.checkpoint folder; the logic of saveNamespace is like:
> > >
> > > 1. mv the current folder to the previous.checkpoint folder.
> > > 2. start to write the FSImage into the current folder.
> > >
> > > I think there might be a case where, if the FSImage is corrupted, the
> > > NameNode can NOT be started, but we do NOT get any EOFException, since we
> > > may hit an OutOfMemoryError when we read a wrong numBlocks and
> > > instantiate Block[] blocks = new Block[numBlocks] (actually, we face
> > > this issue).
> > >
> > > Any suggestion on it?
> > >
> > > thanks
> > > macf
> > >
> >
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
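
For illustration only (this is not the actual HDFS loading code; ImageSanityCheck, readBlocks and the simplified Block class are made-up stand-ins): one way to avoid the OutOfMemoryError Denny describes is to sanity-check the block count read from the image before allocating the array, so a truncated or corrupt file fails with a clear IOException instead of exhausting the heap.

    import java.io.DataInputStream;
    import java.io.IOException;

    // Hypothetical helper, not the real FSImage loader.
    public class ImageSanityCheck {

      // Illustrative upper bound; a real loader would derive a limit
      // from the remaining file length or from configuration.
      private static final int MAX_REASONABLE_BLOCKS = 1_000_000;

      static Block[] readBlocks(DataInputStream in) throws IOException {
        int numBlocks = in.readInt();
        // Reject implausible counts before allocating, so a corrupt image
        // fails fast with an IOException instead of an OutOfMemoryError.
        if (numBlocks < 0 || numBlocks > MAX_REASONABLE_BLOCKS) {
          throw new IOException("Corrupt image: implausible block count " + numBlocks);
        }
        Block[] blocks = new Block[numBlocks];
        for (int i = 0; i < numBlocks; i++) {
          blocks[i] = Block.readFrom(in);
        }
        return blocks;
      }

      // Minimal stand-in for the real Block class.
      static class Block {
        long blockId, numBytes, generationStamp;
        static Block readFrom(DataInputStream in) throws IOException {
          Block b = new Block();
          b.blockId = in.readLong();
          b.numBytes = in.readLong();
          b.generationStamp = in.readLong();
          return b;
        }
      }
    }

A checksum written alongside the image and verified before loading would catch corruption even earlier, without relying on per-field plausibility checks.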
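On the saveNamespace side, the risk is that the old image is moved aside before the new one is completely on disk. A crash-safe sequence is to write the new image to a temporary file, sync it, and only then rename it over the old one. The sketch below is only an illustration of that idea (SafeImageSave and the file names are invented; the real HDFS code differs in detail):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    // Hypothetical illustration of a crash-safe save sequence.
    public class SafeImageSave {

      static void saveImage(byte[] imageBytes, Path currentDir) throws IOException {
        Path tmp = currentDir.resolve("fsimage.ckpt");  // write the new image here
        Path fin = currentDir.resolve("fsimage");       // last known-good image

        try (FileOutputStream out = new FileOutputStream(tmp.toFile())) {
          out.write(imageBytes);
          out.flush();
          out.getFD().sync();  // force the bytes to disk before the rename
        }

        // Swap the new image in atomically. If the NameNode is killed at any
        // earlier point, the previous "fsimage" is still complete and readable.
        Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE,
            StandardCopyOption.REPLACE_EXISTING);
      }
    }

With that ordering, a kill -9 during startup or checkpoint can at worst leave behind a stale .ckpt file, never a half-written image in place of the good one.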