Hi Raghu,

Thank you for your reply. I have attached everything, including the logs, to the JIRA. If you need any more information, we can continue the discussion there.
Cheers,
Vinay

On Fri, Oct 2, 2009 at 12:48 AM, Raghu Angadi <rang...@apache.org> wrote:

> Vinay,
>
> This issue came up before (
> http://www.mail-archive.com/core-...@hadoop.apache.org/msg35620.html ). I
> think we should fix this soon. Dhruba filed a JIRA (
> https://issues.apache.org/jira/browse/HDFS-96 ). Not all of the errors
> reported here are fixed by the patch attached there. Could we discuss this
> on the JIRA? For now you could try the patch there in addition to your fix.
>
> I think supporting a large block size requires fixes like the one for
> FSInputChecker below and the BlockSender fix in HDFS-96. I think it would
> not require any major structural changes. Please attach your findings to
> the JIRA. I can help.
>
> Regarding the read timeout from the DN, could you check the log on the
> datanode as well?
>
> Raghu.
>
> On Tue, Sep 29, 2009 at 10:52 AM, Vinay Setty <vinay.se...@gmail.com>
> wrote:
>
> > Hi all,
> > We are running the Yahoo! distribution of Hadoop, based on Hadoop
> > 0.20.0-2787265, on a 10-node cluster running openSUSE Linux. We have
> > HDFS configured with a block size of 5GB (this is for our experiments).
> > But we are facing the following problems when we try to read data
> > beyond 1GB of a 5GB input split.
> >
> > *1) Problem in DFSClient*
> >
> > When we read the text data, the MapReduce job got stuck after reading
> > about the first 1GB of data from the split. Thereafter it was unable to
> > read any more data using IOUtils.readFully(), and after 10 such minutes
> > Hadoop timed the task out. We then tried to manually skip the first 2GB
> > of the 5GB split and start reading from the 2GB offset inside the split.
> > The skipping was OK, but reading the first line after skipping 2GB gave
> > the following exception:
> >
> > java.lang.IndexOutOfBoundsException
> >   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:151)
> >   at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1118)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1666)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1716)
> >   at java.io.DataInputStream.read(DataInputStream.java:132)
> >   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
> >   at edu.unisb.cs.mapreduce.binary.load.LoaderRecordReader.<init>(LoaderRecordReader.java:101)
> >   at edu.unisb.cs.mapreduce.binary.load.LoaderInputFormat.getRecordReader(LoaderInputFormat.java:45)
> >   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:337)
> >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:306)
> >   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > This was particularly strange. We looked at line 151 of FSInputChecker,
> > where the exception is thrown, and it looks like this:
> >
> > public synchronized int read(byte[] b, int off, int len) throws IOException {
> >   if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
> >     throw new IndexOutOfBoundsException();
> >   }
> >
> > It looks like the parameter "off" or "len" became negative. We traced back
> > up the stack and found the following interesting thing at lines 1715 and
> > 1716 in DFSClient:
> >
> > int realLen = Math.min(len, (int) (blockEnd - pos + 1));
> > int result = readBuffer(buf, off, realLen);
> >
> > The first line takes the int value of (blockEnd - pos + 1), where blockEnd
> > is the absolute end position of the block (split) in the file and pos is
> > the current position, both as long. But casting that difference to int can
> > overflow and make realLen negative, which produces the
> > IndexOutOfBoundsException.
> >
> > We fixed this problem by modifying DFSClient.java:1715 to
> >
> > int realLen = (int)Math.min(len, (blockEnd - pos + 1));
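To make the overflow concrete, here is a minimal, self-contained sketch (plain Java, not Hadoop source; the 5GB block and 2GB position are illustrative values, not numbers taken from the logs) showing why casting before Math.min can produce a negative realLen once more than 2GB of a block remains to be read:

    public class CastOverflowSketch {
      public static void main(String[] args) {
        long blockEnd = 5L * 1024 * 1024 * 1024 - 1; // last byte of a hypothetical 5GB block
        long pos = 2L * 1024 * 1024 * 1024;          // current position: 2GB into the block
        int len = 64 * 1024;                         // caller's requested read length

        // Original order of operations: the ~3GB long difference is cast to int
        // first, wraps around to a negative number, and Math.min picks that value.
        int broken = Math.min(len, (int) (blockEnd - pos + 1));

        // Cast applied after Math.min: the comparison stays in long arithmetic,
        // and the result is at most len, which always fits in an int.
        int fixed = (int) Math.min(len, blockEnd - pos + 1);

        System.out.println("broken realLen = " + broken); // prints -1073741824
        System.out.println("fixed  realLen = " + fixed);  // prints 65536
      }
    }

This is exactly the reordering applied in the DFSClient.java:1715 fix quoted above.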
> >
> > *2) Problem with IOUtils.skipFully()*
> >
> > We fixed the problem of casting from long to int, but then we hit a new
> > problem: we are not able to read more than 1GB of data. When we called
> > IOUtils.skipFully() to skip 1GB and then tried to read from there, we got
> > the following exception.
> >
> > java.io.EOFException
> >   at java.io.DataInputStream.readInt(DataInputStream.java:375)
> >   at org.apache.hadoop.hdfs.DFSClient$BlockReader.readChunk(DFSClient.java:1218)
> >   at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
> >   at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
> >   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
> >   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
> >   at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1117)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1665)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1720)
> >   at java.io.DataInputStream.read(DataInputStream.java:132)
> >   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
> >   at edu.unisb.cs.mapreduce.binary.load.LoaderRecordReader.<init>(LoaderRecordReader.java:107)
> >   at edu.unisb.cs.mapreduce.binary.load.LoaderInputFormat.getRecordReader(LoaderInputFormat.java:45)
> >   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:336)
> >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >   at org.apache.hadoop.mapred.Child.main(Child.java:170)
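For reference, a minimal stand-alone sketch of the two ways of positioning a stream inside a split: the skip-based pattern described above, and the seek-based approach mentioned under problem 3 below. The file path and offset are hypothetical, and this is not the LoaderRecordReader code itself:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class SplitPositioningSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/experiments/input/part-00000"); // hypothetical input file
        long offset = 1024L * 1024 * 1024;                     // e.g. 1GB into the file
        byte[] record = new byte[4096];

        // Variant A: skip-based positioning. This is the skip-then-read pattern
        // described above, which hit the EOFException past roughly 1GB.
        FSDataInputStream skipIn = fs.open(file);
        IOUtils.skipFully(skipIn, offset);
        IOUtils.readFully(skipIn, record, 0, record.length);
        skipIn.close();

        // Variant B: seek-based positioning. Seek to the record offset instead
        // of skipping to it (the approach tried in problem 3 below).
        FSDataInputStream seekIn = fs.open(file);
        seekIn.seek(offset);
        IOUtils.readFully(seekIn, record, 0, record.length);
        seekIn.close();
      }
    }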
> >
> > *3) java.net.SocketTimeoutException*
> >
> > Since there seemed to be some problem with IOUtils.skipFully(), we started
> > seeking just before reading each record. We do not get the End of File
> > exception anymore, but we get the following new exception after reading
> > approximately 1810517942 bytes (around 1.5GB).
> >
> > java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> > channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/134.96.223.140:48255
> > remote=/134.96.223.140:50010]
> >   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> >   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> >   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> >   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> >   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> >   at java.io.DataInputStream.readInt(DataInputStream.java:370)
> >   at org.apache.hadoop.hdfs.DFSClient$BlockReader.readChunk(DFSClient.java:1218)
> >   at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
> >   at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
> >   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
> >   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
> >   at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1117)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1665)
> >   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1720)
> >   at java.io.DataInputStream.read(DataInputStream.java:132)
> >   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
> >   at edu.unisb.cs.mapreduce.binary.load.LoaderRecordReader.next(LoaderRecordReader.java:163)
> >   at edu.unisb.cs.mapreduce.binary.load.LoaderRecordReader.next(LoaderRecordReader.java:1)
> >   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
> >   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
> >   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> >   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
> >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > Steps to reproduce:
> >
> > 1. Configure HDFS with a 5GB block size
> > 2. Load more than 5GB of text data into HDFS
> > 3. Run Grep or WordCount on the data
> >
> > We have the following questions:
> >
> > 1. Does anybody know the maximum block size and split size supported by
> >    HDFS?
> > 2. Did any of you try something similar?
> > 3. Did any of you run into similar problems? If so, can you please tell
> >    us how you resolved them?
> >
> > Thank you,
> > Vinay Setty
> >
>
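A note on the 60000 ms figure in the exception above: it appears to match the DFSClient's default read timeout, which in the 0.20 line is controlled by the dfs.socket.timeout property (please verify the key against your build). Raising it on the client can help tell a slow datanode apart from a hung one while debugging, although checking the datanode log, as Raghu suggests, is still the first step. A minimal sketch, with an arbitrary 10-minute value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class TimeoutConfigSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed 0.20-era key for the DFS client read timeout (default 60000 ms).
        conf.setInt("dfs.socket.timeout", 10 * 60 * 1000);
        // Any FileSystem/DFSClient created from this conf uses the larger timeout.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Filesystem: " + fs.getUri());
      }
    }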