Hi,

I am seeing unexpected behavior where FSDataInputStream.read() returns -1
while reading from a file that another process is still appending to. As I
understand the documentation, read() should throw an exception on errors
rather than return -1. Besides, more data is available at that point, so
read() should not fail at all.

The problem gets worse because the FSDataInputStream cannot recover from
this. Once it has returned -1, it always returns -1, even if the file
continues to grow.

If other Java processes read other HDFS files at the same time, their reads
also return -1 immediately after opening the file. It looks as if the
error is somehow propagated to other client processes as well.

I found a workaround: close the FSDataInputStream, open it again, and seek
to the previous position. After that, reading works fine.
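
To make the workaround concrete, here is a minimal sketch of the idea. It
is not the attached HdfsReader.java; the names ReopenOnEof and StreamOpener
are made up for illustration, and the opener is a stand-in for calling
FileSystem.open(path) and then FSDataInputStream.seek(position). The sketch
assumes that one reopen per -1 is enough, which matches what I observed but
may not hold in general.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: remembers how many bytes were read and, when read()
// returns -1, closes the stream, reopens it at that position, and retries once.
class ReopenOnEof {

    // Hypothetical stand-in for "open the file and seek to position".
    // With HDFS this would wrap FileSystem.open(path) plus seek(position).
    interface StreamOpener {
        InputStream openAt(long position) throws IOException;
    }

    private final StreamOpener opener;
    private InputStream in;
    private long position = 0;

    ReopenOnEof(StreamOpener opener) throws IOException {
        this.opener = opener;
        this.in = opener.openAt(0);
    }

    // Returns the number of bytes read, or -1 if even the freshly
    // reopened stream has no data beyond the current position.
    int read(byte[] buf) throws IOException {
        int n = in.read(buf, 0, buf.length);
        if (n == -1) {
            in.close();
            in = opener.openAt(position);   // reopen and "seek"
            n = in.read(buf, 0, buf.length);
        }
        if (n > 0) {
            position += n;
        }
        return n;
    }

    long getPosition() {
        return position;
    }

    void close() throws IOException {
        in.close();
    }
}
```

A reader that tails a growing file would call read() in a loop and sleep
for a while whenever it still gets -1 after the reopen.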

Another problem I have seen is that the FSDataInputStream returns -1 when
reaching EOF. It never returns 0 (which I would expect when reaching
EOF).

I use CDH 4.1.2, but also saw this with CDH 3u5. I have attached samples to
reproduce this.

My cluster consists of 4 machines: 1 namenode and 3 datanodes. I run my
tests on the namenode machine. There are no other HDFS users, and the load
generated by my tests is fairly low, I would say.

One process writes to 6 files simultaneously, with a 5-second sleep between
writes. It uses an FSDataOutputStream and calls sync() after each write.
Each write() appends 8 MB; the writer stops when the file has grown to
100 MB.

Six processes read files; each process reads one file. Each reader first
polls until its file exists, then opens the FSDataInputStream and starts
reading. Usually the first process reads the first 8 MB of its file before
read() starts returning -1, but the other processes get -1 immediately,
without reading any data. I start the 6 reader processes before I start
the writer.

Search HdfsReader.java for "WORKAROUND" and uncomment the marked code; this
reopens the FSDataInputStream after -1 is returned, and then everything
works.

Sources are attached.

This is a very basic scenario, and I wonder whether I am doing something
wrong or have found an HDFS bug.

bye
Christoph
