Hi Christoph,

If you use sync/hflush/hsync, the new length of the data is only seen by a new reader, not by an existing reader. The "workaround" you've found is exactly how we've implemented the "fs -tail <file>" utility. See the code for that at http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Tail.java?view=markup (note the looping at ~74).
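
For illustration, here is a minimal sketch of the reopen-and-seek pattern. It is not the Tail.java code: it uses java.io.RandomAccessFile on a local temp file as a stand-in so that it runs without a cluster, and the class name ReopenTail and the helper copyFrom are hypothetical. On HDFS the corresponding calls would be FileSystem.open(path) and FSDataInputStream.seek(offset). A read() result of -1 is treated as "end of the data this reader can see", and the next pass opens a fresh reader at the remembered offset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReopenTail {
    /**
     * Opens a fresh reader, seeks to the given offset, copies everything up
     * to the end of file as seen by that reader, and returns the new offset.
     * (Hypothetical helper; RandomAccessFile stands in for FSDataInputStream.)
     */
    public static long copyFrom(Path file, long offset, OutputStream out) throws IOException {
        // A new reader per call, so it picks up the file's current length.
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(offset);
            byte[] buf = new byte[4096];
            int n;
            // read() returns -1 once this reader has reached its end of file.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                offset += n;
            }
        }
        return offset;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("tail-demo", ".log");
        ByteArrayOutputStream seen = new ByteArrayOutputStream();

        // The writer appends a first chunk; the reader consumes it.
        Files.write(file, "first ".getBytes(StandardCharsets.UTF_8));
        long offset = copyFrom(file, 0, seen);

        // The writer appends more; the reader reopens and resumes at its offset.
        Files.write(file, "second".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
        offset = copyFrom(file, offset, seen);

        System.out.println(offset + " " + seen.toString("UTF-8"));
        Files.delete(file);
    }
}
```

A real tailing loop would call copyFrom repeatedly with a sleep between passes, stopping when the writer is known to be finished.
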

On Thu, Dec 20, 2012 at 5:51 PM, Christoph Rupp <ch...@crupp.de> wrote:
> Hi,
>
> I am experiencing an unexpected situation where FSDataInputStream.read()
> returns -1 while reading data from a file that another process still appends
> to. According to the documentation read() should never return -1 but throw
> Exceptions on errors. In addition, there's more data available, and read()
> definitely should not fail.
>
> The problem gets worse because the FSDataInputStream is not able to recover
> from this. If it once returns -1 then it will always return -1, even if the
> file continues growing.
>
> If, at the same time, other Java processes read other HDFS files, they will
> also return -1 immediately after opening the file. It smells like this error
> gets propagated to other client processes as well.
>
> I found a workaround: close the FSDataInputStream, open it again and then
> seek to the previous position. And then reading works fine.
>
> Another problem that i have seen is that the FSDataInputStream returns -1
> when reaching EOF. It will never return 0 (which i would expect when
> reaching EOF).
>
> I use CDH 4.1.2, but also saw this with CDH 3u5. I have attached samples to
> reproduce this.
>
> My cluster consists of 4 machines; 1 namenode and 3 datanodes. I run my
> tests on the namenode machine. there are no other HDFS users, and the load
> that is generated by my tests is fairly low, i would say.
>
> One process writes to 6 files simultaneously, but with a 5 sec sleep between
> each write. It uses an FSDataOutputStream, and after writing data it calls
> sync(). Each write() appends 8 mb; it stops when the file grows to 100 mb.
>
> Six processes read files; each process reads one file. At first each reader
> loops till the file exists. If it does then it opens the FSDataInputStream
> and starts reading. Usually the first process returns the first 8 MB in the
> file before it starts returning -1. But the other processes immediately
> return -1 without reading any data. I start the 6 reader processes before i
> start the writer.
>
> Search HdfsReader.java for "WORKAROUND" and remove the comments; this will
> reopen the FSDataInputStream after -1 is returned, and then everything
> works.
>
> Sources are attached.
>
> This is a very basic scenario and i wonder if i'm doing anything wrong or if
> i found an HDFS bug.
>
> bye
> Christoph
>

--
Harsh J