Hi, I am experiencing an unexpected situation where FSDataInputStream.read() returns -1 while reading data from a file that another process is still appending to. According to the documentation, read() should never return -1 but should throw an exception on errors. In addition, there is more data available, so read() definitely should not fail.
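For context, the base java.io.InputStream contract (which FSDataInputStream inherits) actually defines -1 as the normal end-of-stream signal rather than an error; exceptions are reserved for real I/O failures. A minimal sketch of that contract using an in-memory stream (plain java.io, not HDFS, so it only illustrates the general read() semantics, not the stuck-reader behavior described here):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class EofContractDemo {
    // Reads a stream to exhaustion and returns the number of bytes consumed.
    // Per the java.io.InputStream contract, read() returns -1 at end of
    // stream; it does not throw. Exceptions signal actual I/O errors.
    public static int drain(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) { // -1 means end of stream, not failure
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[]{1, 2, 3});
        System.out.println(drain(in)); // 3 bytes consumed
        System.out.println(in.read()); // -1 again: EOF, no exception
    }
}
```

So a -1 at the current end of a still-growing file is arguably within the contract; what does look wrong is that the stream never recovers once more data arrives, as described below.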
The problem gets worse because the FSDataInputStream is not able to recover from this: once it returns -1, it will always return -1, even if the file keeps growing. If, at the same time, other Java processes read other HDFS files, they also return -1 immediately after opening the file. It smells like this error gets propagated to other client processes as well. I found a workaround: close the FSDataInputStream, open it again, and seek to the previous position; reading then works fine. Another problem I have seen is that the FSDataInputStream returns -1 when reaching EOF; it never returns 0, which is what I would expect at EOF.

I use CDH 4.1.2, but I also saw this with CDH 3u5. I have attached samples to reproduce this. My cluster consists of 4 machines: 1 namenode and 3 datanodes. I run my tests on the namenode machine. There are no other HDFS users, and the load generated by my tests is fairly low, I would say.

One process writes to 6 files simultaneously, but with a 5-second sleep between writes. It uses an FSDataOutputStream, and after writing data it calls sync(). Each write() appends 8 MB; the writer stops when a file grows to 100 MB. Six processes read the files, one file per process. Each reader first loops until its file exists; if it does, it opens the FSDataInputStream and starts reading. Usually the first process returns the first 8 MB of the file before it starts returning -1, but the other processes immediately return -1 without reading any data. I start the 6 reader processes before I start the writer.

Search HdfsReader.java for "WORKAROUND" and remove the comments; this reopens the FSDataInputStream after -1 is returned, and then everything works. Sources are attached.

This is a very basic scenario, and I wonder whether I am doing something wrong or whether I found an HDFS bug.

Bye,
Christoph
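For reference, the reopen-and-seek workaround follows the pattern below. This is a sketch against a plain local file with java.io.RandomAccessFile so it runs without a cluster; on HDFS the equivalent steps would use FileSystem.open(), getPos(), close(), and seek() on the new FSDataInputStream, and the method name here is mine, not from the attached sources:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

public class ReopenAndSeekDemo {
    // The workaround step: instead of reusing the stream that got stuck at
    // -1, open a fresh handle and seek to where the previous reader stopped.
    public static String tailAfterReopen(File file, long pos) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            in.seek(pos); // resume at the previously recorded position
            byte[] buf = new byte[1024];
            int n = in.read(buf);
            return n == -1 ? "" : new String(buf, 0, n, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("tail", ".txt");
        f.deleteOnExit();
        Files.write(f.toPath(), "first".getBytes(StandardCharsets.UTF_8));

        // A reader consumes the file and hits EOF.
        long pos;
        try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
            in.read(new byte[1024]);   // reads "first"
            pos = in.getFilePointer(); // remember where we stopped
        }

        // Meanwhile the writer appends more data.
        Files.write(f.toPath(), "second".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);

        // Workaround: reopen and seek to the saved position.
        System.out.println(tailAfterReopen(f, pos)); // prints "second"
    }
}
```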