Re: Data not visible to new readers after SequenceFile.writer.hflush()

2025-02-03 Thread Steve Loughran
It's visible, but the NameNode isn't updated, so file.len() is still the same. If you go to EOF and then try to read past it, you get the new data. I know, it's not "POSIX", but, well, view it as eventual consistency on file length. Other thing to know: close() does an hflush(), but not an hsync().
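To make Steve's point concrete: a reader that stops at the length the NameNode reports (for example `fs.getFileStatus(path).getLen()`) misses the hflushed bytes, while a reader that keeps going until the stream itself returns EOF picks them up. Below is a minimal sketch. The class and method names are hypothetical and are not part of Hadoop. It accepts any `java.io.InputStream`, so the stream returned by `FileSystem.open(path)` could be passed in, assuming the reader opens the file after the writer's hflush().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not a Hadoop API): read until the stream reports EOF
// instead of stopping at a length taken from file metadata.
final class ReadPastReportedLength {

    /** All bytes read, plus how many of them lay beyond the reported length. */
    static final class Result {
        final byte[] data;
        final long bytesBeyondReported;

        Result(byte[] data, long bytesBeyondReported) {
            this.data = data;
            this.bytesBeyondReported = bytesBeyondReported;
        }
    }

    /**
     * @param in             e.g. the stream from FileSystem.open(path), opened after the hflush()
     * @param reportedLength e.g. FileSystem.getFileStatus(path).getLen(), which may be stale
     */
    static Result readToEof(InputStream in, long reportedLength) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // Deliberately ignore reportedLength: keep reading until read() returns -1.
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        byte[] data = buf.toByteArray();
        // Anything past the reported length is data the metadata did not account for.
        long beyond = Math.max(0L, data.length - reportedLength);
        return new Result(data, beyond);
    }
}
```

This buffers the whole stream in memory, so treat it as an illustration rather than something for large files. For SequenceFile specifically, the Hadoop 2.x/3.x source appears to behave similarly when the reader is built from a stream rather than a path. With `SequenceFile.Reader.stream(...)` and no `SequenceFile.Reader.length(...)` option, the end of the data seems to default to `Long.MAX_VALUE` instead of the file-status length. Please check this against the source of the version you run.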

Re: Data not visible to new readers after SequenceFile.writer.hflush()

2025-01-28 Thread Sébastien Rebecchi
Hello, thank you for answering. As I understood it, hflush() must make the data visible to new readers, but this is actually not the case, for performance reasons. Can it be considered a bug? When you say 'the writer would need to record the client side visible length', I don't see anything to do that fr

Re: Data not visible to new readers after SequenceFile.writer.hflush()

2025-01-27 Thread Wei-Chiu Chuang
This is Hadoop's unit test for sequence file exercising hsync/hflush: https://github.com/apache/hadoop/blob/61df1b27a797efd094328c7d9141b9e157e01bf4/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestHSync.java#L151

Re: Data not visible to new readers after SequenceFile.writer.hflush()

2025-01-27 Thread Wei-Chiu Chuang
Do you have a unit test to reproduce? Note that for performance reasons, the actual HDFS hflush/hsync implementation does not update the visible length at the NameNode. The data is flushed to the DataNodes, though. So the writer would need to record the client-side visible length and pass it to the reader, if
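One way to read Wei-Chiu's suggestion is sketched below, using only java.io and under loose assumptions. The writer counts what it has written and, after each flush, publishes that count as the visible length through a side channel of your choosing. The reader then consumes exactly that many bytes instead of asking the NameNode. `VisibleLengthProtocol`, `CountingWriter`, `flushAndGetVisibleLength()` and `readUpTo()` are hypothetical names. In this stand-alone sketch, `flush()` stands in for HDFS hflush().

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical sketch of "writer records the visible length, reader uses it".
final class VisibleLengthProtocol {

    /** Wraps any OutputStream and counts the bytes written through it. */
    static final class CountingWriter extends FilterOutputStream {
        private long written;

        CountingWriter(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            written++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            written += len;
        }

        /**
         * Flushes, then returns the number of bytes a reader may safely consume.
         * With HDFS, hflush() would go here, and the returned value is what you
         * would hand to readers via your (hypothetical) side channel.
         */
        long flushAndGetVisibleLength() throws IOException {
            out.flush();
            return written;
        }
    }

    /** Reads up to visibleLength bytes, ignoring any length from file metadata. */
    static byte[] readUpTo(InputStream in, long visibleLength) throws IOException {
        if (visibleLength < 0 || visibleLength > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("sketch only handles lengths up to 2 GiB");
        }
        byte[] buf = new byte[(int) visibleLength];
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n == -1) {
                // Fewer bytes than promised: return what was actually read.
                return Arrays.copyOf(buf, off);
            }
            off += n;
        }
        return buf;
    }
}
```

With SequenceFile itself, the closest existing hooks seem to be `SequenceFile.Writer.getLength()`, which returns the current position of the output stream, called right after `hflush()`, and the `SequenceFile.Reader.length(long)` option on the read side. Please verify both against the Javadoc for your Hadoop version before relying on them.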

Re: Data not visible to new readers after SequenceFile.writer.hflush()

2025-01-27 Thread Chris Nauroth
Hi Sébastien, I replied to your same question on the hadoop-user@ thread. Let's keep the discussion there unless we discover there is some kind of HDFS bug to discuss. Thank you! Chris Nauroth

Data not visible to new readers after SequenceFile.writer.hflush()

2025-01-27 Thread Sébastien Rebecchi
Hello, I got this issue using the Hadoop client with both Hadoop 2.9.1 and 3.4.1 (client 2.9.1 writing to HDFS 2.9.1, etc.). With the SequenceFile.Writer class, after calling hflush(), the data is not visible to new readers; it becomes visible only after calling close(). The doc asserts that data must be visible to new