It's visible, but the NameNode isn't updated, so file.len() is still the same.
If you go to the EOF and then try to read past it, you get the new data.
I know it's not "POSIX", but view it as eventual consistency on the file length.
Other thing to know: close() does an hflush(), but not an hsync().
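For what it's worth, a minimal reader-side sketch of what that looks like (the path and buffer size are made up, not an official recipe):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadPastReportedLength {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/file-being-written"); // hypothetical path

    // The NameNode-reported length lags behind what the writer hflush()ed.
    long reportedLen = fs.getFileStatus(path).getLen();

    try (FSDataInputStream in = fs.open(path)) {
      in.seek(reportedLen);            // position at the "official" EOF
      byte[] buf = new byte[4096];
      int n = in.read(buf);            // may return flushed bytes beyond it
      System.out.println("reported len=" + reportedLen + ", extra bytes read=" + n);
    }
  }
}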
Hello,
Thank you for answering.
As I understood it, hflush() must make the data visible to new readers, but this is actually not the case, for performance reasons. Can it be considered a bug?
When you say 'the writer would need to record the client-side visible length', I don't see anything to do that fr…
This is Hadoop's unit test for sequence file exercising hsync/hflush:
https://github.com/apache/hadoop/blob/61df1b27a797efd094328c7d9141b9e157e01bf4/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestHSync.java#L151
On Mon, Jan 27, 2025 at 9:53 AM Wei-Chiu Chuang wrote:
Do you have a unit test to reproduce?
Note that for performance reasons, the actual HDFS hflush/hsync implementation does not update the visible length at the NameNode.
The data is flushed to the DataNodes, though. So the writer would need to record the client-side visible length and pass it to the reader, if…
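A sketch of that writer-side bookkeeping, assuming getPos() on the output stream reflects the bytes this client has written; how the length reaches the reader (ZooKeeper, a side file, an RPC) is left hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RecordVisibleLength {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/file-being-written"); // hypothetical path

    try (FSDataOutputStream out = fs.create(path)) {
      out.writeBytes("record 1\n");
      out.hflush();                     // bytes are now on the DataNodes
      long visibleLen = out.getPos();   // client-side count of bytes written
      publishVisibleLength(visibleLen); // hypothetical coordination channel
    }
  }

  // Stand-in for however the writer tells readers how far they can read.
  static void publishVisibleLength(long len) {
    System.out.println("visible length = " + len);
  }
}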
Hi Sébastien,
I replied to this same question on the hadoop-user@ thread. Let's keep the discussion there unless we discover there is some kind of HDFS bug to discuss. Thank you!
Chris Nauroth
On Mon, Jan 27, 2025 at 6:31 AM Sébastien Rebecchi wrote:
Hello,
I got this issue using the Hadoop client with both Hadoop 2.9.1 and 3.4.1 (client 2.9.1 writing to HDFS 2.9.1, etc.).
With the SequenceFile.Writer class, after calling hflush(), data is not visible to new readers; it becomes visible only after close().
The doc asserts that data must be visible to new readers…
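For reference, a minimal sketch of this reproduction (the path and key/value types are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class HflushVisibilityRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/hflush-test.seq"); // illustrative path

    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(LongWritable.class),
        SequenceFile.Writer.valueClass(Text.class));
    writer.append(new LongWritable(1), new Text("hello"));
    writer.hflush(); // expectation: a new reader should now see the record

    // New reader opened after hflush(): in the reported behavior, it sees
    // no record until the writer calls close().
    try (SequenceFile.Reader reader =
             new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
      LongWritable key = new LongWritable();
      Text value = new Text();
      System.out.println("record visible: " + reader.next(key, value));
    }

    writer.close();
  }
}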