This is Hadoop's unit test for SequenceFile that exercises hsync/hflush: https://github.com/apache/hadoop/blob/61df1b27a797efd094328c7d9141b9e157e01bf4/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestHSync.java#L151
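
For reference, here is a rough, untested sketch of the pattern Wei-Chiu describes below: the writer records its own client-side visible length right after hflush() and hands it to the reader, which then bounds its read with SequenceFile.Reader.length(...) instead of relying on the file length reported by the NameNode. The path and the key/value classes are just placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class HflushVisibleLengthSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/hflush-demo.seq"); // placeholder path

    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(LongWritable.class),
        SequenceFile.Writer.valueClass(Text.class))) {

      writer.append(new LongWritable(1L), new Text("first record"));
      writer.hflush();                         // data pushed to the DataNodes
      long visibleLength = writer.getLength(); // client-side visible length

      // The reader would normally be a different client; the writer has to
      // hand it visibleLength out of band. Bounding the read with that length
      // avoids trusting the (possibly stale) length known to the NameNode.
      try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
          SequenceFile.Reader.file(path),
          SequenceFile.Reader.length(visibleLength))) {
        LongWritable key = new LongWritable();
        Text value = new Text();
        while (reader.next(key, value)) {
          System.out.println(key + "\t" + value);
        }
      }
    }
  }
}

In a real deployment the writer and the reader would of course be separate clients, so the visible length has to travel out of band (a side file, ZooKeeper, an RPC, ...).
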
On Mon, Jan 27, 2025 at 9:53 AM Wei-Chiu Chuang <weic...@apache.org> wrote:

> Do you have a unit test to reproduce?
>
> Note that for performance reasons, the actual HDFS hflush/hsync
> implementation does not update the visible length at the NameNode.
> The data is flushed to the DataNodes, though. So the writer would need to
> record the client-side visible length and pass it to the reader, if the
> reader wishes to read up to the latest visible length.
>
> We happen to implement hflush/hsync semantics in Ozone and are therefore
> quite familiar with it.
>
> On Mon, Jan 27, 2025 at 6:30 AM Sébastien Rebecchi
> <srebec...@kameleoon.com.invalid> wrote:
>
>> Hello,
>>
>> I ran into this issue using the Hadoop client with both Hadoop 2.9.1 and
>> 3.4.1 (client 2.9.1 writing to HDFS 2.9.1, and so on).
>> With the SequenceFile.Writer class, after calling hflush(), the data is
>> not visible to new readers; it becomes visible only after close().
>> The doc asserts that data must be visible to new readers:
>>
>> https://hadoop.apache.org/docs/r3.4.1/hadoop-project-dist/hadoop-common/filesystem/outputstream.html
>>
>> What should we do about this?
>>
>> Thanks,
>> Sébastien.