Hello, thank you for answering.
As I understood it, hflush() should make the data visible to new readers, but in practice it does not, for performance reasons. Can this be considered a bug? When you say "the writer would need to record the client-side visible length", I don't see any way to do that from the SequenceFile.Writer class; hsync() does not take any parameter. Is there a clean workaround you could recommend?

Sébastien.

On Mon, Jan 27, 2025 at 6:54 PM, Wei-Chiu Chuang <weic...@apache.org> wrote:

> Do you have a unit test to reproduce?
>
> Note that for performance reasons, the actual HDFS hflush/hsync
> implementation does not update the visible length at the NameNode.
> The data is flushed to the DataNodes, though. So the writer would need to
> record the client-side visible length and pass it to the reader, if the
> reader wishes to read up to the latest visible length.
>
> We happen to implement hflush/hsync semantics in Ozone and are therefore
> quite familiar with it.
>
>
> On Mon, Jan 27, 2025 at 6:30 AM Sébastien Rebecchi
> <srebec...@kameleoon.com.invalid> wrote:
>
> > Hello,
> >
> > I hit this issue using the Hadoop client with both Hadoop 2.9.1 and
> > 3.4.1 (client 2.9.1 writing to HDFS 2.9.1, etc.).
> > With the SequenceFile.Writer class, after calling hflush(), data is not
> > visible to new readers; it becomes visible only after close().
> > The documentation asserts that the data must be visible to new readers:
> >
> > https://hadoop.apache.org/docs/r3.4.1/hadoop-project-dist/hadoop-common/filesystem/outputstream.html
> >
> > What should we do about this?
> >
> > Thanks,
> > Sébastien.
> >
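For what it's worth, here is a minimal sketch of the workaround Wei-Chiu describes, as I understand it: the writer records its own visible length via SequenceFile.Writer.getLength() right after hflush(), and the reader opens the file with the SequenceFile.Reader.length(...) option bounded by that value, instead of relying on the NameNode-reported file length. The path and the way the length is shared between processes (a side file, ZooKeeper, an RPC, ...) are my own illustrative assumptions; this is untested against a real cluster, not a definitive fix.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class HflushVisibleLengthSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/demo.seq"); // hypothetical path

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(LongWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {

            writer.append(new LongWritable(1), new Text("first record"));
            // Data is pushed to the DataNodes, but the NameNode still
            // reports the old file length.
            writer.hflush();

            // Record the client-side visible length: this is the piece of
            // information the NameNode does not have yet.
            long visibleLength = writer.getLength();

            // In a real setup, visibleLength would be handed to the reader
            // out of band (side file, ZooKeeper, RPC, ...). Here we just
            // use it directly, bounding the reader explicitly so it does
            // not fall back on the stale NameNode length.
            try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                    SequenceFile.Reader.file(path),
                    SequenceFile.Reader.length(visibleLength))) {
                LongWritable key = new LongWritable();
                Text value = new Text();
                while (reader.next(key, value)) {
                    System.out.println(key + " -> " + value);
                }
            }
        }
    }
}
```

The key point is the explicit Reader.length(visibleLength) option: without it, SequenceFile.Reader sizes the file from getFileStatus(), which reflects the NameNode's stale length for a file still open for write.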