The data is visible, but the NameNode isn't updated, so the file length it
reports is still the same.

If you seek to the reported EOF and then try to read past it, you get the new
data.

I know it's not POSIX, but you can view it as eventual consistency on file
length.
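
To make "read past the reported EOF" concrete, here is a minimal sketch in
plain Java. It is not HDFS code: the class and method names are made up for
illustration, and a ByteArrayInputStream stands in for the stream you would
get from FileSystem.open() (an FSDataInputStream is an InputStream). The idea
is only that the reader keeps reading until read() returns -1 instead of
stopping at the length the metadata reported.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class ReadPastReportedLength {

    /**
     * Drains the stream to its real end and returns how many bytes were
     * found beyond reportedLength (the possibly stale length from metadata).
     */
    public static long bytesBeyondReportedLength(InputStream in, long reportedLength)
            throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        // Trust read() returning -1 as the end, not the reported length.
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return Math.max(0, total - reportedLength);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file whose metadata says 6 bytes but which holds 10.
        byte[] data = new byte[10];
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(bytesBeyondReportedLength(in, 6)); // prints 4
        }
    }
}
```

Against a real cluster the same loop would run over the stream returned by
FileSystem.open(); whether the extra bytes show up depends on the writer
having called hflush() first.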

One other thing to know: close() does an hflush() but not an hsync(). If you
want guaranteed persistence, call hsync().


On Tue, 28 Jan 2025 at 09:03, Sébastien Rebecchi
<srebec...@kameleoon.com.invalid> wrote:

> Hello
>
> Thank you for answering.
>
> As I understood hflush must make the data visible to new readers but this
> is not the case actually for performance reasons. Can it be considered a
> bug?
>
> When you say 'the writer would need to record the client side visible
> length', I don't see anything to do that from SequenceFile.Writer class,
> hsync does not take any parameter.
> Is there a clean workaround you could recommend to me?
>
> Sébastien.
>
> On Mon, 27 Jan 2025 at 18:54, Wei-Chiu Chuang <weic...@apache.org> wrote:
>
> > Do you have a unit test to reproduce?
> >
> > Note that for performance reasons, the actual HDFS hflush/hsync
> > implementation does not update visible length at NameNode.
> > The data is flushed to DataNode though. So the writer would need to record
> > the client side visible length, and pass it to the reader, if the reader
> > wishes to read to the latest visible length.
> >
> > We happen to implement hflush/hsync semantics in Ozone and therefore we're
> > quite familiar with it.
> >
> >
> > On Mon, Jan 27, 2025 at 6:30 AM Sébastien Rebecchi
> > <srebec...@kameleoon.com.invalid> wrote:
> >
> > > Hello,
> > >
> > > I got this issue using hadoop client with both hadoop 2.9.1 and 3.4.1
> > > (client 2.9.1 to write to HDFS 2.9.1 etc).
> > > For SequenceFile.Writer class, after doing hflush(), data is not visible
> > > to new readers, it is visible only after doing close().
> > > The doc asserts that data must be visible to new readers
> > >
> > > https://hadoop.apache.org/docs/r3.4.1/hadoop-project-dist/hadoop-common/filesystem/outputstream.html
> > >
> > > What should we do for this?
> > >
> > > Thanks,
> > > Sébastien.
> > >
> >
>
