Hello

Thank you for answering.

As I understood the documentation, hflush() must make the data visible to new
readers, but you are saying that in practice it does not, for performance
reasons. Can that be considered a bug?

When you say "the writer would need to record the client side visible
length", I don't see any way to do that from the SequenceFile.Writer class;
hsync() does not take any parameter.
Is there a clean workaround you could recommend?
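
To make sure I understand the pattern you describe, here is a minimal,
self-contained sketch of it using plain local file I/O rather than the
actual Hadoop API (so take the mapping as my assumption: I suppose the
flushedLength below would come from SequenceFile.Writer#getLength() right
after hflush(), and the reader would bound its scan accordingly, e.g. with
the SequenceFile.Reader.length() option):

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Sketch of the "record the visible length on the writer side, pass it to
// the reader" handshake, simulated with local files so it runs standalone.
public class VisibleLengthDemo {
    static String demo() throws IOException {
        File f = File.createTempFile("visible-length", ".dat");
        f.deleteOnExit();

        long flushedLength;
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write("record-1\n".getBytes("UTF-8"));
            out.flush();                                  // stands in for hflush()
            flushedLength = out.getChannel().position();  // writer-side visible length
            // Bytes written after the flush are ignored by any reader that
            // only trusts flushedLength.
            out.write("record-2 not flushed\n".getBytes("UTF-8"));
        }

        // Reader side: read no further than the length handed over by the writer.
        byte[] buf = new byte[(int) flushedLength];
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            in.readFully(buf);
        }
        return new String(buf, "UTF-8").trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints record-1
    }
}
```

Is that roughly the idea, with the length shipped to the reader out of band?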

Sébastien.

On Mon, Jan 27, 2025 at 6:54 PM, Wei-Chiu Chuang <weic...@apache.org>
wrote:

> Do you have a unit test to reproduce?
>
> Note that for performance reasons, the actual HDFS hflush/hsync
> implementation does not update visible length at NameNode.
> The data is flushed to DataNode though. So the writer would need to record
> the client side visible length, and pass it to the reader, if the reader
> wishes to read to the latest visible length.
>
> We happen to implement hflush/hsync semantics in Ozone and therefore we're
> quite familiar with it.
>
>
> On Mon, Jan 27, 2025 at 6:30 AM Sébastien Rebecchi
> <srebec...@kameleoon.com.invalid> wrote:
>
> > Hello,
> >
> > I got this issue using hadoop client with both hadoop 2.9.1 and 3.4.1
> > (client 2.9.1 to write to HDFS 2.9.1 etc).
> > For SequenceFile.Writer class, after doing hflush(), data is not visible
> to
> > new readers, it is visible only after doing close().
> > The doc asserts that data must be visible to new readers
> >
> >
> https://hadoop.apache.org/docs/r3.4.1/hadoop-project-dist/hadoop-common/filesystem/outputstream.html
> >
> > What should we do for this?
> >
> > Thanks,
> > Sébastien.
> >
>
