This is Hadoop's unit test for SequenceFile exercising hsync/hflush:

https://github.com/apache/hadoop/blob/61df1b27a797efd094328c7d9141b9e157e01bf4/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestHSync.java#L151
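
For comparison, here is a minimal, untested sketch (not the code from that
test; class and path names are made up) of the approach Wei-Chiu describes
below: the writer records its own visible length after hflush(), and a new
reader is explicitly bounded to that length instead of relying on the file
size reported by the NameNode.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class HflushVisibleLengthSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("/tmp/hflush-demo.seq"); // hypothetical path

    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(LongWritable.class),
        SequenceFile.Writer.valueClass(Text.class));

    writer.append(new LongWritable(1), new Text("first record"));
    writer.hflush();                          // data is pushed to the DataNodes
    long visibleLength = writer.getLength();  // client-side visible length

    // A new reader opened before close(), bounded to the length the writer
    // has flushed rather than the NameNode-reported file size.
    try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
        SequenceFile.Reader.file(path),
        SequenceFile.Reader.length(visibleLength))) {
      LongWritable key = new LongWritable();
      Text value = new Text();
      while (reader.next(key, value)) {
        System.out.println(key + " -> " + value);
      }
    }

    writer.close();
  }
}

If I understand the code correctly, a SequenceFile.Reader created without the
length option falls back to the file length from getFileStatus(), which
hflush() does not update, and that would explain the behaviour being reported.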

On Mon, Jan 27, 2025 at 9:53 AM Wei-Chiu Chuang <weic...@apache.org> wrote:

> Do you have a unit test to reproduce?
>
> Note that, for performance reasons, the actual HDFS hflush/hsync
> implementation does not update the visible length at the NameNode.
> The data is flushed to the DataNodes, though. So the writer would need to
> record the client-side visible length and pass it to the reader, if the
> reader wishes to read up to the latest visible length.
>
> We happen to implement hflush/hsync semantics in Ozone and therefore we're
> quite familiar with it.
>
>
> On Mon, Jan 27, 2025 at 6:30 AM Sébastien Rebecchi
> <srebec...@kameleoon.com.invalid> wrote:
>
>> Hello,
>>
>> I ran into this issue using the Hadoop client with both Hadoop 2.9.1 and
>> 3.4.1 (client 2.9.1 writing to HDFS 2.9.1, and likewise for 3.4.1).
>> With the SequenceFile.Writer class, after calling hflush(), data is not
>> visible to new readers; it only becomes visible after close().
>> The documentation asserts that the data must be visible to new readers:
>>
>> https://hadoop.apache.org/docs/r3.4.1/hadoop-project-dist/hadoop-common/filesystem/outputstream.html
>>
>> What should we do about this?
>>
>> Thanks,
>> Sébastien.
>>
>
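
For what it's worth, a self-contained repro in the spirit of that test could
look roughly like the sketch below (untested, names made up; it uses
MiniDFSCluster like the linked TestHSync does).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileHflushRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      Path p = new Path("/hflush-repro.seq");
      SequenceFile.Writer writer = SequenceFile.createWriter(conf,
          SequenceFile.Writer.file(p),
          SequenceFile.Writer.keyClass(Text.class),
          SequenceFile.Writer.valueClass(Text.class));
      writer.append(new Text("k1"), new Text("v1"));
      writer.hflush();

      // Open a brand-new reader while the writer is still open.
      try (SequenceFile.Reader reader =
               new SequenceFile.Reader(conf, SequenceFile.Reader.file(p))) {
        Text k = new Text();
        Text v = new Text();
        // Reported behaviour: the record appended above is not visible to
        // this new reader until the writer has called close().
        System.out.println("visible after hflush: " + reader.next(k, v));
      }

      writer.close();
    } finally {
      cluster.shutdown();
    }
  }
}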
