Hi.

You are probably referring to the following paragraph?

After some back and forth over a set of slides presented by Sanjay on
work being done by Hairong as part of HADOOP-5744, "Revising append",
the room settled on API3 from the list of options below as the
priority feature needed by HADOOP 0.21.0.  Readers must be able to
read up to the last writer 'successful' flush.  It's not important that
the file length is 'inexact'.

If I understand correctly, this means the data actually gets written to
the cluster, but it is not visible to readers until the block is closed.
Work is ongoing for version 0.21 to make the file visible up to the last
flush().
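
For example, once that lands I'd expect something like the following to
work (just a sketch on my side - the call that pushes data out to readers,
sync() here, is my assumption, since the final API name is still being
discussed):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlushVisibilitySketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/test/test.bin");

            // Writer side: write some bytes, then force them out to the
            // datanodes so a concurrent reader can see them.
            FSDataOutputStream out = fs.create(path);
            for (int i = 0; i < 1000; i++) {
                out.write(1);
            }
            out.sync(); // assumption: the "successful flush" call
                        // (might end up named hflush() in 0.21)

            // Reader side (would normally be another client): under the
            // proposed semantics it should see all bytes up to the last
            // successful flush, even though the file is still open and
            // the reported file length may be inexact.
            FSDataInputStream in = fs.open(path);
            byte[] buf = new byte[1000];
            int n = in.read(buf);
            System.out.println("Bytes visible after flush: " + n);
            in.close();

            out.close();
        }
    }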

Am I correct up to here?

Regards.


2009/5/26 Tom White <t...@cloudera.com>
>>
>> This feature is not available yet, and is still under active
>> discussion. (The current version of HDFS will make the previous block
>> available to readers.) Michael Stack gave a good summary on the HBase
>> dev list:
>>
>>
>> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-dev/200905.mbox/%3c7c962aed0905231601g533088ebj4a7a068505ba3...@mail.gmail.com%3e
>>
>> Tom
>>
>> On Tue, May 26, 2009 at 12:08 PM, Stas Oskin <stas.os...@gmail.com>
>> wrote:
>> > Hi.
>> >
>> > I'm trying to continuously write data to HDFS via OutputStream(), and
>> > want to be able to read it at the same time from another client.
>> >
>> > Problem is that after the file is created on HDFS with size of 0, it
>> > stays that way, and only fills up when I close the OutputStream().
>> >
>> > Here is a simple code sample illustrating this issue:
>> >
>> > try {
>> >     FSDataOutputStream out = fileSystem.create(
>> >             new Path("/test/test.bin")); // Here the file is created with 0 size
>> >     for (int i = 0; i < 1000; i++) {
>> >         out.write(1);  // Size still stays 0
>> >         out.flush();   // Even when I flush it out???
>> >     }
>> >
>> >     Thread.sleep(10000);
>> >     out.close(); // Only here the file is updated
>> > } catch (Exception e) {
>> >     e.printStackTrace();
>> > }
>> >
>> > So, two questions here:
>> >
>> > 1) How is it possible to write files directly to HDFS, and have them
>> > update there immediately?
>> > 2) Just for information, in this case, where does the file content
>> > stay all the time - on the server's local disk, in memory, etc.?
>> >
>> > Thanks in advance.
>> >
>>
>
