John Thiltges created HDFS-13294:
------------------------------------

             Summary: Flushing writes to disk with libhdfs
                 Key: HDFS-13294
                 URL: https://issues.apache.org/jira/browse/HDFS-13294
             Project: Hadoop HDFS
          Issue Type: Wish
          Components: libhdfs
            Reporter: John Thiltges


I'm working with an FTP server that writes into HDFS using libhdfs. I'd like to 
ensure that incoming files are persisted on datanode disks before returning 
success to clients. At present, power failures often mean lost blocks for 
recent uploads.

The hsync() call and the CreateFlag.SYNC_BLOCK create flag seem like the right 
direction, but there doesn't appear to be a way to set SYNC_BLOCK through the 
libhdfs interface. I believe hsync() only applies to the current block of a 
file handle, so blocks completed earlier would not be covered.

Thoughts on implementing it:
 # Use an existing 'close enough' fcntl flag to set SYNC_BLOCK, such as O_SYNC 
or O_DSYNC, or maybe O_DIRECT? This would probably be the best option, as it 
would keep the libhdfs interface the same, and older versions would ignore the 
flags.
 # Make hdfsOpenFile2 and have it accept HDFS flags (instead of fcntl flags)?
 # Provide a method in DFSOutputStream to set shouldSyncBlock on an existing 
stream, and a function in libhdfs to enable it?

For flushing writes with libhdfs right now (using CDH5), I'm guessing my only 
option is to call hsync() each time a full block's worth of data has been 
written, exactly on the block boundary.
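
A rough sketch of that workaround, assuming the block size is known to the 
writer: split each write so that no chunk crosses a block boundary, and sync 
whenever a block has just been filled. The boundary_writer type and the 
callbacks are hypothetical names for illustration; in a real program the 
callbacks would wrap hdfsWrite() and hdfsHSync(), and the counting callbacks 
here only stand in for them. Whether a sync issued exactly on the boundary 
still reaches the block that was just completed is my guess and is not 
verified by this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callbacks: return 0 on success, non-zero on failure.
 * With libhdfs they would wrap hdfsWrite() and hdfsHSync(). */
typedef int (*write_fn)(void *ctx, const char *buf, size_t len);
typedef int (*sync_fn)(void *ctx);

typedef struct {
    uint64_t block_size;  /* block size of the file, in bytes */
    uint64_t pos;         /* bytes written to the stream so far */
    write_fn write;
    sync_fn  sync;
    void    *ctx;         /* passed through to the callbacks */
} boundary_writer;

/* Write len bytes without letting any single write cross a block
 * boundary, and sync each time a block has just been filled.
 * Returns 0 on success, -1 on the first callback failure. */
int boundary_write(boundary_writer *w, const char *buf, size_t len)
{
    assert(w->block_size > 0);
    while (len > 0) {
        uint64_t room = w->block_size - (w->pos % w->block_size);
        size_t n = len < room ? len : (size_t)room;
        if (w->write(w->ctx, buf, n) != 0)
            return -1;
        w->pos += n;
        buf += n;
        len -= n;
        if (w->pos % w->block_size == 0 && w->sync(w->ctx) != 0)
            return -1;
    }
    return 0;
}

/* Stand-in callbacks that only count bytes and syncs. */
typedef struct { uint64_t bytes; unsigned syncs; } counter;

int count_write(void *ctx, const char *buf, size_t len)
{
    (void)buf;
    ((counter *)ctx)->bytes += len;
    return 0;
}

int count_sync(void *ctx)
{
    ((counter *)ctx)->syncs++;
    return 0;
}
```

The last, partially filled block is not synced by boundary_write, so one more 
sync is still needed before closing the file.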

Best regards,
John



