Yang Yun created HDFS-15829:
-------------------------------
Summary: Use xattr to support HDFS TTL on Observer namenode
Key: HDFS-15829
URL: https://issues.apache.org/jira/browse/HDFS-15829
Project: Hadoop HDFS
Issue Type: Improvement
Components: dfsclient, namenode
Reporter: Yang Yun
Assignee: Yang Yun
h3. Overview
HDFS TTL is implemented using the xattr mechanism provided by HDFS. When a user
sets a TTL on a file or directory, HDFS creates an xattr named "ttl" for that
file or directory and stores the user-supplied value in it. A service called
TtlService runs on the HDFS Standby or Observer NameNode (Observer is
recommended). It regularly scans the in-memory inode map, reads the value of
the "ttl" xattr from each INode, and checks whether the TTL has expired. If so,
it gets the full file path from the INode and adds it to the expired file list.
After the scan, it creates a DFSClient and deletes the files on the expired
list in batch. Another option is to trigger a YARN job to delete them in
parallel.
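The scan-then-delete flow described above can be sketched in plain Java. This is only an illustration, not the TtlService code: the class {{TtlScanSketch}}, its method names, and the map that stands in for the inode map are all hypothetical, and the sketch assumes the stored value is an absolute expiry time in minutes since the epoch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the scan step; not the actual HDFS classes. */
final class TtlScanSketch {

    /**
     * Collects the paths whose expiry time is at or before nowMinutes.
     * The map models "full path -> value of the ttl xattr"; inodes without
     * a TTL are simply absent. Assumption: the value is an absolute time
     * in minutes since the epoch.
     */
    static List<String> findExpired(Map<String, Long> ttlByPath, long nowMinutes) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : ttlByPath.entrySet()) {
            if (entry.getValue() <= nowMinutes) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }

    /** Splits the expired list into fixed-size batches for batched deletes. */
    static List<List<String>> toBatches(List<String> paths, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += batchSize) {
            int end = Math.min(i + batchSize, paths.size());
            batches.add(new ArrayList<>(paths.subList(i, end)));
        }
        return batches;
    }
}
```

In a real service each batch would be handed to the delete step (the DFSClient or the YARN job mentioned above); the sketch stops at producing the batches.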
h3. Protocol
Add two xattrs:
* "user.ttl": the TTL value in minutes, identifying the time at which the file or folder expires.
* "user.ttlproperty": the TTL type, including:
** SINCELASTWRITE = 0x1; # calculate the TTL from the last write.
** KEEPEMPTYDIR = 0x2; # whether to keep the directory if it is empty.
** KEEPEMPTYSUBDIR = 0x4; # whether to keep empty subdirectories.
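Since the three type values are distinct powers of two, they can presumably be combined into a single integer and tested bit by bit. A minimal sketch of that idea follows. The class {{TtlPropertySketch}} and its helpers are hypothetical, and the byte encoding (decimal text in UTF-8) is an assumption; the issue does not say how the value is serialized into the xattr.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for the "user.ttlproperty" value; not the patch code. */
final class TtlPropertySketch {
    // Flag values as listed in the protocol description.
    static final int SINCELASTWRITE = 0x1;
    static final int KEEPEMPTYDIR = 0x2;
    static final int KEEPEMPTYSUBDIR = 0x4;

    /** Returns true if the given flag bit is set in the property value. */
    static boolean has(int property, int flag) {
        return (property & flag) != 0;
    }

    /** Assumed encoding: the integer as decimal text in UTF-8 bytes. */
    static byte[] encode(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    /** Inverse of encode(); throws NumberFormatException on malformed input. */
    static int decode(byte[] raw) {
        return Integer.parseInt(new String(raw, StandardCharsets.UTF_8));
    }
}
```

For example, {{SINCELASTWRITE | KEEPEMPTYDIR}} yields 3, and {{has(3, KEEPEMPTYSUBDIR)}} is false.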
*Nested TTL*
A TTL can be set on every directory and file along a path; the setting on a
lower-level subdirectory or file takes precedence over the settings of its
ancestors. If a directory or file does not have a TTL of its own, it inherits
the setting of the nearest ancestor directory. The following is an illustrative
example. Suppose there is such a directory tree:
{code:java}
/A/B/E
/A/C
/A/D {code}
That is, B, C and D are under directory A, and file E is under directory B.
Suppose the user sets the TTL of A to 2 days, the TTL of B to 3 days, the TTL
of E to 1 day, and the TTL of C and D is not set. Then the file E will be
cleared after 1 day. After 2 days, C and D will be cleared. The settings
inherited from directory A are used here. Please note that at this time,
directory A will not be cleared because it is not empty. After 3 days, B will
be cleared because its own settings expire. After B is cleared, because A’s
settings have already expired and A has become an empty directory, it will also
be cleared.
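The inheritance rule in this example (own setting first, otherwise the nearest ancestor) can be sketched as a walk up the path. The class {{NestedTtlSketch}} and the map of explicit settings are hypothetical and only model the lookup; the empty-directory rule that keeps A alive is not modeled here.

```java
import java.util.Map;

/** Hypothetical illustration of nearest-ancestor TTL lookup. */
final class NestedTtlSketch {

    /**
     * Returns the TTL (in days, as in the example) set on the path itself or,
     * failing that, on its nearest ancestor; null if none is set anywhere.
     */
    static Integer effectiveTtlDays(Map<String, Integer> explicitTtl, String path) {
        String current = path;
        while (true) {
            Integer ttl = explicitTtl.get(current);
            if (ttl != null) {
                return ttl;
            }
            if (current.isEmpty() || current.equals("/")) {
                return null; // reached the root without finding a setting
            }
            int slash = current.lastIndexOf('/');
            if (slash < 0) {
                return null; // not an absolute path
            }
            // Step up one level; the parent of "/A" is "/".
            current = (slash == 0) ? "/" : current.substring(0, slash);
        }
    }
}
```

With the settings from the example ({{/A}} = 2, {{/A/B}} = 3, {{/A/B/E}} = 1), the lookup gives 2 for {{/A/C}} and {{/A/D}}, 3 for {{/A/B}}, and 1 for {{/A/B/E}}.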
h3. Usage
For the first version, an API is provided to set the TTL; a command-line tool will be added later.
{code:java}
/**
* Set TTL to a file.
* @param fs the file system.
* @param path the target file to set TTL.
* @param value the TTL value.
* @param property the type of TTL.
* @throws IOException
*/
public static void setTTl(FileSystem fs, Path path, int value, int property)
    throws IOException
{code}
h3. Example
{code:java}
// The file will expire in 60 minutes.
TtlInfo.setTTl(fs, file, (int) (System.currentTimeMillis() / 1000 / 60 + 60), 0);
// The file will expire 60 minutes after the last write.
TtlInfo.setTTl(fs, file, 60, TtlInfo.SINCELASTWRITE);
{code}
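The two calls above suggest two readings of the value: with property 0 it is an absolute time in minutes since the epoch, and with SINCELASTWRITE it is a duration in minutes counted from the last write. A sketch of an expiry check under that reading follows. The class {{TtlExpirySketch}} is hypothetical, and treating the boundary minute as already expired is an assumption.

```java
/** Hypothetical expiry check for the two TTL modes shown in the example. */
final class TtlExpirySketch {
    static final int SINCELASTWRITE = 0x1;

    /**
     * @param ttlValueMinutes the value stored for the TTL
     * @param property        the TTL type flags
     * @param lastWriteMillis last modification time in milliseconds
     * @param nowMillis       current time in milliseconds
     */
    static boolean isExpired(long ttlValueMinutes, int property,
                             long lastWriteMillis, long nowMillis) {
        long nowMinutes = nowMillis / 1000 / 60;
        if ((property & SINCELASTWRITE) != 0) {
            // Relative mode: the value is a duration after the last write.
            long lastWriteMinutes = lastWriteMillis / 1000 / 60;
            return nowMinutes >= lastWriteMinutes + ttlValueMinutes;
        }
        // Absolute mode: the value is itself the expiry time in minutes.
        return nowMinutes >= ttlValueMinutes;
    }
}
```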