Hi All,

The hadoop-hdfs-raid project is a wonderful way to reduce storage
overhead without affecting reliability.
In the present implementation, cold data is encoded according to
different rules:
1) The coldest data is encoded with the Reed-Solomon erasure code;
2) Data that is not very cold is encoded with the XOR code.
The encoding is computed over the blocks of the same file, and the
files that need to be encoded are configured in raid.xml. This becomes
tedious work when there are huge numbers of such files, and the hotness
of a file may change over time.
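
For reference, a single policy entry in raid.xml looks roughly like the
following (element and property names differ between versions, so please
treat this only as an illustration of the manual work involved):

    <configuration>
      <policy name="coldDataPolicy">
        <srcPath prefix="/user/foo/warehouse"/>
        <erasureCode>rs</erasureCode>
        <property>
          <name>targetReplication</name>
          <value>1</value>
        </property>
        <property>
          <name>modTimePeriod</name>
          <value>86400000</value>
        </property>
      </policy>
    </configuration>

Every path prefix has to be enumerated by hand like this, which is
exactly what becomes unmanageable at scale.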
I wonder whether there is a way to determine the hotness of a particular
block automatically and to encode blocks of the same hotness level
together, without involving the concept of a file at all.
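
To make the idea concrete, here is a minimal sketch of what such a
per-block classification could look like. BlockHotness and
HotnessClassifier are hypothetical names of mine; hadoop-hdfs-raid has
no such classes today, and the sketch assumes the NameNode or RaidNode
could expose per-block access times:

    // Hypothetical hotness levels; the thresholds are tunable policy,
    // not anything fixed by the existing RAID code.
    enum BlockHotness { HOT, WARM, COLD }

    class HotnessClassifier {
        private final long warmAgeMs;  // e.g. 7 days in milliseconds
        private final long coldAgeMs;  // e.g. 30 days in milliseconds

        HotnessClassifier(long warmAgeMs, long coldAgeMs) {
            this.warmAgeMs = warmAgeMs;
            this.coldAgeMs = coldAgeMs;
        }

        // Classify a block purely by the age of its last access; a real
        // policy could also weigh read counts or decay them over time.
        BlockHotness classify(long lastAccessTimeMs, long nowMs) {
            long age = nowMs - lastAccessTimeMs;
            if (age >= coldAgeMs) return BlockHotness.COLD; // Reed-Solomon candidate
            if (age >= warmAgeMs) return BlockHotness.WARM; // XOR candidate
            return BlockHotness.HOT;                        // leave replicated
        }
    }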
Then I considered implementing this feature in the hadoop-hdfs-raid
project:
1) The administrator would not need to configure raid.xml to decide
which files should be encoded or which encoding scenario to use; the
block selection process would be finished automatically.
2) There would be no concept of a file during the erasure-code
calculation. That means the calculation would no longer take a file as
its unit; it would follow the hotness of the blocks alone, and blocks
with the same hotness would be encoded at the same time (see the sketch
after this list).
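
Here is a minimal sketch of point 2), reusing the BlockHotness and
HotnessClassifier types from the sketch above. Block, STRIPE_LENGTH and
encodeStripe() are again hypothetical placeholders of mine, not existing
RAID APIs:

    import java.util.ArrayList;
    import java.util.EnumMap;
    import java.util.List;
    import java.util.Map;

    // Minimal placeholder for a candidate block in this sketch.
    class Block {
        long lastAccessMs;
    }

    class StripeBuilder {
        static final int STRIPE_LENGTH = 10; // source blocks per parity group

        // Bucket candidate blocks by hotness, then encode full stripes
        // per bucket, ignoring which file each block belongs to.
        static void raidByHotness(List<Block> candidates, HotnessClassifier c, long now) {
            Map<BlockHotness, List<Block>> buckets = new EnumMap<>(BlockHotness.class);
            for (Block b : candidates) {
                buckets.computeIfAbsent(c.classify(b.lastAccessMs, now),
                                        k -> new ArrayList<>()).add(b);
            }
            // Coldest stripes get Reed-Solomon, warm stripes get XOR,
            // hot blocks stay fully replicated.
            encodeFullStripes(buckets.get(BlockHotness.COLD), "rs");
            encodeFullStripes(buckets.get(BlockHotness.WARM), "xor");
        }

        static void encodeFullStripes(List<Block> blocks, String codec) {
            if (blocks == null) return;
            for (int i = 0; i + STRIPE_LENGTH <= blocks.size(); i += STRIPE_LENGTH) {
                // Hypothetical hook: compute parity for one stripe with
                // the chosen codec, e.g. by submitting a job to the RaidNode.
                encodeStripe(blocks.subList(i, i + STRIPE_LENGTH), codec);
            }
        }

        static void encodeStripe(List<Block> stripe, String codec) {
            // placeholder: parity generation would happen here
        }
    }

One open question with this design: blocks left over in a bucket (fewer
than STRIPE_LENGTH) have to wait for the next scan, and a block whose
hotness changes later would need its stripe to be re-encoded or
converted back to replication.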

Please give some comments and opinions on this plan.
Also, how does one start a new feature and contribute a patch to the
open source Hadoop project?
