Hi All,

The hadoop-hdfs-raid project is a great way to reduce storage overhead without sacrificing reliability. In the present implementation, cold data is encoded according to different rules:
1) The coldest data is encoded with a Reed-Solomon erasure code;
2) Data that is not as cold is encoded with an XOR code.

The encoding is computed over the blocks of a single file, and the files to be encoded are configured in raid.xml. This becomes tedious when there are huge numbers of such files, and the hotness of a file may also change over time. I wonder whether there is a way to determine the hotness of each block automatically and to encode blocks of the same hotness level together, without involving the concept of a file at all.

I am therefore considering implementing the following in the hadoop-hdfs-raid project:
1) The administrator no longer needs to configure raid.xml to decide which files should be encoded and with which code; block selection happens automatically (see the first sketch below).
2) The erasure-code calculation has no concept of a file, i.e. it does not take a file as its unit. Encoding is driven purely by block hotness, and blocks with the same hotness are encoded together (see the second sketch below).
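To make the idea concrete, here is a minimal sketch of what the automatic selection could look like. This is illustrative only: BlockHotnessTracker, the read-count thresholds, and the codec mapping are my assumptions, not existing hadoop-hdfs-raid code, and the read counters would have to be fed from NameNode/DataNode access metrics.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: classify a block's hotness from its recent read
 * count, then pick an erasure code for it. The class name, thresholds,
 * and codec mapping are all made up for illustration.
 */
public class BlockHotnessTracker {

  public enum Hotness { HOT, WARM, COLD, FROZEN }
  public enum Codec { NONE, XOR, REED_SOLOMON }

  // blockId -> read count observed in the current sampling window
  private final ConcurrentHashMap<Long, AtomicLong> readCounts =
      new ConcurrentHashMap<>();

  /** Called on every block read (e.g. hooked into read metrics). */
  public void recordRead(long blockId) {
    readCounts.computeIfAbsent(blockId, id -> new AtomicLong())
              .incrementAndGet();
  }

  /** Illustrative thresholds; in practice these would be configurable. */
  public Hotness classify(long blockId) {
    long reads = readCounts.getOrDefault(blockId, new AtomicLong()).get();
    if (reads > 100) return Hotness.HOT;
    if (reads > 10)  return Hotness.WARM;
    if (reads > 0)   return Hotness.COLD;
    return Hotness.FROZEN;
  }

  /** Mirrors the current two-tier policy: Reed-Solomon for the coldest
   *  blocks, XOR for moderately cold ones, no raiding for hot data. */
  public Codec chooseCodec(long blockId) {
    switch (classify(blockId)) {
      case FROZEN: return Codec.REED_SOLOMON;
      case COLD:   return Codec.XOR;
      default:     return Codec.NONE;
    }
  }
}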
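And a sketch of the file-agnostic grouping: blocks are queued per hotness level and handed to the encoder in fixed-width stripes once a level fills up. Again, StripeBuilder and stripeWidth are hypothetical names, not part of hadoop-hdfs-raid.

import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: group blocks of the same hotness level into
 * fixed-width stripes for encoding, ignoring file boundaries entirely.
 */
public class StripeBuilder {

  private final int stripeWidth; // e.g. 10 data blocks per RS stripe

  private final Map<BlockHotnessTracker.Hotness, List<Long>> pending =
      new EnumMap<>(BlockHotnessTracker.Hotness.class);

  public StripeBuilder(int stripeWidth) {
    this.stripeWidth = stripeWidth;
    for (BlockHotnessTracker.Hotness h :
         BlockHotnessTracker.Hotness.values()) {
      pending.put(h, new ArrayList<>());
    }
  }

  /**
   * Queue a block under its hotness level; once a level has accumulated
   * a full stripe, return it for encoding, otherwise return null.
   */
  public List<Long> add(long blockId, BlockHotnessTracker.Hotness level) {
    List<Long> queue = pending.get(level);
    queue.add(blockId);
    if (queue.size() == stripeWidth) {
      List<Long> stripe = new ArrayList<>(queue);
      queue.clear();
      return stripe;  // hand off to the XOR / RS encoder
    }
    return null;
  }
}

One possible benefit of grouping across files this way is that stripes stay full even when individual files have fewer blocks than the stripe width.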
Please share your comments and opinions on this plan. Also, how does one start a new feature and contribute a patch to the open-source Hadoop project?