Hi All,

Chris Nauroth, Arpit, Vinay and I have been discussing this calculation.
There is a disagreement about the definition of non-DFS used space, and because of it the issue is not making progress. Essentially, the question is whether this metric means "Raw Non-DFS Used" or "Unplanned Non-DFS Used".

Here is Arpit's summary of the conversation.

The pre-HDFS-5215 calculation had two bugs:

1. It incorrectly subtracted reserved space from the non-DFS used (net negative). Chris suggests this is not really an issue, as non-DFS used should be shown as zero unless it exceeds the DFS reserved value.
2. It used File#getUsableSpace instead of File#getFreeSpace to calculate the volume free space (net positive).

The net effect was that non-DFS used was displayed as zero unless the actual non-DFS used exceeded (DFS reserved - system reserved).

HDFS-5215 fixed the first issue, and the value that is now erroneously counted towards non-DFS used is in fact the system reserved 5%. From testing it was found that "Ext derivatives hold back 5% free space while XFS does not."

Proposed calculation to report the exact non-DFS usage:

  non-DFS used = getCapacity() + reserved - getDfsUsed() - totalFreeSpace
               = usage.getCapacity() - reserved + reserved - getDfsUsed() - totalFreeSpace
               = usage.getCapacity() - getDfsUsed() - totalFreeSpace
               = File#getTotalSpace - getDfsUsed() - File#getFreeSpace

(The second line substitutes getCapacity() = usage.getCapacity() - reserved.)

Chris Nauroth thinks we should subtract "dfs.datanode.du.reserved" from non-DFS used, because that makes it possible to monitor for unexpected non-zero non-DFS usage and react. Akira has also given a "+0" on the above calculation.

We would like your input so that the issue can make progress. Please let me know your thoughts.

Thanks
--Brahma Reddy Battula
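
To make the two candidate definitions concrete, here is a minimal Java sketch of both calculations. It is only an illustration, not the DataNode code: the class name, method names, and the sample reserved value are hypothetical, and clamping the "unplanned" value at zero is my reading of Chris's suggestion that the metric should show zero until usage exceeds the DFS reserved value.

```java
import java.io.File;

/** Hypothetical sketch; none of these names come from the HDFS code base. */
public class NonDfsUsedSketch {

    /**
     * "Raw Non-DFS Used", following the proposed calculation:
     * File#getTotalSpace - getDfsUsed() - File#getFreeSpace.
     */
    public static long rawNonDfsUsed(long totalSpace, long dfsUsed, long freeSpace) {
        return totalSpace - dfsUsed - freeSpace;
    }

    /**
     * "Unplanned Non-DFS Used": raw usage minus the configured reserve
     * (dfs.datanode.du.reserved), shown as zero until the reserve is exceeded.
     * The clamp at zero is an assumption based on the discussion.
     */
    public static long unplannedNonDfsUsed(long totalSpace, long dfsUsed,
                                           long freeSpace, long reserved) {
        return Math.max(0L, rawNonDfsUsed(totalSpace, dfsUsed, freeSpace) - reserved);
    }

    public static void main(String[] args) {
        // Volume to inspect; defaults to the current directory.
        File volume = new File(args.length > 0 ? args[0] : ".");

        long total = volume.getTotalSpace();
        long free = volume.getFreeSpace();     // all unallocated bytes
        long usable = volume.getUsableSpace(); // bytes available to this JVM

        // Placeholder values for illustration only.
        long dfsUsed = 0L;
        long reserved = 1024L * 1024L * 1024L; // pretend 1 GiB is reserved

        System.out.println("free - usable (space held back by the file system, if any): "
                + (free - usable));
        System.out.println("raw non-DFS used: "
                + rawNonDfsUsed(total, dfsUsed, free));
        System.out.println("unplanned non-DFS used: "
                + unplannedNonDfsUsed(total, dfsUsed, free, reserved));
    }
}
```

On a file system that holds back space, the first printed value should be non-zero, which is the difference between File#getFreeSpace and File#getUsableSpace described in bug 2.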