Greetings! We are evaluating erasure coding on HDFS to reduce storage cost. However, degraded read latency appears to be a critical bottleneck for our system. While exploring strategies for reducing degraded read latency, I found that a "tree-like recovery" technique might help. It is described in the following paper: "Partial-parallel-repair (PPR): a distributed technique for repairing erasure coded storage" (EuroSys 2016), http://dl.acm.org/citation.cfm?id=2901328
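For readers unfamiliar with the idea, here is a minimal sketch of tree-like (partial-parallel) repair. It assumes a single XOR parity chunk rather than the Reed-Solomon codes HDFS-EC actually uses, so each helper's contribution is simply its chunk. With RS, each helper would first scale its chunk by a decoding coefficient in GF(2^8), but the partial results are combined the same way. The function names are hypothetical and are not part of any HDFS API.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Combine two equal-length partial results (XOR is addition in GF(2^8)).
    return bytes(x ^ y for x, y in zip(a, b))


def star_repair(helpers):
    # Conventional repair: every helper ships its chunk to ONE receiver,
    # which performs all the combining itself (k transfers into one node).
    return reduce(xor_bytes, helpers)


def tree_repair(helpers):
    """Tree-like repair (hypothetical sketch): helpers combine pairwise in
    parallel, halving the number of partial results each round.
    Returns (reconstructed chunk, number of transfer rounds)."""
    level = list(helpers)
    rounds = 0
    while len(level) > 1:
        nxt = [xor_bytes(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd node out waits for the next round
        level = nxt
        rounds += 1
    return level[0], rounds


# Usage: 4 data chunks plus 1 XOR parity chunk; data chunk 2 is lost.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\xf0"]
parity = reduce(xor_bytes, data)
helpers = data[:2] + data[3:] + [parity]

chunk, rounds = tree_repair(helpers)
print(chunk == data[2], rounds)
```

Both versions read the same k helper chunks. The difference is that the star version funnels all of them into a single receiver, while the tree version combines pairs in parallel, so reconstruction finishes in ceil(log2 k) transfer rounds (2 rounds for the 4 helpers above). This is the property that should help with degraded read latency.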
My question is: is such tree-like recovery already implemented in HDFS-EC? If not, are there any plans to add a similar technique in the near future? I would also like to know what others have done to sustain good performance under failures (other than keeping fail-over replicas).

Regards,
R.