Fenghua Hu created HDFS-10690:
---------------------------------
Summary: Optimize insertion/removal of replica in
ShortCircuitCache.java
Key: HDFS-10690
URL: https://issues.apache.org/jira/browse/HDFS-10690
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs-client
Reporter: Fenghua Hu
Currently in ShortCircuitCache, two TreeMap objects are used to track the
cached replicas.
private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new
TreeMap<>();
TreeMap employs Red-Black tree for sorting. This isn't an issue when using
traditional HDD. But when using high-performance SSD/PCIe Flash, the cost
inserting/removing an entry becomes considerable.
To mitigate it, we designed a new list-based for replica tracking.
The list is a double-linked FIFO. FIFO is time-based, thus insertion is a very
low cost operation. On the other hand, list is not lookup-friendly. To address
this issue, we introduce two references into ShortCircuitReplica object.
ShortCircuitReplica next = null;
ShortCircuitReplica prev = null;
In this way, lookup is not needed when removing a replica from the list. We
only need to modify its predecessor's and successor's references in the lists.
Our tests showed up to 15-50% performance improvement when using PCIe flash as
storage media.
The original patch is against 2.6.4, now I am porting to Hadoop trunk, and
patch will be posted soon.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]