Add more useful metrics for write latency
-----------------------------------------

                 Key: HDFS-3170
                 URL: https://issues.apache.org/jira/browse/HDFS-3170
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: data-node
    Affects Versions: 2.0.0
            Reporter: Todd Lipcon


Currently, the only write-latency metric we expose is the total time taken by
opWriteBlock. This is nearly useless, since (a) blocks may differ wildly in
size, and (b) a writer that generates data slowly makes a block write take
longer through no fault of the DN. I would like to propose two new metrics:
1) *flush-to-disk time*: measure how long each call to flush an incoming packet
(including its checksums) to disk takes. In most cases this will be close to 0,
since it only flushes to the buffer cache, but if the backing block device
enters congested writeback it can take much longer, which makes this a useful
indicator of a slow or overloaded disk.
2) *round trip to downstream pipeline node*: track the round-trip latency for
the part of the pipeline between the local node and its downstream neighbors.
When we add a new packet to the ack queue, save the current timestamp. When we
receive its ack, update the metric with the time elapsed since we sent the
original packet. This gives the total RTT through the rest of the pipeline. If
we also include this value in the ack we send upstream, the upstream node can
subtract the time spent in the later stages of the pipeline and get an accurate
measurement of this particular link.
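
Both proposals can be sketched in a few lines of Java. This is a minimal,
hypothetical illustration, not the DataNode's actual code: the class
WriteLatencyTracker, the LatencyStat stand-in for a real metrics histogram,
the method names, and the extra "downstream RTT" value carried in the ack are
all assumptions. The clock is injected (defaulting to System.nanoTime) so the
arithmetic can be checked deterministically.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the two proposed DataNode write-latency metrics. */
final class WriteLatencyTracker {

    /** Minimal stand-in for a metrics histogram: sample count and summed nanoseconds. */
    static final class LatencyStat {
        private long count;
        private long totalNanos;

        synchronized void add(long nanos) {
            count++;
            totalNanos += nanos;
        }

        synchronized long count() { return count; }

        synchronized long totalNanos() { return totalNanos; }
    }

    /** A packet waiting in the ack queue, stamped with the time it was sent downstream. */
    private static final class Pending {
        final long seqno;
        final long sentNanos;

        Pending(long seqno, long sentNanos) {
            this.seqno = seqno;
            this.sentNanos = sentNanos;
        }
    }

    final LatencyStat flushToDisk = new LatencyStat(); // metric 1
    final LatencyStat pipelineRtt = new LatencyStat(); // metric 2: RTT through all later stages
    final LatencyStat linkRtt = new LatencyStat();     // metric 2 minus downstream-reported RTT

    private final Deque<Pending> ackQueue = new ArrayDeque<>();
    private final LongSupplier clock;

    WriteLatencyTracker() {
        this(System::nanoTime);
    }

    WriteLatencyTracker(LongSupplier clock) {
        this.clock = clock;
    }

    /** Metric 1: time flushing one packet's data and checksums (usually just to buffer cache). */
    void flushPacket(OutputStream dataOut, OutputStream checksumOut) throws IOException {
        long start = clock.getAsLong();
        dataOut.flush();
        checksumOut.flush();
        flushToDisk.add(clock.getAsLong() - start);
    }

    /** Metric 2, send side: stamp the packet as it joins the ack queue. */
    synchronized void packetSent(long seqno) {
        ackQueue.addLast(new Pending(seqno, clock.getAsLong()));
    }

    /**
     * Metric 2, ack side. downstreamRttNanos is the RTT the downstream node reported in its
     * ack (a hypothetical extra ack field; 0 when the downstream node is the pipeline tail).
     * Returns this node's own RTT, which it would in turn report to its upstream node.
     */
    synchronized long ackReceived(long seqno, long downstreamRttNanos) {
        Pending head = ackQueue.peekFirst();
        if (head == null || head.seqno != seqno) {
            throw new IllegalStateException("unexpected ack for seqno " + seqno);
        }
        ackQueue.removeFirst();
        long rtt = clock.getAsLong() - head.sentNanos;
        pipelineRtt.add(rtt);
        // Time spent in later pipeline stages is not this link's fault; clamp at 0
        // in case the downstream node reports a bogus value.
        linkRtt.add(Math.max(0L, rtt - downstreamRttNanos));
        return rtt;
    }
}
```

For example, if a packet is acked 3 ms after it was sent and the downstream
node reports 1 ms of RTT for its own part of the pipeline, linkRtt records
2 ms for the local link. Because each RTT is measured against a single host's
monotonic clock and only durations travel in the ack, the subtraction does not
require the nodes' clocks to be synchronized.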
