Hi,

Can I get some reviews of this PR?
https://github.com/apache/hadoop/pull/2323

It adds a new API, IOStatisticsSource, which lets any class act as a source of a static or dynamic IOStatistics set of counters, gauges, and min/max/mean statistics.

The intent is to allow applications to collect statistics on streams, iterators, and other classes they use to interact with filesystems/remote stores, and so get detailed statistics on the number of operations, latencies etc. There's help to log these results, as well as to aggregate them.

Here's the API specification:
https://github.com/steveloughran/hadoop/blob/s3/HADOOP-16830-iostatistics-common/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/iostatistics.md

The FSDataStreams do passthrough of this, and there's a set of remote iterators which also do passthrough, making it easy to chain/wrap iteration code.
https://github.com/steveloughran/hadoop/blob/s3/HADOOP-16830-iostatistics-common/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/functional/RemoteIterators.java

It also includes a statistics snapshot which can be serialized as JSON and as Java objects, and which can aggregate results.
https://github.com/steveloughran/hadoop/blob/s3/HADOOP-16830-iostatistics-common/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/statistics/IOStatisticsSnapshot.java

This is how applications can aggregate results and then propagate them back to the AM/job driver/query engine.

We already have PRs using this for S3A and ABFS on input streams, and in S3A we also count LIST performance, which clients can pick up provided they use listStatusIterator, listFiles and the other calls which return a RemoteIterator.

I know it's a lot of code, but it's split into interface and implementation: the public interface is for applications; the implementation is what we are using internally, and which we will tune as we adopt it more. I have been working on this on and off for months, and yes, it has grown.
But now that we are supporting more complex storage systems, the existing tracking of long/short reads isn't informative enough. I want to know how many GET requests failed and had to be retried, how often the DELETE calls were throttled, and what the real latency of list operations is over long-haul connections.

Please take a look. As a new API it's unlikely to cause any regressions; the main things to worry about are "is that API the one applications can use?" and "has Steve got something fundamentally wrong in his implementation code?"

-Steve
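
For anyone who wants a feel for the source/passthrough/aggregate pattern before opening the PR, here is a minimal self-contained sketch of the idea. It deliberately does not use the classes in the PR: StatsSource, CountingStream, PassthroughStream and Snapshot are hypothetical stand-ins invented for this illustration, as are the counter names, and the real interfaces are richer (gauges, min/max/mean, serialization). Treat it as a picture of the concept under those assumptions, not as the actual API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class IOStatsSketch {

  /** Hypothetical stand-in for a statistics source: anything that can hand out counters. */
  interface StatsSource {
    Map<String, Long> counters();
  }

  /** A fake "stream" which counts the requests it issues. */
  static final class CountingStream implements StatsSource {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    private void increment(String key) {
      counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    /** Simulate a read; a failed first attempt is counted as a retry plus a second GET. */
    void read(boolean failFirstAttempt) {
      increment("get_requests");
      if (failFirstAttempt) {
        increment("get_requests_retried");
        increment("get_requests");
      }
    }

    @Override
    public Map<String, Long> counters() {
      // Return a sorted point-in-time copy of the live counters.
      Map<String, Long> view = new TreeMap<>();
      counters.forEach((k, v) -> view.put(k, v.get()));
      return view;
    }
  }

  /** A wrapper which passes through the statistics of whatever it wraps, if any. */
  static final class PassthroughStream implements StatsSource {
    private final Object inner;

    PassthroughStream(Object inner) {
      this.inner = inner;
    }

    @Override
    public Map<String, Long> counters() {
      return inner instanceof StatsSource
          ? ((StatsSource) inner).counters()
          : Collections.<String, Long>emptyMap();
    }
  }

  /** Sums counters from many sources, e.g. one per task, for reporting to a driver. */
  static final class Snapshot {
    private final Map<String, Long> totals = new TreeMap<>();

    void aggregate(StatsSource source) {
      source.counters().forEach((k, v) -> totals.merge(k, v, Long::sum));
    }

    Map<String, Long> totals() {
      return totals;
    }
  }

  public static void main(String[] args) {
    CountingStream first = new CountingStream();
    first.read(false);
    first.read(true);

    CountingStream second = new CountingStream();
    second.read(true);

    Snapshot snapshot = new Snapshot();
    snapshot.aggregate(new PassthroughStream(first));  // wrapped: stats still visible
    snapshot.aggregate(second);                        // unwrapped source

    // prints {get_requests=5, get_requests_retried=2}
    System.out.println(snapshot.totals());
  }
}
```

The point of the wrapper is that application code holding only the outer stream can still reach the counters of the innermost one, which is what makes chaining streams and iterators cheap to instrument.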