Adding HDFS read-time metrics per task (RE: SPARK-1683)

Brian Cho Wed, 11 May 2016 12:01:58 -0700

Hi,

I'm interested in adding HDFS read time to Task Metrics. The motivation
is to help debug performance issues. After some digging, I found a brief
mention in SPARK-1683 that this feature didn't make it in because metric
collection caused a performance regression [1].


I'd like to try tackling this, but would be very grateful if those with
experience could share more about what was attempted previously and why it
didn't work, or whether there are philosophical objections to these
metrics. If you feel this is a dead end, please save me from myself.

Thank you,
Brian

[1] https://github.com/apache/spark/pull/962
