Adding HDFS read-time metrics per task (RE: SPARK-1683)

2016-05-11 Thread Brian Cho
Hi, I'm interested in adding read-time (from HDFS) to Task Metrics. The motivation is to help debug performance issues. After some digging, it's briefly mentioned in SPARK-1683 that this feature didn't make it due to metric collection causing a performance regression [1]. I'd like to try tackling

Re: Adding HDFS read-time metrics per task (RE: SPARK-1683)

2016-05-11 Thread Brian Cho
> alls to read() take a long time (the ones that
> cause a larger block to be read from disk).
>
> -Kay
>
> On Wed, May 11, 2016 at 2:01 PM, Reynold Xin wrote:
>
>> Adding Kay
>>
>> On Wed, May 11, 2016 at 12:01 PM, Brian Cho wrote:

Re: Adding HDFS read-time metrics per task (RE: SPARK-1683)

2016-05-12 Thread Brian Cho
getting the metrics in?

Thanks,
Brian

On Thu, May 12, 2016 at 12:12 PM, Steve Loughran wrote:

>
> On 12 May 2016, at 04:44, Brian Cho wrote:
>
> Hi Kay,
>
> Thank you for the detailed explanation.
>
> If I understand correctly, I *could* time each record processing time