Hi,
I'm interested in adding read time (time spent reading from HDFS) to Task
Metrics. The motivation is to help debug performance issues. After some
digging, I found it's briefly mentioned in SPARK-1683 that this feature didn't
make it in because metric collection caused a performance regression [1].
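For concreteness, here is a minimal sketch of what timing reads at the stream level could look like. It wraps an InputStream and accumulates the wall-clock time spent inside each read() call. TimedInputStream, getReadTimeNanos() and getReadCalls() are hypothetical names, not Spark's or Hadoop's API. The sketch also assumes single-threaded access, and it does not time skip(). Each read costs two extra System.nanoTime() calls, which is exactly the kind of per-call overhead that needs measuring before anything like this goes on a hot path.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical wrapper (not part of Spark or Hadoop) that accumulates the
 * wall-clock time spent inside read() calls on the wrapped stream.
 * Assumes a single reader thread; the counters are not synchronized.
 */
public class TimedInputStream extends FilterInputStream {
    private long readTimeNanos = 0L;
    private long readCalls = 0L;

    public TimedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        long start = System.nanoTime();
        try {
            return super.read();
        } finally {
            // Counted even when the call hits EOF or throws.
            readTimeNanos += System.nanoTime() - start;
            readCalls++;
        }
    }

    // FilterInputStream.read(byte[]) delegates here, so it is not overridden
    // separately (that would double-count).
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        long start = System.nanoTime();
        try {
            return super.read(b, off, len);
        } finally {
            readTimeNanos += System.nanoTime() - start;
            readCalls++;
        }
    }

    public long getReadTimeNanos() { return readTimeNanos; }

    public long getReadCalls() { return readCalls; }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        try (TimedInputStream in = new TimedInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            long total = 0;
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                total += n;
            }
            // prints "bytes=10000 calls=4" (three data reads plus the EOF read)
            System.out.println("bytes=" + total + " calls=" + in.getReadCalls());
        }
    }
}
```

Because only a handful of reads (the ones that trigger an actual block fetch) tend to be slow, the accumulated total is dominated by those few calls. Per-call timing is one option. Another is to time only at the point where a new block is fetched, but that requires a hook in the reader itself.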
I'd like to try tackling
> ...calls to read() take a long time (the ones that
> cause a larger block to be read from disk).
>
> -Kay
>
>
> On Wed, May 11, 2016 at 2:01 PM, Reynold Xin wrote:
>
>> Adding Kay
>>
>>
>> On Wed, May 11, 2016 at 12:01 PM, Brian Cho wrote:
>>
>>
getting the metrics in?
Thanks,
Brian
On Thu, May 12, 2016 at 12:12 PM, Steve Loughran wrote:
>
> On 12 May 2016, at 04:44, Brian Cho wrote:
>
> Hi Kay,
>
> Thank you for the detailed explanation.
>
> If I understand correctly, I *could* time each record processing time