+1 on advanced mode!

Regards,
Mridul
On Thu, Jul 10, 2014 at 12:55 AM, Reynold Xin <r...@databricks.com> wrote:

> Maybe it's time to create an advanced mode in the ui.
>
>
> On Wed, Jul 9, 2014 at 12:23 PM, Kay Ousterhout <k...@eecs.berkeley.edu>
> wrote:
>
>> Hi all,
>>
>> I've been doing a bunch of performance measurement of Spark and, as part
>> of doing this, added metrics that record the average CPU utilization, disk
>> throughput and utilization for each block device, and network throughput
>> while each task is running. These metrics are collected by reading the
>> /proc filesystem, so they work only on Linux. I'm happy to submit a pull
>> request with the appropriate changes, but first wanted to see if
>> sufficiently many people think this would be useful. I know the metrics
>> reported by Spark (and in the UI) are already overwhelming to some folks,
>> so I don't want to add more instrumentation if it's not widely useful.
>>
>> These metrics are slightly more difficult to interpret for Spark than
>> similar metrics reported by Hadoop because, with Spark, multiple tasks run
>> in the same JVM and therefore as part of the same process. This means
>> that, for example, the CPU utilization metrics reflect the CPU use across
>> all tasks in the JVM, rather than only the CPU time used by the particular
>> task. This is a pro and a con -- it makes it harder to determine why
>> utilization is high (it may be from a different task), but it also makes
>> the metrics useful for diagnosing straggler problems. Just wanted to
>> clarify this before asking folks to weigh in on whether the added metrics
>> would be useful.
>>
>> -Kay
>>
>> (If you're curious, the instrumentation code is on a very messy branch
>> here:
>>
>> https://github.com/kayousterhout/spark-1/tree/proc_logging_perf_minimal_temp/core/src/main/scala/org/apache/spark/performance_logging
>> )
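
[For anyone curious what "reading the /proc filesystem" looks like in practice, here is a minimal sketch in Scala of sampling machine-wide CPU utilization from /proc/stat. This is illustrative only, not the code on Kay's branch; the class and method names are made up for this example, and field meanings follow proc(5).]

    import scala.io.Source

    // One reading of the aggregate "cpu" line in /proc/stat, in jiffies.
    case class CpuSnapshot(total: Long, idle: Long)

    object ProcCpuSampler {
      /** Parse the first ("cpu") line of /proc/stat. Linux only. */
      def snapshot(): CpuSnapshot = {
        val source = Source.fromFile("/proc/stat")
        try {
          // Line looks like: "cpu  user nice system idle iowait irq softirq steal ..."
          val fields = source.getLines().next().trim.split("\\s+").tail.map(_.toLong)
          val idle = fields(3) + fields(4) // idle + iowait count as not busy
          CpuSnapshot(total = fields.sum, idle = idle)
        } finally {
          source.close()
        }
      }

      /** Fraction of CPU time spent busy between two snapshots, in [0, 1]. */
      def utilization(before: CpuSnapshot, after: CpuSnapshot): Double = {
        val totalDelta = after.total - before.total
        val idleDelta = after.idle - before.idle
        if (totalDelta <= 0) 0.0
        else (totalDelta - idleDelta).toDouble / totalDelta
      }
    }

[Taking one snapshot when a task starts and another when it finishes gives the utilization over the task's lifetime. Because /proc/stat is machine-wide, the number covers every task in the JVM (and the whole host), which is exactly the pro-and-con Kay describes above.]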