On Tue, Jan 05, 2010 at 04:27:03PM +0800, Li, Aubrey wrote:
> >I'm concerned that unless we're able to demonstrate some causal
> >relationship between RMA and reduced performance, it will be hard for
> >customers to use the tools to diagnose problems.  Imagine a situation
> >where the application is running slowly and RMA is not the cause, but
> >the tool shows high RMA.  In such a case NUMAtop could add to the
> >difficulty of diagnosing the root cause of the customer's problem.
> 
> If an application has reduced performance and high RMA, high RMA should
> be at least one part of the cause, unless we can tell the customer the
> app has to allocate memory from a remote node.

I don't think it's necessarily safe to conclude that.  If an application
is memory-bound and has RMA, I agree.  However, if the application is
CPU or I/O bound, the performance problem might not be due to RMA --
especially in the I/O case.

To use lockstat as an example: some customers run this tool, notice
numbers that look high to them, and then escalate.  In many of these
cases, there's not actually a scalability problem but the tool can make
it easy to conclude that one might exist.  I'm looking to avoid that, if
we possibly can.

> >We should also exercise care in choosing the type of metric that we
> >report, as some turn out to be meaningless.  Percent of CPU spent
> >waiting for I/O is a good example of a meaningless metric.
> >
> 
> The metric is important to show the report. Now we are using the RMA#
> as the ordering rule. In order to show how effectively the application
> is using memory, we probably could use RMA#, LMA# and sysload together.

Again to use lockstat, but this time as a positive example.  It
initially reported the number of spins while busy-waiting for a lock,
which made it hard for the user to determine how much time was actually
being lost to spinning.  The tool was recently changed to report the
amount of time spent spinning, which is easier to understand and a more
meaningful measurement.

On systems where some remote memory accesses take longer than others,
this could be especially useful.  Instead of just reporting the number
of remote accesses, it would be useful to report the amount of time the
application spent accessing that memory.  Then it's possible for the
user to figure out what kind of performance win they might achieve by
making the memory accesses local.
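
As a rough sketch of the arithmetic I have in mind (Python, purely
illustrative): the node names, access counts and latencies below are
made-up numbers, not measurements, and the function names are
hypothetical rather than anything NUMAtop provides.  It also assumes a
fixed cost per access, which ignores caching, prefetching and overlap
with other work, so the result is at best an upper bound.

```python
def remote_access_time_ns(counts, latency_ns):
    """Estimated time spent on remote accesses: sum of count * latency per node."""
    return sum(counts[node] * latency_ns[node] for node in counts)


def potential_saving_ns(counts, latency_ns, local_latency_ns):
    """Upper bound on the time saved if every remote access became local."""
    return sum(
        counts[node] * (latency_ns[node] - local_latency_ns) for node in counts
    )


# Hypothetical inputs: remote accesses per node and assumed per-access latency.
remote_counts = {"node1": 1_000_000, "node2": 250_000}
remote_latency_ns = {"node1": 150, "node2": 300}
local_latency_ns = 100  # assumed latency of a local access

total_ns = remote_access_time_ns(remote_counts, remote_latency_ns)
saving_ns = potential_saving_ns(remote_counts, remote_latency_ns, local_latency_ns)

print(f"time in remote accesses: {total_ns / 1e6:.1f} ms")
print(f"best-case saving if local: {saving_ns / 1e6:.1f} ms")
```

In this made-up case node2 sees a quarter of the accesses that node1
does but contributes just as much to the potential saving, which is
exactly what a raw count would hide.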

-j
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
