On Tue, Jan 05, 2010 at 04:27:03PM +0800, Li, Aubrey wrote:
> >I'm concerned that unless we're able to demonstrate some causal
> >relationship between RMA and reduced performance, it will be hard for
> >customers to use the tools to diagnose problems.  Imagine a situation
> >where the application is running slowly and RMA is not the cause, but
> >the tool shows high RMA.  In such a case NUMAtop could add to the
> >difficulty of diagnosing the root cause of the customer's problem.
>
> If an application has reduced performance and high RMA, high RMA at least
> should be one part of the cause, Unless we can tell the customer the app
> has to allocate memory from a remote node.

I don't think it's necessarily safe to conclude that.  If an application
is memory-bound and has RMA, I agree.  However, if the application is
CPU or I/O bound, the performance problem might not be due to RMA --
especially in the I/O case.

To use lockstat as an example: some customers run this tool, notice
numbers that look high to them, and then escalate.  In many of these
cases, there's not actually a scalability problem, but the tool can make
it easy to conclude that one might exist.  I'm looking to avoid that, if
we possibly can.

> >We should also exercise care in choosing the type of metric that we
> >report, as some turn out to be meaningless.  Percent of CPU spent
> >waiting for I/O is a good example of a meaningless metric.
> >
>
> The metric is important to show the report. Now we are using the RMA#
> as the ordering rule. In order to show how effective the application is
> using the memory, we probably could use RMA#, LMA# and sysload together.

To use lockstat again, this time as a positive example: it initially
reported the number of spins while busy-waiting for a lock.  That made
it hard for the user to determine how much time was actually being lost
to spinning for a lock.  The tool was recently changed to report the
amount of time spent spinning, which is easier to understand and a more
meaningful measurement.

On systems where some remote memory accesses take longer than others,
this could be especially useful.  Instead of just reporting the number
of remote accesses, it would be useful to report the amount of time the
application spent accessing that memory.  Then it's possible for the
user to figure out what kind of performance win they might achieve by
making the memory accesses local.

-j
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org