There has been a lot of discussion on this since it was proposed last
month. Given the lengthy discussion, can someone please summarize what
the current proposal is?
Jonathan
Li, Aubrey wrote:
johansen wrote:
On Tue, Jan 12, 2010 at 02:20:02PM +0800, zhihui Chen wrote:
Applications can be categorized as CPU-sensitive, memory-sensitive, or
I/O-sensitive.
My concern here is that unless the customer knows how to determine
whether his application is CPU, memory, or IO sensitive it's going to be
hard to use the tools well.
"sysload" in NUMAtop can tell the customer if the app is cpu sensitive.
"Last Level Cache Miss per Instruction" will be added into NUMAtop to
determine if the app is memory sensitive.
When the CPU triggers an LLC miss, the data can be fetched from local
memory, from a remote CPU's cache, or from memory on a remote node.
Generally, the latency of local memory will be close to the latency of a
remote cache, while the latency of remote memory should be much higher.
This isn't universally true. On some SPARC platforms, it actually takes
longer to read a line out of a remote CPU's cache than it does to access
the memory on a remote system board. On a large system, many CPUs may
hold that address in their caches, and they all need to learn that it has
become owned by the reading CPU. If you're going to make this tool work
on SPARC, it won't always be safe to make this assumption.
-j
Thanks for pointing this issue out. We are not SPARC experts, and SPARC
support is not in our phase I design for NUMAtop. :)
We hope a SPARC expert like you, or another expert, can take SPARC into
account and extend this tool to the SPARC platform.
On systems where some remote memory accesses take longer than others,
this could be especially useful. Instead of just reporting the number
of remote accesses, it would be useful to report the amount of time the
application spent accessing that memory. Then it's possible for the
user to figure out what kind of performance win they might achieve by
making the memory accesses local.
As for the metrics of NUMAtop, memory access latency is a good idea,
but the absolute value is not a good indicator for NUMAtop. The value
differs across platforms: a specific number that is good on one platform
may be bad on another, so it is hard to tell the customer what value is
good. We will therefore introduce a ratio into NUMAtop:
"LLC Latency ratio" =
"actual memory access latency" / "calibrated local memory access latency"
We assume that each node hop adds memory access latency: the longer the
hop distance, the higher the latency. This ratio will be close to 1 if
most of the application's memory accesses go to local memory.
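To make the ratio concrete, here is a minimal sketch of how it could be
computed. The function and variable names are hypothetical, purely for
illustration, and are not part of any NUMAtop interface:

```python
def llc_latency_ratio(measured_latency_ns, calibrated_local_latency_ns):
    """Ratio of the observed average memory-access latency to the
    calibrated local-node latency. A value near 1.0 suggests that most
    accesses hit local memory; larger values suggest remote accesses."""
    return measured_latency_ns / calibrated_local_latency_ns

# Example: an app averaging 95 ns against a 90 ns local calibration
# yields a ratio close to 1, i.e. mostly-local access.
ratio = llc_latency_ratio(95.0, 90.0)
```

Because the ratio is normalized against a per-platform calibration, the
same "close to 1 is good" guidance applies on any platform.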
So, as a conclusion, we propose the following metrics for NUMAtop:
1) sysload - CPU sensitivity
2) LLC Miss per Instruction - memory sensitivity
3) LLC Latency ratio - memory locality
4) the percentage of LMA/RMA accesses out of total memory accesses
- 4.1) LMA / (total memory accesses) %
- 4.2) RMA / (total memory accesses) %
- ...
4.2) could be broken down into a separate percentage per NUMA hop.
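The LMA/RMA percentages above could be computed along these lines. This
is only an illustrative sketch; the function name and the per-hop
dictionary representation are assumptions, not NUMAtop code:

```python
def access_percentages(lma, rma_by_hop):
    """Compute metric 4: LMA% and per-hop RMA% of total memory accesses.

    lma        -- count of local memory accesses
    rma_by_hop -- dict mapping NUMA hop distance to remote access count
    Returns (lma_percent, {hop: rma_percent}).
    """
    total = lma + sum(rma_by_hop.values())
    lma_pct = 100.0 * lma / total
    rma_pct = {hop: 100.0 * n / total for hop, n in rma_by_hop.items()}
    return lma_pct, rma_pct

# Example: 80 local accesses, 15 one-hop and 5 two-hop remote accesses
# out of 100 total -> 80% local, 15% and 5% remote per hop.
lma_pct, rma_pct = access_percentages(80, {1: 15, 2: 5})
```

Since these are pure ratios of event counts, they carry no
platform-specific units, which is what makes them portable across
platforms.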
These parameters are not platform-specific and should be common enough
to extend to the SPARC platform.
Looking forward to your thoughts.
BTW: do we still need one more +1 vote for the NUMAtop project?
Thanks,
-Aubrey
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org