On Thu, Dec 17, 2009 at 05:07:45PM -0800, Jin Yao wrote:
> We decided to divide numatop development into 2 phases.
>
> In phase 1, numatop is designed as a memory locality characterizing tool.
> It provides an easy and friendly way to observe performance data from
> hardware performance counters. Otherwise these data are difficult to
> interpret.
>
> In phase 2, we hope numatop can provide some clues about the relationship
> between memory allocation, thread migration and memory access.
I'm concerned that unless we're able to demonstrate some causal
relationship between RMA and reduced performance, it will be hard for
customers to use the tool to diagnose problems. Imagine a situation where
the application is running slowly and RMA is not the cause, but the tool
shows high RMA. In such a case numatop could add to the difficulty of
diagnosing the root cause of the customer's problem. We should also
exercise care in choosing the type of metric that we report, as some turn
out to be meaningless. Percent of CPU time spent waiting for I/O is a good
example of a meaningless metric.

> From the output of the numatop prototype, we often find a case where a
> thread has high RMA but low LMA, while all NUMA nodes (lgroup leaf
> nodes) have enough free memory. Why doesn't the thread allocate more
> memory on its home affinity node?
>
> Is it because memory was allocated after the thread was migrated to
> another node, so the allocation happened on the new node?

It could be due to this, but there are many other possibilities too.
Perhaps the MEM_POLICY was set to NEXT_CPU, and the page fault occurred on
a CPU that belongs to a different lgrp than the process's home lgroup.
It's also possible that page_create_va() wasn't able to find a page of the
right color at an mnode that belongs to the caller's home lgroup. In that
situation, page_create may create a page from a remote lgroup before
checking the cachelist. (This depends on the platform, and on whether
PG_LOCAL is set.) Perhaps a large-page request couldn't be satisfied from
the mnode of the home lgroup that requested the allocation. I'm sure there
are other examples I'm missing, too.

> So in phase 2, we will focus on finding the relationship between the
> memory access pattern, the memory allocation pattern and scheduling, in
> order to arrive at a good memory strategy for the application.
>
> In brief, numatop phase 1 lets users see something strange in the
> system, and in phase 2, numatop tries to provide some clues.

I'm willing to vote for sponsorship, provided that these issues are
addressed and planned for prior to integration with ON. I'm assuming
that's what you're targeting once the prototype work has been finished,
correct? You'll also need two more +1 votes from performance core
contributors.

Thanks,

-j
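P.S. For phase 2, one low-tech cross-check that might help when you see
"high RMA but low LMA" is to compare a thread's home lgroup from
lgrp_home(3LGRP) with the lgroup that actually backs its pages, via
meminfo(2) with MEMINFO_VLGRP. A rough, untested sketch along those lines
(the buffer name and size are just placeholders, not numatop code):

    /*
     * Compare the calling thread's home lgroup with the lgroup backing
     * a buffer it just touched.  Build on Solaris/OpenSolaris with:
     *     cc check_lgrp.c -o check_lgrp -llgrp
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/mman.h>           /* meminfo(2), MEMINFO_VLGRP */
    #include <sys/procset.h>        /* P_LWPID, P_MYID */
    #include <sys/lgrp_user.h>      /* lgrp_home(3LGRP) */

    #define BUF_SIZE        (4 * 1024 * 1024)   /* placeholder size */

    int
    main(void)
    {
            lgrp_id_t home = lgrp_home(P_LWPID, P_MYID);

            /* Touch the buffer so pages actually get allocated. */
            char *buf = malloc(BUF_SIZE);
            if (buf == NULL)
                    return (1);
            (void) memset(buf, 0, BUF_SIZE);

            /* Ask the kernel which lgroup backs the first page. */
            uint64_t addr = (uint64_t)(uintptr_t)buf;
            uint_t req = MEMINFO_VLGRP;
            uint64_t page_lgrp;
            uint_t valid;

            if (meminfo(&addr, 1, &req, 1, &page_lgrp, &valid) == 0 &&
                (valid & 2)) {      /* bit 1: VLGRP answer is valid */
                    (void) printf("home lgroup %d, page lgroup %llu%s\n",
                        (int)home, (unsigned long long)page_lgrp,
                        (uint64_t)home == page_lgrp ? "" : "  <-- remote");
            }

            free(buf);
            return (0);
    }

If the two differ even though the home node has plenty of free memory,
that would point at one of the allocation paths above rather than at
thread migration alone.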