On Thu, Dec 17, 2009 at 05:07:45PM -0800, Jin Yao wrote:
> We have decided to divide numatop development into two phases.
> 
> In phase 1, numatop is designed as a memory-locality characterization
> tool. It provides an easy and friendly way to observe performance
> data from hardware performance counters, data that would otherwise be
> difficult to interpret.
> 
> In phase 2, we hope numatop can provide some clues about the
> relationship between memory allocation, thread migration, and memory
> access.

I'm concerned that unless we're able to demonstrate some causal
relationship between RMA and reduced performance, it will be hard for
customers to use the tool to diagnose problems.  Imagine a situation
where the application is running slowly and RMA is not the cause, but
the tool shows high RMA.  In such a case NUMAtop could add to the
difficulty of diagnosing the root cause of the customer's problem.

We should also exercise care in choosing the type of metric that we
report, as some turn out to be meaningless.  Percent of CPU spent
waiting for I/O is a good example of a meaningless metric.

> From the output of the numatop prototype, we often see a case where a
> thread has high RMA but low LMA, even though all NUMA nodes (lgroup
> leaf nodes) have enough free memory. Why doesn't the thread allocate
> more memory on its home affinity node?
> 
> Could it be that the memory was allocated after the thread migrated
> to another node, so the allocation landed on the new node?

It could be due to this, but there are many other possibilities too.
Perhaps the MEM_POLICY was set to NEXT_CPU, and the page fault occurred
on a CPU that belongs to a different lgrp than the process's home
lgroup.
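
For what it's worth, here is a rough, untested sketch (not NUMAtop
code) of how a user program can check whether a freshly touched page
landed on the calling LWP's home lgroup, using lgrp_home(3LGRP) and
meminfo(2).  Link with -llgrp:

/*
 * Untested sketch, not NUMAtop code.  Touch a page, then ask the
 * kernel which lgroup the backing physical page lives on, and
 * compare against this LWP's home lgroup.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/procset.h>
#include <sys/mman.h>
#include <sys/lgrp_user.h>

int
main(void)
{
	size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
	char *buf = malloc(pgsz);

	if (buf == NULL)
		return (1);
	(void) memset(buf, 0, pgsz);		/* fault the page in */

	lgrp_id_t home = lgrp_home(P_LWPID, P_MYID);

	uint64_t addr = (uint64_t)(uintptr_t)buf;
	uint_t req = MEMINFO_VLGRP;		/* which lgroup owns it? */
	uint64_t lgrp;
	uint_t valid;

	if (meminfo(&addr, 1, &req, 1, &lgrp, &valid) != 0) {
		perror("meminfo");
		return (1);
	}
	/* bit 0: address was valid; bit 1: first info request valid */
	if (valid & 0x2) {
		(void) printf("home lgrp %d, page on lgrp %llu%s\n",
		    (int)home, (unsigned long long)lgrp,
		    (lgrp_id_t)lgrp == home ? "" : " (remote!)");
	}
	free(buf);
	return (0);
}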

It's also possible that page_create_va() wasn't able to find a page of
the right color at a mnode that belongs to the caller's home lgroup.  In
that situation, page_create may create a page from a remote lgroup
before checking the cachelist.  (This depends on the platform, and
whether PG_LOCAL is set).
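
Roughly speaking, and with the caveat that the helpers below
(try_freelist(), try_cachelist(), REMOTE_LGRPS) are made-up stand-ins
rather than the real ON code, the search order looks something like
this:

/*
 * Purely illustrative pseudocode, not the actual page_create_va()
 * implementation.  try_freelist(), try_cachelist(), and REMOTE_LGRPS
 * are hypothetical stand-ins for the real search machinery.
 */
page_t *
alloc_page_sketch(lgrp_t *home, uint_t color, uint_t flags)
{
	page_t *pp;

	/* First choice: a free page of the right color, local mnode. */
	if ((pp = try_freelist(home, color)) != NULL)
		return (pp);

	/*
	 * No local page of that color.  Unless PG_LOCAL pins us to
	 * the local mnodes, some platforms widen the freelist search
	 * to remote lgroups here, before trying the local cachelist,
	 * and a remote page today means RMA for the thread later.
	 */
	if (!(flags & PG_LOCAL) &&
	    (pp = try_freelist(REMOTE_LGRPS, color)) != NULL)
		return (pp);

	/* Fall back to reclaiming a page from the cachelist. */
	return (try_cachelist(home, color));
}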

Perhaps a large-page request couldn't be satisfied from an mnode
belonging to the requesting thread's home lgroup.
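
An application typically asks for large pages with memcntl(2) and
MHA_MAPSIZE_VA; here's a small untested sketch.  Whether the kernel
can then satisfy the request from the home lgroup's mnode, falls back
to a remote one, or leaves the mapping with small pages is exactly
the behavior in question:

/*
 * Untested sketch.  Ask for the largest supported page size on an
 * anonymous mapping via MC_HAT_ADVISE.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/mman.h>

int
main(void)
{
	size_t sizes[8];
	int n = getpagesizes(sizes, 8);

	if (n <= 0)
		return (1);
	size_t lpg = sizes[n - 1];	/* largest supported page size */

	/* Over-map so we can hand memcntl() an lpg-aligned range. */
	char *raw = mmap(NULL, 2 * lpg, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (raw == MAP_FAILED)
		return (1);
	char *p = (char *)(((uintptr_t)raw + lpg - 1) & ~(lpg - 1));

	struct memcntl_mha mha;
	mha.mha_cmd = MHA_MAPSIZE_VA;
	mha.mha_flags = 0;
	mha.mha_pagesize = lpg;
	if (memcntl((caddr_t)p, lpg, MC_HAT_ADVISE,
	    (caddr_t)&mha, 0, 0) != 0)
		perror("memcntl");	/* advice only; not fatal */

	p[0] = 1;			/* touch to trigger allocation */
	(void) printf("requested %zu-byte pages\n", lpg);
	return (0);
}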

I'm sure that there are other examples I'm missing, too.

> So in phase 2, we will focus on finding the relationship between the
> memory access pattern, the memory allocation pattern, and scheduling,
> in order to derive a good memory strategy for the application.
> 
> In brief, numatop phase 1 lets users see something strange in the
> system, and in phase 2 numatop tries to provide some clues as to why.

I'm willing to vote for sponsorship, provided that these issues are
addressed and planned for prior to integration with ON.  I'm assuming
that's what you're targeting once the prototype work has been finished,
correct?

You'll also need two more +1 votes from performance core contributors.

Thanks,

-j