Johansen wrote:
>
>On Thu, Dec 17, 2009 at 05:07:45PM -0800, Jin Yao wrote:
>> We decide to divide numatop development into 2 phases.
>>
>> At phase 1, numatop is designed as a memory locality characterizing
>> tool. It provides an easy and friendly way to observe performance data
>> from hardware performance counters, which are otherwise difficult to
>> interpret.
>>
>> At phase 2, we hope numatop can provide some clues about the
>> relationship between memory allocation, thread migration and memory
>> access.
>
>I'm concerned that unless we're able to demonstrate some causal
>relationship between RMA and reduced performance, it will be hard for
>customers to use the tools to diagnose problems.  Imagine a situation
>where the application is running slowly and RMA is not the cause, but
>the tool shows high RMA.  In such a case NUMAtop could add to the
>difficulty of diagnosing the root cause of the customer's problem.

If an application has reduced performance and high RMA, the high RMA should
at least be part of the cause, unless we can tell the customer that the app
has to allocate memory from a remote node (one way to check is sketched
below).
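
Whether the home node really is out of free memory can be checked from
userland, by the way. Below is a minimal sketch (just an illustration, not
part of the prototype) that walks the lgroup hierarchy with
lgrp_init()/lgrp_children() and prints the free memory directly attached to
each leaf lgroup via lgrp_mem_size():

#include <sys/lgrp_user.h>
#include <stdio.h>

/*
 * Illustration only: print the free memory directly attached to each
 * leaf lgroup, so we can see whether the home node really had no local
 * memory left.  Build with -llgrp.
 */
static void
print_lgrp_free(lgrp_cookie_t cookie, lgrp_id_t lgrp)
{
    lgrp_id_t children[64];
    lgrp_mem_size_t freemem;
    int nchildren, i;

    nchildren = lgrp_children(cookie, lgrp, children, 64);
    if (nchildren > 0) {
        for (i = 0; i < nchildren; i++)
            print_lgrp_free(cookie, children[i]);
        return;
    }

    /* Leaf lgroup: report the memory installed directly on it. */
    freemem = lgrp_mem_size(cookie, lgrp, LGRP_MEM_SZ_FREE,
        LGRP_CONTENT_DIRECT);
    (void) printf("lgroup %d: %lld MB free\n", (int)lgrp,
        (long long)(freemem >> 20));
}

int
main(void)
{
    lgrp_cookie_t cookie = lgrp_init(LGRP_VIEW_OS);

    if (cookie == LGRP_COOKIE_NONE) {
        perror("lgrp_init");
        return (1);
    }
    print_lgrp_free(cookie, lgrp_root(cookie));
    (void) lgrp_fini(cookie);
    return (0);
}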

>
>We should also exercise care in choosing the type of metric that we
>report, as some turn out to be meaningless.  Percent of CPU spent
>waiting for I/O is a good example of a meaningless metric.
>

The choice of metric matters for the report. Right now we use RMA# as the
ordering rule. To show how effectively the application is using memory, we
could probably use RMA#, LMA# and sysload together (one possible way to
combine RMA# and LMA# is sketched below).
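
For example, here is a rough sketch of how RMA# and LMA# could be folded
into a single locality percentage per thread. The structure and field names
are made up for illustration and are not the prototype's actual code:

#include <stdint.h>
#include <stdio.h>

/*
 * Rough illustration only; this is not the prototype's real data
 * structure.
 */
typedef struct {
    uint64_t rma;   /* remote memory accesses sampled in the interval */
    uint64_t lma;   /* local memory accesses sampled in the interval */
} numa_sample_t;

/*
 * Fold RMA# and LMA# into one figure: the percentage of sampled memory
 * accesses satisfied from the local node.  A value near 100 means the
 * thread is mostly hitting its home node.
 */
static double
locality_pct(const numa_sample_t *s)
{
    uint64_t total = s->rma + s->lma;

    if (total == 0)
        return (100.0);     /* nothing sampled: treat as fully local */
    return ((double)s->lma * 100.0 / (double)total);
}

int
main(void)
{
    numa_sample_t s = { 7500, 2500 };   /* example counts */

    (void) printf("RMA=%llu LMA=%llu locality=%.1f%%\n",
        (unsigned long long)s.rma, (unsigned long long)s.lma,
        locality_pct(&s));
    return (0);
}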

>
>> From the output of the numatop prototype, we often see a case where a
>> thread has high RMA but low LMA, while all NUMA nodes (lgroup leaf
>> nodes) have enough free memory. Why doesn't the thread allocate more
>> memory on its home affinity node?
>>
>> Is it because memory was allocated after the thread migrated to
>> another node, so the allocation landed on the new node?
>
>It could be due to this, but there are many other possibilities too.
>Perhaps the MEM_POLICY was set to NEXT_CPU, and the page fault occurred
>on a CPU that belongs to a different lgrp than the process's home
>lgroup.
>
>It's also possible that page_create_va() wasn't able to find a page of
>the right color at a mnode that belongs to the caller's home lgroup.  In
>that situation, page_create may create a page from a remote lgroup
>before checking the cachelist.  (This depends on the platform, and
>whether PG_LOCAL is set).
>
>Perhaps a large-page request couldn't be satisfied from the mnode of the
>home lgroup that requested the allocation.
>
>I'm sure that there are other examples I'm missing, too.
>

These cases are interesting, and it looks like we need more work to surface
these causes in NUMAtop.
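
For what it's worth, here is a rough sketch of the kind of check we could
build on, using lgrp_home(3LGRP) and meminfo(2). The helper name and buffer
size are only illustrative; it compares an LWP's home lgroup with the
lgroup that actually backs one of its pages, which is exactly the mismatch
behind the high-RMA/low-LMA case above:

#include <sys/lgrp_user.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/procset.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustration only: compare the calling LWP's home lgroup with the
 * lgroup that actually backs one page of a buffer it touched.
 * Build with -llgrp; meminfo(2) is in libc.
 */
static void
check_placement(void *buf)
{
    lgrp_id_t home = lgrp_home(P_LWPID, P_MYID);
    uint64_t addr = (uint64_t)(uintptr_t)buf;
    uint64_t page_lgrp;
    uint_t req = MEMINFO_VLGRP;
    uint_t valid;

    if (meminfo(&addr, 1, &req, 1, &page_lgrp, &valid) != 0 ||
        (valid & 2) == 0) {
        (void) fprintf(stderr, "meminfo: no lgroup info for address\n");
        return;
    }
    (void) printf("home lgroup = %d, page lgroup = %llu%s\n",
        (int)home, (unsigned long long)page_lgrp,
        (lgrp_id_t)page_lgrp == home ? "" : " (remote placement)");
}

int
main(void)
{
    size_t sz = 4UL * 1024 * 1024;
    char *buf = malloc(sz);

    if (buf == NULL)
        return (1);
    (void) memset(buf, 0, sz);  /* touch the pages so they get allocated */
    check_placement(buf);
    free(buf);
    return (0);
}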

Thanks,
-Aubrey

>> So in phase 2, we will focus on finding the relationship between the
>> memory access pattern, the memory allocation pattern and scheduling, to
>> arrive at a good memory strategy for the application.
>>
>> In brief, numatop phase 1 lets users see that something strange is
>> happening in the system, and in phase 2, numatop tries to provide some
>> clues about why.
>
>I'm willing to vote for sponsorship, provided that these issues are
>addressed and planned for prior to integration with ON.  I'm assuming
>that's what you're targeting once the prototype work has been finished,
>correct?
>
>You'll also need two more +1 votes from performance core contributors.
>
>Thanks,
>
>-j
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
