[perf-discuss] Comments from NUMA observability tools

Alexander Kolbasov Tue, 15 Nov 2005 13:16:28 -0800

We received some internal comments on the NUMA observability tools. Here they
are, with my responses on the lines starting with //:


 - "plgrp -G <pid>" returns a number plus an extra blank line.  Why the 
blank line?  bug? feature?

// Bug

 - Can you add a flag to up the verbosity?  Getting just the lgroup ID 
back is handy, but it might be nice to get some more info, like what 
CPU(s) that lgroup is associated with, what other lgroups are available, 
how much memory each lgroup has access to, etc.  Can that be done?

// This is delegated to the lgrpinfo utility, which prints all the information
// about lgroups. The plgrp utility deals specifically with process and thread
// lgroup placement.

 - How do I figure out how many lgroups are on the machine?  Can we make 
plgrp report that by default instead of the "-h" help output?  Seems 
like a reasonable place to find this info.

// This is done by lgrpinfo

 - Can you change the lgroup of <pid> if pbind is used on the same 
<pid>?  It does not appear that way to me in my testing.  

// It is not possible. Implementation-wise, plgrp sets the thread's affinity for
// the target lgroup, which, in the absence of processor sets and processor
// bindings, also changes the "home" lgroup. Processor set and processor binding
// assignments are considered first, so while plgrp can successfully set thread
// affinities, it can't actually change the home in such cases. The specified
// affinities will take effect when the thread is unbound or moved out of the
// processor set.

IMPORTANT: Side bar conversation... It would be very handy to be able to 
bind a process to a particular CPU and then have all of its memory be 
local to a different CPU, for some of the testing I am doing on G4.  Is 
there a way in S10 that I can do this?  numactl in linux can do that.  I 
could really use it on Solaris, right now...

// There is no way of doing exactly this.

- Apparently, you can tell plgrp to set a pid to an lgrp value and it 
does not complain if it fails...

// This is a bug that should be fixed.

# pgrep lat
14894
14866
# ./plgrp -G 14866
1

# ./plgrp -S 2 14866
# ./plgrp -G 14866
1

It should give you some sort of indication that something was not done, 
and hopefully a little bit on why.  Right?

// Right

In fact, you can try some ridiculous values without a peep out of it 
(unless I miss why 10 or 20 make sense), and nothing changes.

# pgrep lat_mem
14821
14711
# ./plgrp -S 4 14711
# ./plgrp -S 10 14711
# ./plgrp -S 20 14711
# ./plgrp -G 14711
1

# ./plgrp -S 20 14711
# ./plgrp -G 14711
1

// This is a bug.
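
Until the bug is fixed, the silent failure in the transcripts above can be
caught by reading the home lgroup back after setting it. The following is a
minimal sketch, not part of the tools: the helper name set_home_checked and
the PLGRP variable are made up for illustration, and the -G/-S flags are
simply the ones used in the transcripts above.

```shell
#!/bin/sh
# Sketch only: make a silent "plgrp -S" failure visible by re-reading the home.
# PLGRP is an assumed variable naming the plgrp binary to run.
PLGRP=${PLGRP:-./plgrp}

# set_home_checked LGRP PID   (hypothetical helper)
# Returns 0 if the home lgroup reported afterwards equals LGRP, 1 otherwise.
set_home_checked() {
    want=$1
    pid=$2
    $PLGRP -S "$want" "$pid" || return 1
    # -G printed an extra blank line in the transcripts, so keep line 1 only.
    got=$($PLGRP -G "$pid" | head -n 1)
    if [ "$got" != "$want" ]; then
        echo "plgrp: pid $pid is homed on lgroup '$got', wanted '$want'" >&2
        return 1
    fi
    return 0
}
```

Usage would look like: set_home_checked 2 14866 || echo "home lgroup unchanged"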

Here is another bunch of comments:

I've been applying the tools to a few problems and benchmarks I've been involved
with over the last few days and I intend to keep on applying them to what I can
from now on. Overall, I think that they are excellent tools and exactly what we
need. I'll follow up with more details, thoughts and results in the next few
days.

Just having the ability to observe lgroup topology and usage is a huge
step forward and opens up a whole area of investigation that was, up to
now, fairly closed off.

>>IMPORTANT: Side bar conversation... It would be very handy to be able to 
>>bind a process to a particular CPU and then have all of its memory be 
>>local to a different CPU, for some of the testing I am doing on G4.  Is 
>>there a way in S10 that I can do this?  numactl in linux can do that.  I 
>>could really use it on Solaris, right now...
>
>This is interesting. Can you explain why you would like such functionality?

I've been doing exactly this today while experimenting with the STREAM
benchmark. I think this would work (heap example):

- Change the home lgroup of the thread in question to the one containing the
  CPU where you want the memory allocated,
- use 'pmadvise -o heap=access_lwp' on the process
- the memory should now get allocated in the newly homed lgrp
  (check with 'pmap -L' and lgrpinfo (if the allocation is large enough to 
  notice)).
- rehome the thread to another lgroup or just bind it to a CPU.

This is how I was doing it and there may well be other/better ways - I'm all
ears. However, this is all a bit unwieldy and I would very much like to be able
to say, thread X's heap should be allocated from lgroup Y, fairly much like the
'migrate range' option of SGI's dplace(1) command language. Note that I've never
used dplace(1) but I do work with ex-SGI'ers who speak well of it. It certainly
looks to be powerful stuff.
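
The heap-placement steps listed above can be sketched as a small script. This
is an illustration under assumptions, not a tested recipe: the helper names
run and place_heap_then_move are made up, passing the pid as the last argument
to pmadvise is assumed, and pbind -b stands in for the "just bind it" step.
By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# run CMD [ARGS...]   (hypothetical helper): print or execute a command.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# place_heap_then_move PID MEM_LGRP RUN_CPU   (hypothetical helper)
place_heap_then_move() {
    pid=$1
    mem_lgrp=$2
    run_cpu=$3
    # 1. Home the process on the lgroup that should hold the memory.
    run ./plgrp -S "$mem_lgrp" "$pid"
    # 2. Advise that the heap is accessed by the allocating lwp.
    run pmadvise -o heap=access_lwp "$pid"
    # 3. Inspect placement; meaningful only once the heap has been touched.
    run pmap -L "$pid"
    # 4. Bind the process to the CPU that should do the computing.
    run pbind -b "$run_cpu" "$pid"
}
```

For example, place_heap_then_move 14866 2 5 prints the four command lines for
pid 14866, memory on lgroup 2, and execution on CPU 5.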

There was also a question:

I grabbed the ptools-bin-0.1.2.tar.gz off of opensolaris.org, and 
lgrpinfo was not in it.  I had seen it before.  Do you plan to include 
it in that tar.gz?

// It is distributed separately from
// http://www.opensolaris.org/os/community/performance/numa/observability/perllgrp/
// or via CPAN at http://search.cpan.org/dist/Solaris-Lgrp/

More comments for pmadvise:

It would be nice to apply memory placement advice from the start
to a process. There is a nice DTrace example using system() calls
to apply pmadvise to a just started process.

This is fine but a bit messy. A nice option would be to make pmadvise a
libproc consumer and have it exec the target program. In this way we could
possibly do things such as:

pmadvise -o heap=access_lwp '/path/to/command -flags'

Then again, how about having a control file along the lines of the
way we do it for mpss? In it we could specify a policy to apply to a range
of processes and apply it via a preloader, e.g.:

oracle*:heap=access_lwp,stack=access_lwp

// Please see madv.so.1(1) for this kind of functionality
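
As a rough sketch of what the preloader approach might look like in practice:
the helper names below are made up, and the LD_PRELOAD/MADVCFGFILE usage and
the exec-name:region=advice entry format are given here from memory of
madv.so.1(1), so they should be checked against the man page on the target
release. Only the heap region is shown; whether a stack region is accepted is
not assumed.

```shell
#!/bin/sh
# Sketch only; verify variable names and file format against madv.so.1(1).

# write_madv_cfg FILE PATTERN ADVICE   (hypothetical helper)
# Writes a single entry of the form exec-name:heap=advice to FILE.
write_madv_cfg() {
    printf '%s:heap=%s\n' "$2" "$3" > "$1"
}

# run_with_madv FILE COMMAND [ARGS...]   (hypothetical helper)
# Starts COMMAND with the preloader appended to LD_PRELOAD and pointed at FILE.
run_with_madv() {
    cfg=$1
    shift
    LD_PRELOAD="${LD_PRELOAD:+$LD_PRELOAD:}madv.so.1" MADVCFGFILE="$cfg" "$@"
}
```

Usage would be along the lines of: write_madv_cfg /tmp/madv.cfg 'oracle*'
access_lwp, followed by run_with_madv /tmp/madv.cfg /path/to/command -flags.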

__
Compiled by Alex Kolbasov


_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
