We received some internal comments on the NUMA observability tools. Here they are, with my responses on lines starting with //:

- "plgrp -G <pid>" returns a number plus an extra blank line. Why the
  blank line? bug? feature?

// Bug

- Can you add a flag to up the verbosity? Getting just the lgroup ID back
  is handy, but it might be nice to get some more info, like what CPU(s)
  that lgroup is associated with, what other lgroups are available, how
  much memory each lgroup has access to, etc. Can that be done?

// This is delegated to the lgrpinfo utility, which prints all the
// information about lgroups. The plgrp utility deals specifically with
// process and thread lgroup placement.

- How do I figure out how many lgroups are on the machine? Can we make
  plgrp report that by default instead of the "-h" help output? Seems like
  a reasonable place to find this info.

// This is done by lgrpinfo

- Can you change the lgroup of <pid> if pbind is used on the same <pid>?
  It does not appear that way to me in my testing.

// It is not possible. Implementation-wise, plgrp sets thread affinity to
// the target lgroup which, in the absence of processor sets and processor
// bindings, also changes the "home" lgroup. Processor set and processor
// binding assignments are considered first, so while plgrp can
// successfully set thread affinities, it can't actually set the home in
// such cases. The specified affinities take effect once the thread is
// unbound or moved outside the processor set.

IMPORTANT: Side bar conversation... It would be very handy to be able to
bind a process to a particular CPU and then have all of its memory be
local to a different CPU, for some of the testing I am doing on G4. Is
there a way in S10 that I can do this? numactl in linux can do that. I
could really use it on Solaris, right now...

// There is no way of doing exactly this.

- Apparently, you can tell plgrp to set a pid to an lgrp value and it
  does not complain if it fails...

// This is a bug that should be fixed.

    # pgrep lat
    14894
    14866
    # ./plgrp -G 14866
    1
    # ./plgrp -S 2 14866
    # ./plgrp -G 14866
    1

  It should give you some sort of indication that something was not done,
  and hopefully a little bit on why. Right?

// Right

  In fact, you can try some ridiculous values without a peep out of it
  (unless I miss why 10 or 20 make sense), and nothing changes.

    # pgrep lat_mem
    14821
    14711
    # ./plgrp -S 4 14711
    # ./plgrp -S 10 14711
    # ./plgrp -S 20 14711
    # ./plgrp -G 14711
    1
    # ./plgrp -S 20 14711
    # ./plgrp -G 14711
    1

// This is a bug.

Here is another bunch of comments:

I've been applying the tools to a few problems and benchmarks I've been
involved with over the last few days and I intend to keep on applying them
to what I can from now on. Overall, I think that they are excellent tools
and exactly what we need. I'll follow up with more details, thoughts and
results in the next few days. Just having the ability to observe lgroup
topology and usage is a huge step forward and opens up a whole area of
investigations that were, up to now, fairly closed off.

>>IMPORTANT: Side bar conversation... It would be very handy to be able to
>>bind a process to a particular CPU and then have all of its memory be
>>local to a different CPU, for some of the testing I am doing on G4. Is
>>there a way in S10 that I can do this? numactl in linux can do that. I
>>could really use it on Solaris, right now...
>>
>
>This is interesting. Can you explain why you would like such functionality?
>

I've been doing exactly this today with some experimentation with the
STREAM benchmark. I think this would work (heap example):

- Change the home lgroup of the thread in question to the lgroup of the
  CPU where you want the memory allocated,
- use 'pmadvise -o heap=access_lwp' on the process,
- the memory should now get allocated in the newly homed lgrp (check with
  'pmap -L' and lgrpinfo, if the allocation is large enough to notice),
- rehome the thread to another CPU's lgroup or just bind it.

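The steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: it uses the -G (get home) and -S (set home) flags exactly as they appear in the plgrp transcripts in this message, it assumes -G prints a single lgroup ID for the pid, and it uses access_lwp as the advice name. PLGRP and set_home_checked are hypothetical names introduced for illustration. Because plgrp -S currently fails silently, the helper reads the home lgroup back and reports a mismatch.

```shell
#!/bin/sh
# Sketch only. PLGRP is a hypothetical variable so that the tool can be
# relocated (or replaced by a stub when trying the helper out).
PLGRP=${PLGRP:-./plgrp}

# set_home_checked LGRP PID
# Ask plgrp to set the home lgroup, then read it back to confirm the
# change really happened. Returns 0 on success, 1 if plgrp itself failed,
# 2 if the home lgroup did not change to the requested value.
set_home_checked() {
    want=$1
    pid=$2
    "$PLGRP" -S "$want" "$pid" || return 1
    # -G prints the lgroup ID plus a stray blank line; strip all whitespace.
    got=$("$PLGRP" -G "$pid" | tr -d '[:space:]')
    if [ "$got" != "$want" ]; then
        echo "home lgroup of $pid is '$got', wanted '$want'" >&2
        return 2
    fi
    return 0
}

# Intended use, following the four steps (pid and lgroup IDs are examples):
#   set_home_checked 2 14866 || exit 1      # home the thread where memory should go
#   pmadvise -o heap=access_lwp 14866       # advise the heap
#   pmap -L 14866                           # once the heap is touched, check placement
#   set_home_checked 1 14866                # rehome (or bind) the thread elsewhere
```

The rehoming in the last step only makes sense after the process has actually touched its heap, so the sketch leaves that wait to the operator rather than guessing at a delay.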
This is how I was doing it and there may well be other/better ways - I'm
all ears. However, this is all a bit unwieldy and I would very much like to
be able to say that thread X's heap should be allocated from lgroup Y,
much like the 'migrate range' option of SGI's dplace(1) command language.
Note that I've never used dplace(1), but I do work with ex-SGI'ers who
speak well of it. It certainly looks to be powerful stuff.

There was also a question:

I grabbed the ptools-bin-0.1.2.tar.gz off of opensolaris.org, and lgrpinfo
was not in it. I had seen it before. Do you plan to include it in that
tar.gz?

// It is distributed separately from
// http://www.opensolaris.org/os/community/performance/numa/observability/perllgrp/
// or via CPAN at http://search.cpan.org/dist/Solaris-Lgrp/

More comments for pmadvise:

It would be nice to apply memory placement advice to a process from the
start. There is a nice DTrace example using system() calls to apply
pmadvise to a just-started process. This is fine but a bit messy. A nice
option would be to make pmadvise a libproc consumer and have it exec the
target program. In this way we could possibly do things such as:

    pmadvise -o heap=access_lwp '/path/to/command -flags'

Then again, how about having a control file along the lines of the way we
do mpss? In it we could specify policy to apply to a range of processes
and apply it via a preloader, e.g.:

    oracle*:heap=access_lwp,stack=access_lwp

// Please see madv.so.1(1) for this kind of functionality

__
Compiled by Alex Kolbasov
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org