> More comments for pmadvise:

The product I work on is already shipping something which does the
equivalent of a small subset of madvise, so it would be generally
useful to us.

I have to echo the other positive comments about the lgrp tools.
They're shedding light on a whole area which has previously been
difficult to appreciate. The ideal would be to get some idea, via the
tools, of the latency penalty of accessing pages across lgroups [is the
lgroup configuration just statically configured somewhere? If so, could
this info be a part of that?], and the logical next step is the amount
of time my app is spending waiting to access those pages - but
obviously that's a way off.

And one specific point...

> - "plgrp -G <pid>" returns a number plus an extra blank line. Why the
>   blank line? bug? feature?
> - Can you add a flag to up the verbosity? Getting just the lgroup ID
>   back is handy,

I sort-of agree; the first of these tools I ran was plgrp, and whilst
it's my fault for not reading the docs, I initially wondered just what
that number was (if you're new to this stuff you could conceivably
think it's a CPU ID or something). Maybe just a couple of words like
"Home lgrp: " (sorry if the terminology's a bit off) in front of it
would make it clearer. If the terseness is intentional (I can see it
would make scripting easier), feel free to ignore!

Getting a bit radical for a minute - with (for example) psrset I can
get hold of _all_ processor set bindings using psrset -q. I think it
would be useful to be able to get home lgrp information across a
number of processes; it's probably not the right place for it, but I'm
imagining something like being able to specify the home lgrp as an
output column on ps(1).
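
Until something like that exists, a shell loop gets part of the way
there. This is only a rough sketch using the "plgrp -G <pid>" form seen
elsewhere in this thread, and "oracle" is just an example process name:

    # Print the home lgroup of every process whose name matches.
    for pid in `pgrep oracle`; do
        home=`plgrp -G $pid`      # home lgroup ID only, per -G
        echo "$pid: home lgrp $home"
    done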

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Alexander Kolbasov
> Sent: 15 November 2005 21:16
> To: perf-discuss@opensolaris.org
> Subject: [perf-discuss] Comments from NUMA observability tools
>
> We received some internal comments for the NUMA observability tools.
> Here they are with my comments separated with // lines:
>
> - "plgrp -G <pid>" returns a number plus an extra blank line. Why the
>   blank line? bug? feature?
>
> // Bug
>
> - Can you add a flag to up the verbosity? Getting just the lgroup ID
>   back is handy, but it might be nice to get some more info, like what
>   CPU(s) that lgroup is associated with, what other lgroups are
>   available, how much memory each lgroup has access to, etc. Can that
>   be done?
>
> // This is delegated to the lgrpinfo utility, which prints all the
> // information about lgroups. The plgrp utility specifically deals
> // with process and thread lgroup placement.
>
> - How do I figure out how many lgroups are on the machine? Can we make
>   plgrp report that by default instead of the "-h" help output? Seems
>   like a reasonable place to find this info.
>
> // This is done by lgrpinfo.
>
> - Can you change the lgroup of <pid> if pbind is used on the same
>   <pid>? It does not appear that way to me in my testing.
>
> // It is not possible. Implementation-wise, plgrp sets thread affinity
> // to the target lgroup which, in the absence of processor sets and
> // processor bindings, also changes the "home" lgroup. Processor set
> // and processor binding assignments are considered first, so while
> // plgrp can successfully set thread affinities it can't actually set
> // the home lgroup in such cases. The specified affinities will take
> // effect when a thread is unbound or moved outside a processor set.
>
> IMPORTANT: Side bar conversation... It would be very handy to be able
> to bind a process to a particular CPU and then have all of its memory
> be local to a different CPU, for some of the testing I am doing on G4.
> Is there a way in S10 that I can do this? numactl in linux can do
> that. I could really use it on Solaris, right now...
>
> // There is no way of doing exactly this.
>
> - Apparently, you can tell plgrp to set a pid to an lgrp value and it
>   does not complain if it fails...
>
> // This is a bug that should be fixed.
>
> # pgrep lat
> 14894
> 14866
> # ./plgrp -G 14866
> 1
>
> # ./plgrp -S 2 14866
> # ./plgrp -G 14866
> 1
>
> It should give you some sort of indication that something was not
> done, and hopefully a little bit on why. Right?
>
> // Right
>
> In fact, you can try some ridiculous values without a peep out of it
> (unless I miss why 10 or 20 make sense), and nothing changes.
>
> # pgrep lat_mem
> 14821
> 14711
> # ./plgrp -S 4 14711
> # ./plgrp -S 10 14711
> # ./plgrp -S 20 14711
> # ./plgrp -G 14711
> 1
>
> # ./plgrp -S 20 14711
> # ./plgrp -G 14711
> 1
>
> // This is a bug.
>
> Here is another bunch of comments:
>
> I've been applying the tools to a few problems and benchmarks I've
> been involved with over the last few days, and I intend to keep on
> applying them to what I can from now on. Overall, I think that they
> are excellent tools and exactly what we need. I'll follow up with more
> details, thoughts and results in the next few days.
>
> Just having the ability to observe lgroup topology and usage is a huge
> step forward and opens up a whole area of investigations that were, up
> to now, fairly closed off.
>
> >> IMPORTANT: Side bar conversation... It would be very handy to be
> >> able to bind a process to a particular CPU and then have all of its
> >> memory be local to a different CPU, for some of the testing I am
> >> doing on G4. Is there a way in S10 that I can do this? numactl in
> >> linux can do that. I could really use it on Solaris, right now...
> >
> > This is interesting. Can you explain why you would like such
> > functionality?
>
> I've been doing exactly this today with some experimentation with the
> STREAM benchmark. I think this would work (heap example):
>
> - Change the thread in question's home lgroup to the CPU where you
>   want the memory allocated,
> - use 'pmadvise -o heap=access_lwp' on the process,
> - the memory should now get allocated in the newly homed lgrp (check
>   with 'pmap -L' and lgrpinfo, if the allocation is large enough to
>   notice),
> - rehome the thread to another CPU's lgroup or just bind it.
>
> This is how I was doing it and there may well be other/better ways -
> I'm all ears. However, this is all a bit unwieldy and I would very
> much like to be able to say that thread X's heap should be allocated
> from lgroup Y, much like the 'migrate range' option of SGI's dplace(1)
> command language. Note that I've never used dplace(1), but I do work
> with ex-SGI'ers who speak well of it. It certainly looks to be
> powerful stuff.
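
To make the heap recipe just quoted concrete, the steps would look
roughly like the sketch below. This is untested and hedged: it reuses
the -S/-G plgrp flags and the example pid (14866) and lgroup IDs from
this thread, and assumes the -S actually takes effect, which is worth
checking with -G given the silent-failure bug noted above:

    pid=14866                         # example pid from this thread
    ./plgrp -S 2 $pid                 # home its threads to lgroup 2
    ./plgrp -G $pid                   # verify; a failed set is silent today
    pmadvise -o heap=access_lwp $pid  # ask for LWP-local heap placement
    pmap -L $pid                      # new heap pages should show lgroup 2
    lgrpinfo                          # per-lgroup memory, to cross-check
    ./plgrp -S 1 $pid                 # finally rehome/bind where it should run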

> There was also a question:
>
> I grabbed the ptools-bin-0.1.2.tar.gz off of opensolaris.org, and
> lgrpinfo was not in it. I had seen it before. Do you plan to include
> it in that tar.gz?
>
> // It is distributed separately from
> // http://www.opensolaris.org/os/community/performance/numa/observability/perllgrp/
> // or via CPAN at http://search.cpan.org/dist/Solaris-Lgrp/
>
> More comments for pmadvise:
>
> It would be nice to apply memory placement advice to a process from
> the start. There is a nice DTrace example that uses system() calls to
> apply pmadvise to a just-started process. This is fine but a bit
> messy. A nicer option would be to make pmadvise a libproc consumer and
> have it exec the target program. In this way we could possibly do
> things such as:
>
>     pmadvise -o heap=access_lwp '/path/to/command -flags'
>
> Then again, how about having a control file along the lines of the way
> we do mpss? In it we could specify policy to apply to a range of
> processes and apply it via a preloader, e.g.:
>
>     oracle*:heap=access_lwp,stack=access_lwp
>
> // Please see madv.so.1(1) for this kind of functionality
>
> __
> Compiled by Alex Kolbasov
>
> _______________________________________________
> perf-discuss mailing list
> perf-discuss@opensolaris.org
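
On that last point: the madv.so.1(1) preload mechanism mentioned above
already gets close to the control-file idea. As a hedged sketch (the
command and advice value here are just examples; check madv.so.1(1) for
the exact advice names, the MADVCFGFILE config-file syntax and the
per-segment options):

    # Preload madv.so.1 so that access_lwp advice is applied from exec
    # time onwards, rather than patching the process up after the fact.
    LD_PRELOAD=madv.so.1 MADV=access_lwp /path/to/command -flags

And for the just-started-process case, the DTrace hack being referred
to is presumably something along these lines (a guess, untested, needs
destructive actions enabled, and inherently racy against the process's
first allocations):

    # When a process named "command" execs, fire pmadvise at it.
    dtrace -wn 'proc:::exec-success /execname == "command"/ {
        system("pmadvise -o heap=access_lwp %d", pid); }'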