Eric, Bart, and Jonathan,

Thanks for your quick replies. See my comments below:

On 9/16/05, jonathan chew <[EMAIL PROTECTED]> wrote:
Marc Rocas wrote:

> I've been playing around with the tools on a Stinger box and I think
> they're pretty cool!


I'm glad that you like them.  We like them too and think that they are
fun to play with besides being useful for observability and
experimenting with performance.  I hope that our MPO overview, man
pages, and our examples on how to use the tools were helpful.

All of the above were pretty useful for getting up to speed, and I will use them to do performance experiments. I simply need to bring up our system on the Stinger box in order to run our perf suites.

> The one question I have is whether ISM (Intimate Shared Memory)
> segments are immune to being coerced to relocate via pmadvise(1)?
> I've tried it without success. A quick look at the seg_spt.c code
> seemed to indicate that when an spt segment is created, its lgroup
> policy is set to LGRP_MEM_POLICY_DEFAULT, which results in
> randomized allocation for segments greater than 8MB. I've verified as
> much using the NUMA-enhanced pmap(1) command.


It sounds like Eric Lowe has a theory as to why madvise(MADV_ACCESS_*)
and pmadvise(1) didn't work for migrating your ISM segment.  Joe and
Nils are experts on the x86/AMD64 HAT and may be able to comment on
Eric's theory that the lack of dynamic ISM unmap is preventing page
migration from working.

I'll see whether I can reproduce the problem.

Which version of Solaris are you using?

Solaris 10 GA with no kernel patches applied.

> I have an ISM segment that gets consumed by HW that is limited to
> 32-bit addressing and thus have a need to control the physical range
> that backs the segment. At this point, it would seem that I need to
> allocate the memory (about 300MB) in the kernel and map it back to
> user-land but I would lose the use of 2MB pages since I have not quite
> figured out how to allocate memory using a particular page size. Have
> I misinterpreted the code? Do I have other options?


Bart's suggestion is the simplest (brute force) way.  Eric's suggestion
sounded a little painful.  I have a couple of other options below, but
can't think of a nice way to specify that your application needs low
physical memory.  So, I want to understand better what you are doing to
see if there is a nicer way.

Can you please tell me more about your situation and requirements?  What
is the hardware that needs the 32-bit addressing (framebuffer, device
needing DMA, etc.)?  Besides needing to be in low physical memory, does
it need to be wired down and shared?

A device needing DMA. It needs to be wired down as well as shared. We're running fine with Bart's suggestion but still need a way to park the segment in the lower 4GB PA range since we actually want to run experiments with a minimum of 8GB of RAM.

Jonathan

PS
Assuming that you can change your code, here are a couple of other
options that are less painful than Eric's suggestion but still aren't
very elegant because of the low physical memory requirement:

- Use DISM (Dynamic Intimate Shared Memory) instead of ISM; DISM is
pageable (see the SHM_PAGEABLE flag to shmat(2))

OR

- Use mmap(2) with the MAP_ANON flag (plus MAP_SHARED if you need shared
memory) to allocate (shared) anonymous memory

- Call memcntl(2) with MC_HAT_ADVISE to specify that you want large pages

AND

- Call madvise(MADV_ACCESS_LWP) on your mmap-ed or DISM segment to say
that the next thread to access it will use it a lot

- Access it from CPU 0 on your Stinger.  I don't like this part because
this is hardware implementation specific.  It turns out that the low
physical memory usually lives near CPU 0 on an Opteron box.  You can use
liblgrp(3LIB) to discover which leaf lgroup contains CPU 0 and
lgrp_affinity_set(3LGRP) to set a strong affinity for that lgroup (which
will set the home lgroup for the thread to that lgroup).  Alternatively,
you can use processor_bind(2) to bind/unbind to CPU 0.

- Use the MC_LOCK flag to memcntl(2) to lock down the memory for your
segment if you want the physical memory to stay there (until you unlock it)

I will try using DISM, madvise, and processor_bind, since that will let me avoid changing the value of lgrp_shm_random_thresh.
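
Roughly, I have something like the sketch below in mind. It is untested and just follows your recipe (DISM segment, memcntl() MC_HAT_ADVISE for 2MB pages, processor_bind() to CPU 0, madvise() MADV_ACCESS_LWP, touch the pages, then MC_LOCK); the segment size, the CPU id, and the minimal error handling are placeholders for our real setup:

/*
 * Untested sketch: create a DISM segment, ask for 2MB pages, bind to
 * CPU 0, advise MADV_ACCESS_LWP, touch the pages, then lock them down.
 * The size and CPU id are placeholders; locking needs the appropriate
 * privilege.
 */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <sys/processor.h>
#include <sys/procset.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SEG_SIZE        (300UL * 1024 * 1024)   /* ~300MB in our case */

int
main(void)
{
        struct memcntl_mha mha;
        caddr_t addr;
        int shmid;

        /* DISM: pageable ISM, via the SHM_PAGEABLE flag to shmat(2) */
        shmid = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);
        if (shmid == -1) {
                perror("shmget");
                return (1);
        }
        addr = (caddr_t)shmat(shmid, NULL, SHM_PAGEABLE);
        if (addr == (caddr_t)-1) {
                perror("shmat");
                return (1);
        }

        /* Ask the HAT for 2MB pages (may or may not be honored for DISM) */
        mha.mha_cmd = MHA_MAPSIZE_VA;
        mha.mha_flags = 0;
        mha.mha_pagesize = 2 * 1024 * 1024;
        if (memcntl(addr, SEG_SIZE, MC_HAT_ADVISE, (caddr_t)&mha, 0, 0) == -1)
                perror("memcntl(MC_HAT_ADVISE)");

        /* Bind to CPU 0 so the touching thread is homed near low memory */
        if (processor_bind(P_LWPID, P_MYID, 0, NULL) == -1)
                perror("processor_bind");

        /* The next LWP to touch the segment will use it a lot */
        if (madvise(addr, SEG_SIZE, MADV_ACCESS_LWP) == -1)
                perror("madvise(MADV_ACCESS_LWP)");

        /* Touch the pages so they are allocated near our home lgroup */
        memset(addr, 0, SEG_SIZE);

        /* Lock the memory down so it stays put */
        if (memcntl(addr, SEG_SIZE, MC_LOCK, 0, 0, 0) == -1)
                perror("memcntl(MC_LOCK)");

        /* Done placing it; unbind and keep the segment around */
        (void) processor_bind(P_LWPID, P_MYID, PBIND_NONE, NULL);
        pause();
        return (0);
}

The CPU 0 binding is only there because, as you point out, the low physical memory usually lives near CPU 0 on an Opteron box, so this still doesn't actually guarantee that the segment ends up below 4GB.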

Lastly, I tried relocating the memory by following the instructions in Alexander Kolbasov's blog post on memory placement: I wrote a simple app that attaches to the existing segment, with a few sleep() calls to give me time to type the following commands:

# pmap -Ls $pid | fgrep "ism shmid=0x0"
E9E00000    2048K    2M  rwxsR    1   [ ism shmid=0x0 ]
...
# plgrp -S 1 $pid
# pmadvise -o E9E00000=access_lwp $pid

An initial sleep of 3 minutes gives me time to run the above commands; then the app does a few writes to the segment and sleeps for another minute so that I can invoke pmap -Ls and verify whether the segment migrated or not.
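
For completeness, the test app is basically the following (not the exact code; the shmid matches the "ism shmid=0x0" line above and the size is a placeholder for the real segment size):

/*
 * Attach to the existing ISM segment, give me time to run plgrp(1) and
 * pmadvise(1) against this process, then touch the segment and give me
 * time to re-check pmap -Ls.
 */
#include <sys/types.h>
#include <sys/shm.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SHM_ID          0                   /* shmid of the existing segment */
#define SEG_SIZE        (2UL * 1024 * 1024) /* placeholder size */

int
main(void)
{
        char *addr;

        /* Attach as ISM, the same way the real consumer does */
        addr = (char *)shmat(SHM_ID, NULL, SHM_SHARE_MMU);
        if (addr == (char *)-1) {
                perror("shmat");
                return (1);
        }

        sleep(180);     /* 3 minutes to type the commands above */

        /* A few writes to the segment */
        memset(addr, 0xa5, SEG_SIZE);

        sleep(60);      /* another minute to re-run pmap -Ls */

        (void) shmdt(addr);
        return (0);
}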

I'll try the experiments on Monday when I'm back at the office.

Again, thanks for all the help.

--Marc


_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
