On 9/16/05, *Marc Rocas* <[EMAIL PROTECTED]> wrote:
Eric, Bart, and Jonathan,
Thanks for your quick replies. See my comments below:
On 9/16/05, *jonathan chew* <[EMAIL PROTECTED]> wrote:
Marc Rocas wrote:
I've been playing around with the tools on a Stinger box and I think they're pretty cool!
I'm glad that you like them. We like them too and think that they are fun to play with besides being useful for observability and experimenting with performance. I hope that our MPO overview, man pages, and our examples on how to use the tools were helpful.
All of the above were pretty useful in getting up to speed, and I will use them to do performance experiments. I simply need to bring up our system on the Stinger box in order to run our perf suites.
The one question I have is whether ISM (Intimate Shared Memory) segments are immune to being coerced to relocate via pmadvise(1)? I've tried it without success. A quick look at the seg_spt.c code seemed to indicate that when an spt segment is created, its lgroup policy is set to LGRP_MEM_POLICY_DEFAULT, which results in randomized allocation for segments greater than 8MB. I've verified as much using the NUMA-enhanced pmap(1) command.
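
For reference, here is the gist of what I tried from inside the process (the in-process equivalent of what pmadvise applies externally); the shmid and length are placeholders, not our real values:

#include <sys/types.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <stdio.h>

int
main(void)
{
    int     shmid = 0;              /* placeholder: from shmget(2) */
    size_t  len = 300UL << 20;      /* placeholder: our segment is ~300MB */
    caddr_t addr;

    /* SHM_SHARE_MMU asks for an ISM attach. */
    addr = shmat(shmid, NULL, SHM_SHARE_MMU);
    if (addr == (caddr_t)-1) {
        perror("shmat");
        return (1);
    }

    /*
     * Next-touch advice; this is what pmadvise's access_lwp applies.
     * For our ISM segment it had no visible effect in pmap -Ls.
     */
    if (madvise(addr, len, MADV_ACCESS_LWP) != 0)
        perror("madvise");

    return (0);
}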
It sounds like Eric Lowe has a theory as to why madvise(MADV_ACCESS_*) and pmadvise(1) didn't work for migrating your ISM segment. Joe and Nils are experts on the x86/AMD64 HAT and may be able to comment on Eric's theory that the lack of dynamic ISM unmap is preventing page migration from working.
I'll see whether I can reproduce the problem.
Which version of Solaris are you using?
Solaris 10 GA with no kernel patches applied.
I have an ISM segment that gets consumed by HW that is limited to 32-bit addressing, and thus I need to control the physical range that backs the segment. At this point, it would seem that I need to allocate the memory (about 300MB) in the kernel and map it back to user-land, but I would lose the use of 2MB pages since I have not quite figured out how to allocate memory using a particular page size. Have I misinterpreted the code? Do I have other options?
Bart's suggestion is the simplest (brute force) way. Eric's suggestion sounded a little painful. I have a couple of other options below, but I can't think of a nice way to specify that your application needs low physical memory. So, I want to understand better what you are doing to see if there is any better way.
Can you please tell me more about your situation and requirements? What is the hardware that needs the 32-bit addressing (framebuffer, device needing DMA, etc.)? Besides needing to be in low physical memory, does it need to be wired down and shared?
A device needing DMA. It needs to be wired down as well as shared. We're running fine with Bart's suggestion but still need a way to park the segment in the lower 4GB PA range since we actually want to run experiments with a minimum of 8GB of RAM.
Jonathan
PS
Assuming that you can change your code, here are a couple of other options that are less painful than Eric's suggestion but still aren't very elegant because of the low physical memory requirement (a sketch follows the list):

- Use DISM instead of ISM, which is Dynamic Intimate Shared Memory and is pageable (see the SHM_PAGEABLE flag to shmat(2))

OR

- Use mmap(2) and the MAP_ANON (and MAP_SHARED if you need shared memory) flag to allocate (shared) anonymous memory
- Call memcntl(2) with MC_HAT_ADVISE to specify that you want large pages

AND

- Call madvise(MADV_ACCESS_LWP) on your mmap-ed or DISM segment to say that the next thread to access it will use it a lot
- Access it from CPU 0 on your Stinger. I don't like this part because it is hardware implementation specific. It turns out that the low physical memory usually lives near CPU 0 on an Opteron box. You can use liblgrp(3LIB) to discover which leaf lgroup contains CPU 0 and lgrp_affinity_set(3LGRP) to set a strong affinity for that lgroup (which will set the home lgroup for the thread to that lgroup). Alternatively, you can use processor_bind(2) to bind/unbind to CPU 0.
- Use the MC_LOCK flag to memcntl(2) to lock down the memory for your segment if you want the physical memory to stay there (until you unlock it)
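
To make that recipe concrete, here is a rough, untested sketch of the DISM variant using processor_bind(2); the ~300MB size and CPU 0 come from this thread, while the IPC key, permissions, and touch stride are just illustrative:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <sys/processor.h>
#include <sys/procset.h>
#include <stdio.h>

int
main(void)
{
    size_t  len = 300UL << 20;      /* ~300MB segment from this thread */
    size_t  off;
    int     shmid;
    caddr_t addr;

    /*
     * Bind this LWP to CPU 0 so that MADV_ACCESS_LWP homes the pages
     * in CPU 0's lgroup, which usually holds the low physical memory
     * on an Opteron box (hardware-specific, as noted above).
     */
    if (processor_bind(P_LWPID, P_MYID, 0, NULL) != 0)
        perror("processor_bind");

    if ((shmid = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600)) == -1) {
        perror("shmget");
        return (1);
    }

    /* SHM_PAGEABLE gives DISM rather than ISM, so pages can migrate. */
    addr = shmat(shmid, NULL, SHM_PAGEABLE);
    if (addr == (caddr_t)-1) {
        perror("shmat");
        return (1);
    }

    /* Say that the next thread to touch the segment will use it a lot. */
    if (madvise(addr, len, MADV_ACCESS_LWP) != 0)
        perror("madvise");

    /* Touch the pages from the bound thread to place them... */
    for (off = 0; off < len; off += 4096)
        addr[off] = 0;

    /* ...then lock them down (needs privilege to lock memory). */
    if (memcntl(addr, len, MC_LOCK, 0, 0, 0) != 0)
        perror("memcntl");

    return (0);
}

The mmap(2) variant is analogous: mmap with MAP_ANON|MAP_SHARED instead of shmget/shmat, and call memcntl(2) with MC_HAT_ADVISE and a struct memcntl_mha requesting 2MB pages before touching the memory.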
I will try using DISM, madvise, and processor_bind as it will allow me to avoid changing the value of lgrp_shm_random_thresh.
Lastly, I tried relocating the memory by following the instructions in Alexander Kolbasov's blog (Memory Placement). I wrote a simple app that attaches to the existing segment with a few sleep() calls to allow me to type the following commands:
# pmap -Ls $pid | fgrep "ism shmid=0x0"
E9E00000 2048K 2M rwxsR 1 [ ism shmid=0x0 ]
...
# plgrp -S 1 $pid
# pmadvise -o E9E00000=access_lwp $pid
An initial sleep of 3 minutes gives me time to run the above commands; then the app does a few writes to the segment and sleeps for another minute so that I can invoke pmap -Ls and verify whether the segment migrated or not.
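
The simple app is roughly the following (the shmid and 2MB length come from the pmap output above; everything else is bare-bones):

#include <sys/types.h>
#include <sys/shm.h>
#include <unistd.h>
#include <stdio.h>

int
main(void)
{
    int     shmid = 0;          /* "ism shmid=0x0" from pmap above */
    size_t  len = 2UL << 20;    /* 2048K segment per pmap -Ls */
    size_t  off;
    caddr_t addr;

    addr = shmat(shmid, NULL, SHM_SHARE_MMU);   /* ISM attach */
    if (addr == (caddr_t)-1) {
        perror("shmat");
        return (1);
    }

    (void) printf("pid %ld attached at %p\n", (long)getpid(), (void *)addr);
    (void) sleep(180);          /* window to run plgrp/pmadvise */

    for (off = 0; off < len; off += 4096)
        addr[off] = 1;          /* a few writes to the segment */

    (void) sleep(60);           /* window to re-check with pmap -Ls */
    return (0);
}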
I'll try the experiments on Monday when I'm back at the office.
Again, thanks for all the help.
--Marc