On Thu, Sep 15, 2005 at 11:55:21PM -0400, Marc Rocas wrote:
|
| The one question I have is whether ISM (Intimate Shared Memory) segments are
| immune to being coerced to relocate via pmadvise(1)? I've tried it without
| success. A quick look at the seg_spt.c code seemed to indicate that when an
| spt segment is created, its lgroup policy is set to LGRP_MEM_POLICY_DEFAULT,
| which will result in randomized allocation for segments greater than 8MB.
| I've verified as much using the NUMA-enhanced pmap(1) command.
You are correct about the policy. Since shared memory is usually, uh, shared :) we spread it around to prevent hot-spotting. segspt_shmadvise() implements the NUMA migration policies for all shared memory types, so next-touch (MADV_ACCESS_LWP) should work. Unfortunately, the x64 HAT layer does not implement dynamic ISM unmap. Since the NUMA migration code is driven by minor faults (the definition of next-touch depends on that), I suspect that is why madvise() does not work for ISM on that machine.

| I have an ISM segment that gets consumed by HW that is limited to 32-bit
| addressing and thus have a need to control the physical range that backs the
| segment. At this point, it would seem that I need to allocate the memory
| (about 300MB) in the kernel and map it back to user-land, but I would lose
| the use of 2MB pages since I have not quite figured out how to allocate
| memory using a particular page size. Have I misinterpreted the code? Do I
| have other options?

The twisted maze of code for ISM allocates its memory (ironically, since it's always locked and hence doesn't need swap) through anon. There is no way to restrict the segment to a specific PA range, or otherwise influence the allocation path, until the pages are already (F_SOFTLOCK) faulted in.

The NUMA code might help in some cases because the first lgrp happens to fall under 4G :) so you can probably hack your way through this at the application layer: change the shared memory random threshold to ULONG_MAX in /etc/system, bind your application thread to a CPU in the correct lgroup (one whose memory is entirely below 4G), and then do the shmat() from there. It's a ginormous hack, but it will get you the results you want. :)

Once SysV shared memory is redesigned to not rely on the anon layer, doing this sort of thing in the kernel should become a lot easier. Being able to specify where user memory ends up in PA space, to avoid copying it on I/O, is an RFE we simply have never thought about before.
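[To make the hack above concrete, here is a sketch of the /etc/system change. The tunable name lgrp_shm_random_thresh is my reading of the lgrp code, so treat it as an assumption; the value shown is ULONG_MAX on a 64-bit kernel.]

```
* /etc/system fragment: raise the shared-memory randomization threshold
* to ULONG_MAX so the segment is no longer spread across lgroups.
* (Tunable name is an assumption based on the lgrp code.)
set lgrp_shm_random_thresh=0xffffffffffffffff
```

[Then, before the shmat(), bind the attaching thread to a CPU in the target lgroup, e.g. with pbind(1M) (`pbind -b <cpuid> <pid>`) or processor_bind(2), so the first-touch allocation lands below 4G.]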
The lgroup code is careful to avoid specifics like memory addresses, since lgrp-to-physical mappings are very machine-specific, so making this sort of thing work would require adding a whole new set of segment advice ops used by the physical memory allocator itself. page_create_va() takes the segment as one of its arguments, so if we stuffed the PA range advice into the segment, we could dig it up down there and request memory in the "correct" range from the freelists.

--
Eric Lowe
Solaris Kernel Development
Austin, Texas
Sun Microsystems. We make the net work.
x64155 / +1 (512) 401-1155

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org