On Sun, Apr 05, 2009 at 01:51:44PM +0200, Hans Ottevanger wrote:
> Hi folks,
>
> As has been noted before, there is an issue with the mlockall() system
> call always failing on (at least) the amd64 architecture. This is
> evident from the automounter (as configured out-of-the-box) printing
> error messages on startup like:
>
> Couldn't lock process pages in memory using mlockall()
>
> I have verified the occurrence of this issue on the amd64 platform on
> 7.1-STABLE and 8.0-CURRENT. On the i386 platform this problem does not
> occur.
>
> To investigate this issue a bit further I ran the following trivial
> program:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
>
> int main(int argc, char *argv[])
> {
>     if (mlockall(MCL_CURRENT|MCL_FUTURE) == -1)
>         perror(argv[0]);
>
>     char command[80];
>     snprintf(command, 80, "procstat -v %d", getpid());
>     system(command);
>
>     exit(0);
> }
>
> which yields (using CURRENT-8.0 as of today, on an Intel DP965LT board
> with a Q6600 and 8 Gbyte RAM, GENERIC kernel stripped of unused devices,
> output folded to 72 characters per line):
>
> /mltest: Resource temporarily unavailable
>   PID          START            END PRT  RES PRES REF SHD FL TP
> PATH
>  1064       0x400000       0x401000 r-x    1    0   1   0 CN vn
> /root/mlockall/mltest
>  1064       0x500000       0x501000 rw-    1    0   1   0 CN df
>  1064       0x501000       0x600000 rwx  255    0   1   0 -- df
>  1064    0x800500000    0x80052c000 r-x   44    0  64  31 CN vn
> /libexec/ld-elf.so.1
>  1064    0x80052c000    0x800534000 rw-    8    0   1   0 C- df
>  1064    0x80062b000    0x800633000 rw-    8    0   1   0 CN vn
> /libexec/ld-elf.so.1
>  1064    0x800633000    0x80063f000 rw-   12    0   1   0 C- df
>  1064    0x80063f000    0x80072e000 r-x  239    0 128  62 CN vn
> /lib/libc.so.7
>  1064    0x80072e000    0x80072f000 r-x    1    0   1   0 CN vn
> /lib/libc.so.7
>  1064    0x80072f000    0x80082f000 r-x   51    0 128  62 CN vn
> /lib/libc.so.7
>  1064    0x80082f000    0x80084f000 rw-   32    0   1   0 C- vn
> /lib/libc.so.7
>  1064    0x80084f000    0x800865000 rw-    6    0   1   0 CN df
>  1064    0x800900000    0x800965000 rw-  101    0   1   0 -- df
>  1064    0x800965000    0x800a00000 rw-  155    0   1   0 -- df
>  1064 0x7ffffffe0000 0x800000000000 rwx    3    0   1   0 C- df
>
> I have hunted down the exact location in the kernel where the call to
> mlockall() returns an error (just using printf's, debugging using
> Firewire proved not to be as trivial to set up as it was just a few
> years ago). It appears that while wiring the memory, finally vm_fault()
> is called and it bails out at line 412 of vm_fault.c. The virtual
> address of the page that the system is attempting to wire (argument
> vaddr of vm_fault()) is 0x800762000. From the procstat output above it
> appears that this is in the third region backed by /lib/libc.so.7.
>
> This made me think that the issue might be somehow related to the way in
> which dynamic libraries are linked at run time. Indeed, if the above
> program is linked -statically- it does not fail. Also if the program is
> compiled and linked -dynamically- on an i386 platform and run on an
> amd64, it runs successfully.
>
> To make a long story at least a bit shorter, I found that the problem is
> in /usr/src/libexec/rtld_elf/map_object.c at line 156. Here a contiguous
> region is staked out for the code and data. For the amd64, where the
> required alignment of the segments is 1 Mbyte, this causes a region to
> be mapped that is far larger than the library file by which it is
> backed. Addresses that are not backed by the file cannot be resident and
> hence the region cannot be locked into memory. On the i386 architecture
> this problem does not occur since the alignment of the segments is just
> 4 Kbytes. I suspect that the problem also occurs at least on the sparc64
> architecture.
>
> As a first step to a possible solution you can apply the attached
> (provisional) patch, that uses an anonymous, read-only mapping to create
> the required region.
>
> The output of the above program then becomes:
>
>   PID          START            END PRT  RES PRES REF SHD FL TP
> PATH
>  1302       0x400000       0x401000 r-x    1    0   1   0 CN vn
> /root/mlockall/mltest
>  1302       0x500000       0x501000 rw-    1    0   1   0 -- df
>  1302    0x800500000    0x80052c000 r-x   44    0   8   4 CN vn
> /libexec/ld-elf.so.1
>  1302    0x80052c000    0x800534000 rw-    8    0   1   0 -- df
>  1302    0x80062b000    0x800633000 rw-    8    0   1   0 C- vn
> /libexec/ld-elf.so.1
>  1302    0x800633000    0x80063f000 rw-   12    0   1   0 -- df
>  1302    0x80063f000    0x80072e000 r-x  239    0 124  62 CN vn
> /lib/libc.so.7
>  1302    0x80072e000    0x80072f000 r-x    1    0   1   0 C- vn
> /lib/libc.so.7
>  1302    0x80072f000    0x80082f000 r--  256    0   1   0 -- df
>  1302    0x80082f000    0x80084f000 rw-   32    0   1   0 C- vn
> /lib/libc.so.7
>  1302    0x80084f000    0x800865000 rw-   22    0   1   0 -- df
>  1302 0x7ffffffe0000 0x800000000000 rwx   32    0   1   0 -- df
>
> i.e. mlockall() does not return an error anymore.
>
> I still have the following questions:
>
> 1. Is it worth the trouble to solve the mlockall() problem at all?
> Should I file a PR?

Yes. File a PR if you want, but I see no reason to.
Your analysis looks correct and useful.

>
> 2. Can someone confirm that it also occurs on the other 64 bit
> architectures?
>
> 3. It might be more elegant to use PROT_NONE instead of PROT_READ when
> just staking out the address space. Currently mlockall() returns an
> error when attempting that, so most likely mlockall() would need to be
> changed to ignore regions mapped with PROT_NONE. On the other hand, the
> pthread implementation uses PROT_NONE to create red zones on the stack
> and mlockall() apparently succeeds with threaded applications (using the
> provided patch). Any opinions/ideas/hints?

I think it is better to unmap the holes than to cover them with another
mapping. Please try this patch instead.

diff --git a/libexec/rtld-elf/map_object.c b/libexec/rtld-elf/map_object.c
index 2d06074..3266af0 100644
--- a/libexec/rtld-elf/map_object.c
+++ b/libexec/rtld-elf/map_object.c
@@ -83,6 +83,7 @@ map_object(int fd, const char *path, const struct stat *sb)
     Elf_Addr bss_vaddr;
     Elf_Addr bss_vlimit;
     caddr_t bss_addr;
+    size_t hole;
 
     hdr = get_elf_header(fd, path);
     if (hdr == NULL)
@@ -91,8 +92,7 @@ map_object(int fd, const char *path, const struct stat *sb)
     /*
      * Scan the program header entries, and save key information.
      *
-     * We rely on there being exactly two load segments, text and data,
-     * in that order.
+     * We expect that the loadable segments are ordered by load address.
      */
     phdr = (Elf_Phdr *) ((char *)hdr + hdr->e_phoff);
     phsize = hdr->e_phnum * sizeof (phdr[0]);
@@ -214,6 +214,17 @@ map_object(int fd, const char *path, const struct stat *sb)
                 return NULL;
             }
         }
+
+        /* Unmap the region between two non-adjusted ELF segments */
+        if (i < nsegs) {
+            hole = trunc_page(segs[i + 1]->p_vaddr) - bss_vlimit;
+            if (hole > 0 && munmap(mapbase + bss_vlimit, hole) == -1) {
+                _rtld_error("%s: munmap hole failed: %s", path,
+                    strerror(errno));
+                return NULL;
+            }
+        }
+
         if (phdr_vaddr == 0 && data_offset <= hdr->e_phoff &&
           (data_vlimit - data_vaddr + data_offset) >=
           (hdr->e_phoff + hdr->e_phnum * sizeof (Elf_Phdr))) {