Hi Bruno,

On 01/07/11 21:17, Bruno Haible wrote:
> Hi Bruce,
>
>> I now believe it completely correct to add this: free(malloc(0x88))
>> to the program and put it into the main line until the real cause
>> (glibc or kernel) is determined and fixed.
>
> What is your explanation of why that free(malloc(0x88)) has the effect
> of avoiding the crash?

It causes an initial allocation arena to be allocated.  This allocation
arena (if I read the code correctly) should be about 1MB in size, not 10MB.

> If I understood things correctly from your Jakub Jelinek's reply to your
> report <http://sourceware.org/bugzilla/show_bug.cgi?id=12232>, then the
> effect of free(malloc(0x88)) is that is pre-allocates some memory pages,

The "allocation arena" I mentioned above.

> to such an extent that the mallocs inside rpl_fprintf or rpl_dprintf
> succeed. We don't want this, as it only masks a problem that is still
> present inside rpl_fprintf or rpl_dprintf.
>
> Ulrich and Jakub pointed you to the fact that it's the kernel who decides.
> Have you tracked down in the kernel the source code that refuses memory
> allocations, depending on the RLIMIT_AS value? At that moment when malloc
> fails, what are the memory maps (/proc/<pid>/maps)

The kernel decides, but the glibc code determines what sizes of data to
ask for.  If glibc asks for too much, the kernel is being reasonable in
refusing.  If glibc does not ask for too much, then the kernel's behavior
is at fault.  I replaced "return 1" with "abort()":

$ size core
   text    data     bss     dec     hex filename
  65536  225280       0  290816   47000 core  \
      (core file invoked as ./test-dprintf-posix2 1)

That is a bit smaller than 10000000 decimal.

> and what was the system
> call (strace!) that the malloc() call translated into?

"ltrace -S" does both.  I left off the "-S".

> And what is the size of the 'test-dprintf-posix2' program with all its
> dependencies (as shown by 'ldd')? Does it sum up to more than 10 MB?

Not hardly.  I'd have bumped the limit long ago if that were an issue.
The tests are attempting to check for a memory leak; the size limit is
arbitrary (i.e. not important, except that it must be large enough so
that the program can get started...)

$ for f in test-fprintf-posix3 test-dprintf-posix2;do echo $f;ldd $f;done
test-fprintf-posix3
        linux-vdso.so.1 =>  (0x00007fffa6fff000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f41de2b7000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f41de060000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f41ddd00000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f41ddae3000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f41de4c0000)
test-dprintf-posix2
        linux-vdso.so.1 =>  (0x00007fff995f9000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f9b1dc99000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f9b1da42000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f9b1d6e2000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9b1d4c5000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f9b1dea2000)

$ size test-fprintf-posix3 test-dprintf-posix2
   text    data     bss     dec     hex filename
  15338     648      24   16010    3e8a test-fprintf-posix3
  15540     648      16   16204    3f4c test-dprintf-posix2
(statically linked to libposix)

..............

__libc_start_main(0x4009c0, 2, 0x7fff27c151e8, 0x4036d0, 0x403760 <unfinished ...>
getrlimit(2, 0x7fff27c150e0, 0x7fff27c15200, 0x7f2965bfe4a8, 0x7f2965bff320 <unfinished ...>
SYS_getrlimit(2, 0x7fff27c150e0) = 0
<... getrlimit resumed> ) = 0
setrlimit(2, 0x7fff27c150e0, 0x7fff27c15200, -1, 0x7f2965bff320 <unfinished ...>
SYS_setrlimit(2, 0x7fff27c150e0) = 0
<... setrlimit resumed> ) = 0
getrlimit(9, 0x7fff27c150e0, 0x7fff27c15200, -1, 0x7f2965bff320 <unfinished ...>
SYS_getrlimit(9, 0x7fff27c150e0) = 0
<... getrlimit resumed> ) = 0
setrlimit(9, 0x7fff27c150e0, 0x7fff27c15200, -1, 0x7f2965bff320 <unfinished ...>
SYS_setrlimit(9, 0x7fff27c150e0) = 0
<... setrlimit resumed> ) = 0
strtol(0x7fff27c17263, 0, 10, -1, 0x7f2965bff320) = 1
malloc(88 <unfinished ...>
SYS_brk(NULL) = 0x00606000
SYS_brk(0x00627000) = 0x00606000
SYS_mmap(0, 0x100000, 3, 34, 0xffffffff) = -12
SYS_mmap(0, 0x8000000, 0, 16418, 0xffffffff) = -12
SYS_mmap(0, 0x4000000, 0, 16418, 0xffffffff) = -12
SYS_mmap(0, 0x8000000, 0, 16418, 0xffffffff) = -12
SYS_mmap(0, 0x4000000, 0, 16418, 0xffffffff) = -12
<... malloc resumed> ) = NULL
__errno_location() = 0x7f29662506a8
__errno_location() = 0x7f29662506a8
SYS_exit_group(1 <no return ...>

RE: SYS_mmap()

I have no idea what "-12" means.  It doesn't mean "a-ok" and it isn't "-1".
ltrace does not seem to recognize it as an error result, so it isn't
printing the location of the error code (which wouldn't help anyway).

Anyhow, I *think* the first mmap ought to succeed because the process size
is about 300K and it is only asking for 1M more.  That fits my recollection
of the malloc code.  The remaining mmap calls are rather over the top and
I'd expect them to be rejected.  I would not expect malloc to try to
allocate so much space.  But maybe there is special magic there since the
request is for "no access".  It is mapped PRIVATE/ANONYMOUS like the first
map.  I don't know what the 0x4000 protection bit means.

SUMMARY:  there is brokenness somewhere between glibc and the kernel.
From the perspective of gnulib tests, it doesn't matter where the fault
lies, it matters that there is a problem.  What precise confluence of
circumstances triggers the problem doesn't seem crucial to me, either,
except to the glibc/kernel folks who do need to chase down the exact cause.
Therefore, I think the test code should evade the problem rather than
continuing to fail, leaving the fix to others.
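
A note on reading the numbers in the failing trace, offered as a sketch
rather than a diagnosis.  It assumes the trace was taken on Linux/x86-64
(the /lib64 paths suggest so) and hard-codes that platform's constant
values instead of using the host's headers.  The helper names decode_prot,
decode_flags and syscall_errno are made up for illustration.  With -S,
ltrace prints the raw kernel return value, and on Linux a failing system
call returns a negated errno, so "-12" would be -ENOMEM.  The fourth mmap
argument is the flags word, not the protection: 34 is 0x22 and 16418 is
0x4022, which under these assumptions adds MAP_NORESERVE (0x4000) to
MAP_PRIVATE|MAP_ANONYMOUS.  A protection of 0 is PROT_NONE.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed Linux/x86-64 values, hard-coded so the decoding does not
   depend on the machine this sketch is compiled on. */
#define TRACE_PROT_READ      0x1UL
#define TRACE_PROT_WRITE     0x2UL
#define TRACE_PROT_EXEC      0x4UL
#define TRACE_MAP_SHARED     0x01UL
#define TRACE_MAP_PRIVATE    0x02UL
#define TRACE_MAP_ANONYMOUS  0x20UL
#define TRACE_MAP_NORESERVE  0x4000UL

/* If BIT is set in VALUE, append NAME to BUF ('|'-separated).
   Returns VALUE with BIT cleared. */
unsigned long take_bit(unsigned long value, unsigned long bit,
                       const char *name, char *buf, size_t len)
{
    if (value & bit) {
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, name, len - strlen(buf) - 1);
    }
    return value & ~bit;
}

/* Decode the third mmap argument (protection) as shown by ltrace. */
void decode_prot(unsigned long prot, char *buf, size_t len)
{
    buf[0] = '\0';
    if (prot == 0) {
        snprintf(buf, len, "PROT_NONE");
        return;
    }
    prot = take_bit(prot, TRACE_PROT_READ, "PROT_READ", buf, len);
    prot = take_bit(prot, TRACE_PROT_WRITE, "PROT_WRITE", buf, len);
    prot = take_bit(prot, TRACE_PROT_EXEC, "PROT_EXEC", buf, len);
    if (prot != 0)                      /* bits this sketch does not know */
        take_bit(prot, prot, "?", buf, len);
}

/* Decode the fourth mmap argument (flags) as shown by ltrace. */
void decode_flags(unsigned long flags, char *buf, size_t len)
{
    buf[0] = '\0';
    flags = take_bit(flags, TRACE_MAP_SHARED, "MAP_SHARED", buf, len);
    flags = take_bit(flags, TRACE_MAP_PRIVATE, "MAP_PRIVATE", buf, len);
    flags = take_bit(flags, TRACE_MAP_ANONYMOUS, "MAP_ANONYMOUS", buf, len);
    flags = take_bit(flags, TRACE_MAP_NORESERVE, "MAP_NORESERVE", buf, len);
    if (flags != 0)                     /* bits this sketch does not know */
        take_bit(flags, flags, "?", buf, len);
}

/* Raw Linux syscall results in -4095..-1 are negated errno values;
   return the errno, or 0 if RET does not look like an error. */
int syscall_errno(long ret)
{
    return (ret < 0 && ret >= -4095) ? (int) -ret : 0;
}
```

Called from a small main, decode_flags(16418, ...) gives
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, decode_prot(3, ...) gives
PROT_READ|PROT_WRITE, and syscall_errno(-12) gives 12, which strerror()
can turn into a message.  None of this says why the 1MB request was
refused; it only names what was asked for and how the kernel answered.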

This, as a stand alone, not-linked-to-libposix, program does not fail
(well, I've not seen it fail):

#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_ROUNDS 1000
#define MAX_ALLOC_ROUND 10000
#define MAX_ALLOC_TOTAL (NUM_ROUNDS * MAX_ALLOC_ROUND)

int
main (int argc, char ** argv)
{
  struct rlimit limit;

  if (getrlimit (RLIMIT_DATA, &limit) < 0)
    return 77;
  if (limit.rlim_max == RLIM_INFINITY || limit.rlim_max > MAX_ALLOC_TOTAL)
    limit.rlim_max = MAX_ALLOC_TOTAL;
  limit.rlim_cur = limit.rlim_max;
  if (setrlimit (RLIMIT_DATA, &limit) < 0)
    return 77;

  if (getrlimit (RLIMIT_AS, &limit) < 0)
    return 77;
  if (limit.rlim_max == RLIM_INFINITY || limit.rlim_max > MAX_ALLOC_TOTAL)
    limit.rlim_max = MAX_ALLOC_TOTAL;
  limit.rlim_cur = limit.rlim_max;
  if (setrlimit (RLIMIT_AS, &limit) < 0)
    return 77;

  if (dprintf (STDOUT_FILENO, "%011000d\n", 17) == -1
      && errno == ENOMEM)
    return 1;

  return 0;
}

__libc_start_main(0x400640, 1, 0x7fff280db718, 0x400730, 0x4007c0 <unfinished ...>
getrlimit(2, 0x7fff280db610, 0x7fff280db728, 0x7fc9aa0784a8, 0x7fc9aa079320 <unfinished ...>
SYS_getrlimit(2, 0x7fff280db610) = 0
<... getrlimit resumed> ) = 0
setrlimit(2, 0x7fff280db610, 0x7fff280db728, -1, 0x7fc9aa079320 <unfinished ...>
SYS_setrlimit(2, 0x7fff280db610) = 0
<... setrlimit resumed> ) = 0
getrlimit(9, 0x7fff280db610, 0x7fff280db728, -1, 0x7fc9aa079320 <unfinished ...>
SYS_getrlimit(9, 0x7fff280db610) = 0
<... getrlimit resumed> ) = 0
setrlimit(9, 0x7fff280db610, 0x7fff280db728, -1, 0x7fc9aa079320 <unfinished ...>
SYS_setrlimit(9, 0x7fff280db610) = 0
<... setrlimit resumed> ) = 0
dprintf(1, 0x40081c, 17, -1, 0x7fc9aa079320 <unfinished ...>
SYS_fstat(1, 0x7fff280db110) = 0
SYS_mmap(0, 4096, 3, 34, 0xffffffff) = 0x7fc9aa29a000
SYS_lseek(1, 0, 1) = -29
SYS_write(1, "00000000000000000000000000000000"..., 1024
[the program's own output, a long run of '0' characters, is interleaved here]
) = 1024
[..................]
SYS_munmap(0x7fc9aa29a000, 4096) = 0
<... dprintf resumed> ) = 11001
SYS_exit_group(0 <no return ...>
+++ exited (status 0) +++