If any of you were following the thread at https://www.postgresql.org/message-id/flat/CAOan6TnQeSGcu_627NXQ2Z%2BWyhUzBjhERBm5RN9D0QFWmk7PoQ%40mail.gmail.com, I spent quite a bit of time chasing a bogus theory, but the problem turns out to be very simple: on Linux, munmap() is pickier than mmap() about the length of a hugepage allocation.

The comments in sysv_shmem.c mention that on older kernels, mmap() with MAP_HUGETLB will fail if given a length request that's not a multiple of the hugepage size. Well, the behavior they replaced that with is little better: mmap() succeeds, but it gives you back a region that's been silently enlarged to the next hugepage boundary, and then munmap() will fail if you specify the region size you asked for rather than the region size you were given.
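To see the gotcha in isolation, here's a minimal test program (my own sketch, assuming a 2MB default huge page size and a kernel with the rounding-up behavior; it also needs some huge pages reserved, e.g. via vm.nr_hugepages):

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int
main(void)
{
	/*
	 * 2MB plus one 4KB page: deliberately not a multiple of the huge
	 * page size (assumed to be 2MB here).
	 */
	size_t		request = 2 * 1024 * 1024 + 4096;
	size_t		rounded = 2 * (2 * 1024 * 1024);
	void	   *p;

	p = mmap(NULL, request, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED)
	{
		/* Older kernels just reject the odd length outright. */
		fprintf(stderr, "mmap: %s\n", strerror(errno));
		return 1;
	}

	/*
	 * On kernels with the rounding-up behavior, the region we actually
	 * got is 4MB, so unmapping with the length we asked for fails
	 * (typically with EINVAL)...
	 */
	if (munmap(p, request) < 0)
		fprintf(stderr, "munmap(%zu): %s\n", request, strerror(errno));

	/* ...while unmapping with the length we were actually given works. */
	if (munmap(p, rounded) == 0)
		printf("munmap of rounded-up length succeeded\n");

	return 0;
}

On an affected kernel the first munmap() fails while the second succeeds, which is exactly the failure mode we're seeing with odd-size shmem requests.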
Since AFAICS there is no way to inquire what region size you were given, this API is astonishingly brain-dead IMO. But that seems to be what we've got. Chris Richards reported it against a 3.16.7 kernel, and I can replicate the behavior on RHEL6 (2.6.32) by asking for an odd-size huge page region.

We've mostly masked this by rounding the request up to a 2MB boundary, which is what the hugepage size typically is. But that assumption is wrong on some hardware, and it's not likely to get less wrong as time passes.

A little bit of research suggests that on Linux the thing to do would be to get the actual default hugepage size by reading /proc/meminfo and looking for a line like "Hugepagesize: 2048 kB" (a sketch of that lookup is below). I don't know of any more-portable API, so this does nothing for non-Linux kernels. But we have not heard of similar misbehavior on other platforms, even though IA64 and PPC64 can both have hugepages larger than 2MB, so it's reasonable to hope that other implementations of munmap() don't have the same gotcha.

Barring objections I'll go make this happen.

			regards, tom lane
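PS: for concreteness, the lookup I have in mind is roughly like the sketch below. It's illustrative, not the patch itself: the helper name is made up, and a real version would need to decide on a fallback when /proc/meminfo can't be parsed.

#include <stdio.h>
#include <string.h>

/* Return the default huge page size in bytes, or 0 if we can't find out. */
static long
get_default_hugepage_size(void)
{
	long		result = 0;
	char		buf[128];
	FILE	   *fp = fopen("/proc/meminfo", "r");

	if (fp)
	{
		while (fgets(buf, sizeof(buf), fp))
		{
			unsigned long sz;
			char		unit[16];

			/* Looking for a line like "Hugepagesize:    2048 kB" */
			if (sscanf(buf, "Hugepagesize: %lu %15s", &sz, unit) == 2 &&
				strcmp(unit, "kB") == 0)
			{
				result = (long) (sz * 1024);
				break;
			}
		}
		fclose(fp);
	}
	return result;
}

int
main(void)
{
	printf("default hugepage size: %ld bytes\n",
		   get_default_hugepage_size());
	return 0;
}

The caller would round its shmem request up to a multiple of that size before mmap(), and then pass the same rounded length to munmap() later.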