On Wed, Jun 21, 2023 at 10:42 AM Andres Freund <and...@anarazel.de> wrote:
> So I am wondering if you're encountering a different kind of problem. As I
> mentioned, I have observed that the pages need to be clean for this to
> work. For me adding a "sync path/to/postgres" makes it work on 6.3.8. Without
> the sync it starts to work a while later (presumably when the kernel got
> around to writing the data back).

Hmm, then after rebooting today, it shouldn't have that problem until a
build links again, but I'll make sure to do that when building. Still same
failure, though. Looking more closely at the manpage for madvise, it has
this under MADV_HUGEPAGE:

"The MADV_HUGEPAGE, MADV_NOHUGEPAGE, and MADV_COLLAPSE operations are
available only if the kernel was configured with
CONFIG_TRANSPARENT_HUGEPAGE and file/shmem memory is only supported if the
kernel was configured with CONFIG_READ_ONLY_THP_FOR_FS."

Earlier, I only checked the first config option but didn't know about the
second...

$ grep CONFIG_READ_ONLY_THP_FOR_FS /boot/config-$(uname -r)
# CONFIG_READ_ONLY_THP_FOR_FS is not set

Apparently, it's experimental. That could be the explanation, but now I'm
wondering why the fallback

madvise(addr, advlen, MADV_HUGEPAGE);

didn't also give an error. I wonder if we could mremap to some anonymous
region and call madvise on that. That would be more similar to the hack I
shared last year, which may be more fragile, but now it wouldn't need
explicit huge pages.

--
John Naylor
EDB: http://www.enterprisedb.com
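
To make the anonymous-region idea above a bit more concrete, here is a minimal sketch of only the madvise half: it maps an anonymous region, applies MADV_HUGEPAGE to a 2 MiB-aligned range inside it, and returns the errno rather than ignoring it. It does not attempt the mremap of the text segment. The helper name try_anon_hugepage_advice and the 2 MiB alignment are assumptions for illustration and are not taken from any patch in this thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumed huge page size (2 MiB, the usual x86-64 THP size). */
#define HUGE_ALIGN ((size_t) 2 * 1024 * 1024)

/*
 * Hypothetical helper: map an anonymous region large enough to contain
 * "len" bytes starting at a 2 MiB boundary, advise the kernel to back that
 * range with transparent huge pages, then unmap it again.
 *
 * Returns 0 if madvise() succeeded, otherwise the errno from the failing
 * call (for example EINVAL when the kernel rejects the advice).
 */
int
try_anon_hugepage_advice(size_t len)
{
    size_t      maplen = len + HUGE_ALIGN;  /* slack so we can align up */
    int         result = 0;
    uintptr_t   aligned;
    char       *base;

    base = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return errno;

    /* Round the start address up to the next 2 MiB boundary. */
    aligned = ((uintptr_t) base + HUGE_ALIGN - 1) & ~((uintptr_t) HUGE_ALIGN - 1);

    /* Check the return value instead of silently ignoring a failure. */
    if (madvise((void *) aligned, len, MADV_HUGEPAGE) != 0)
        result = errno;

    munmap(base, maplen);
    return result;
}
```

Calling try_anon_hugepage_advice(4 * 1024 * 1024) and printing strerror() of a nonzero result would show whether the kernel accepts the advice for anonymous memory, which does not depend on CONFIG_READ_ONLY_THP_FOR_FS.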