On Sun, 28 May 2023 23:07:40 +0300 Baruch Even <bar...@weka.io> wrote:
> Hi,
>
> We found an issue with newer kernels (5.13+) that are found on newer OSes
> (Ubuntu22, Rocky9, Ubuntu20 with kernel 5.15) where a 2M page that was
> allocated for DPDK was migrated (moved into another physical page) when a
> 1G page was allocated.
>
> From our reading of the kernel commits this started with commit
> ae37c7ff79f1f030e28ec76c46ee032f8fd07607
> mm: make alloc_contig_range handle in-use hugetlb pages
>
> This caused what looked like memory corruptions to us and cases where the
> rings were moved from their physical location and communication was no
> longer possible.
>
> I wanted to ask if anyone else hit this issue and what mitigations are
> available?
>
> We are currently looking at using a kernel driver to pin the pages but I
> expect that this issue will affect others and that a more general approach
> is needed.
>
> Thanks,
> Baruch
>

Fix might be as simple as asking kernel to lock the mmap().

diff --git a/lib/eal/linux/eal_hugepage_info.c b/lib/eal/linux/eal_hugepage_info.c
index 581d9dfc91eb..989c69387233 100644
--- a/lib/eal/linux/eal_hugepage_info.c
+++ b/lib/eal/linux/eal_hugepage_info.c
@@ -48,7 +48,8 @@ map_shared_memory(const char *filename, const size_t mem_size, int flags)
 		return NULL;
 	}
 	retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
-			MAP_SHARED, fd, 0);
+			MAP_SHARED_VALIDATE | MAP_LOCKED, fd, 0);
+
 	close(fd);
 	return retval == MAP_FAILED ? NULL : retval;
 }
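
For anyone who wants to try the locking idea outside of EAL, here is a minimal standalone sketch of the same call pattern. The helper name map_file_locked() and its "locked" out-parameter are hypothetical and not part of DPDK. It uses plain MAP_SHARED rather than MAP_SHARED_VALIDATE so that it builds against older headers, and it falls back to an unlocked mapping when the lock is refused. Whether locking also stops the hugepage migration reported above is not verified here; the sketch only shows how to request the lock and detect that it was refused.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_LOCKED on glibc under strict -std modes */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of DPDK: map `size` bytes of `path` as a
 * shared mapping and ask the kernel to lock the pages in RAM.
 * *locked is set to 1 if the MAP_LOCKED mapping succeeded, or 0 if the
 * helper had to fall back to an unlocked mapping.
 * Returns NULL if the file cannot be opened, sized, or mapped at all.
 */
static void *
map_file_locked(const char *path, size_t size, int *locked)
{
	void *addr;
	int fd;

	*locked = 0;
	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return NULL;
	}

	addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_LOCKED, fd, 0);
	if (addr != MAP_FAILED) {
		*locked = 1;
	} else {
		/* Locking can be refused, e.g. when RLIMIT_MEMLOCK is too low. */
		addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			    MAP_SHARED, fd, 0);
	}

	close(fd); /* the mapping stays valid after the descriptor is closed */
	return addr == MAP_FAILED ? NULL : addr;
}
```

A caller would check the "locked" flag and log a warning when it comes back 0, since the mapping is then usable but not locked. On a hugetlbfs mount the size passed in would presumably need to be a multiple of the hugepage size.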