> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Friday, June 1, 2018 1:00 PM
>
> On 01-Jun-18 1:51 PM, Dariusz Stojaczyk wrote:
> > EAL reserves a huge area in virtual address space to provide virtual
> > address contiguity for e.g. future memory extensions (memory hotplug).
> > During memory hotplug, if the hugepage mmap succeeds but doesn't
> > suffice EAL's requirements, EAL would unmap this mapping straight
> > away, leaving a hole in its virtual memory area and making it
> > available to everyone. As EAL still thinks it owns the entire region,
> > it may try to mmap it later with MAP_FIXED, possibly overriding a
> > user's mapping that was made in the meantime.
> >
> > This patch ensures each hole is mapped back by EAL, so that it won't
> > be available to anyone else.
> >
> > Signed-off-by: Dariusz Stojaczyk <dariuszx.stojac...@intel.com>
> > ---
>
> Generally, if the commit is a bugfix, a reference to the original commit
> that introduced the issue should be part of the commit message. See the
> DPDK contribution guidelines [1] and "git fixline" [2]. Also, this bug is
> present in 18.05, so you should also Cc: sta...@dpdk.org (the same
> applies to all of your other patches for memalloc).
>
> [1] http://dpdk.org/doc/guides/contributing/patches.html
> [2] http://dpdk.org/dev
Ack, thanks.

> >  lib/librte_eal/linuxapp/eal/eal_memalloc.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
> > b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
> > index 8c11f98..b959ea5 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
> > @@ -39,6 +39,7 @@
> >  #include "eal_filesystem.h"
> >  #include "eal_internal_cfg.h"
> >  #include "eal_memalloc.h"
> > +#include "eal_private.h"
> >
> >  /*
> >   * not all kernel version support fallocate on hugetlbfs, so fall back to
> > @@ -490,6 +491,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
> >  	int ret = 0;
> >  	int fd;
> >  	size_t alloc_sz;
> > +	int flags;
> > +	void *new_addr;
> >
> >  	/* takes out a read lock on segment or segment list */
> >  	fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
> > @@ -585,6 +588,20 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
> >
> >  mapped:
> >  	munmap(addr, alloc_sz);
> > +	flags = MAP_FIXED;
> > +#ifdef RTE_ARCH_PPC_64
> > +	flags |= MAP_HUGETLB;
> > +#endif
> > +	new_addr = eal_get_virtual_area(addr, &alloc_sz, alloc_sz, 0,
> > +			flags);
>
> Page size shouldn't be zero, should be alloc_sz.

The 0 above is for the `flags` parameter; page size is already set to
alloc_sz:

```
void *
eal_get_virtual_area(void *requested_addr, size_t *size,
		size_t page_sz, int flags, int mmap_flags);
```

I believe it's okay. Correct me if I'm wrong.

> > +	if (new_addr != addr) {
> > +		if (new_addr != NULL) {
> > +			munmap(new_addr, alloc_sz);
> > +		}
> > +		/* We're leaving a hole in our virtual address space. If
> > +		 * somebody else maps this hole now, we might accidentally
> > +		 * override it in the future. */
> > +		rte_panic("can't mmap holes in our virtual address space");
>
> I don't think rte_panic() here makes sense.
> First of all, we're now striving not to panic inside DPDK libraries,
> especially once initialization has already finished. But more
> importantly, when you reach this panic, you're deep in the memory
> allocation process, which means you hold a write lock on DPDK memory
> hotplug. If you crash now, the lock will never be released and
> subsequent memory hotplug requests will just deadlock.
>
> What's worse, you may be inside a secondary process, synchronizing the
> memory map with that of a primary, which means that even if you released
> the lock here, you'd basically be releasing someone else's lock, so
> behavior would be undefined at that point.
>
> I think an error message with the highest possible log level should
> suffice (CRIT?), as there's really nothing more we can do here.

OK, I'll fall back to a CRIT log for now. We could try to add some proper
error handling in a separate patch later.

> That said, how likely is this scenario?

I can't think of any reason why that last mmap would fail, but it's still
technically possible.

> > +	}
> >  resized:
> >  	if (internal_config.single_file_segments) {
> >  		resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
> >
>
> Generally, if the above is fixed, I'm OK with the patch.
>
> --
> Thanks,
> Anatoly

D.