14/04/2020 21:44, Dmitry Kozlyuk: > System meory management is implemented differently for POSIX and
meory -> memory > Windows. Introduce wrapper functions for operations used across DPDK: > > * rte_mem_map() > Create memory mapping for a regular file or a page file (swap). > This supports mapping to a reserved memory region even on Windows. > > * rte_mem_unmap() > Remove mapping created with rte_mem_map(). > > * rte_get_page_size() > Obtain default system page size. > > * rte_mem_lock() > Make arbitrary-sized memory region non-swappable. > > Wrappers follow POSIX semantics limited to DPDK tasks, but their > signatures deliberately differ from POSIX ones to be more safe and > expressive. > > Signed-off-by: Dmitry Kozlyuk <dmitry.kozl...@gmail.com> [...] > +/** > + * Memory reservation flags. > + */ > +enum eal_mem_reserve_flags { > + /**< Reserve hugepages (support may be limited or missing). */ > + EAL_RESERVE_HUGEPAGES = 1 << 0, > + /**< Fail if requested address is not available. */ > + EAL_RESERVE_EXACT_ADDRESS = 1 << 1 > +}; Maybe more context is needed to understand the meaning of these flags. [...] > -eal_get_virtual_area(void *requested_addr, size_t *size, > - size_t page_sz, int flags, int mmap_flags); > +eal_get_virtual_area(void *requested_addr, size_t *size, size_t page_sz, > + int flags, int mmap_flags); Is there any change here? [...] > + * If @code virt @endcode and @code size @endcode describe a part of the I am not sure about using @code. It makes reading from source harder. Is there a real benefit? [...] > +/** > + * Memory protection flags. > + */ > +enum rte_mem_prot { > + RTE_PROT_READ = 1 << 0, /**< Read access. */ > + RTE_PROT_WRITE = 1 << 1, /**< Write access. */ > + RTE_PROT_EXECUTE = 1 << 2 /**< Code execution. */ > +}; Alignment of comments would look better :-) > + > +/** > + * Memory mapping additional flags. > + * > + * In Linux and FreeBSD, each flag is semantically equivalent > + * to OS-specific mmap(3) flag with the same or similar name. > + * In Windows, POSIX and MAP_ANONYMOUS semantics are followed. > + */ I don't understand this comment. The flags and meanings are the same no matter the OS, right? > +enum rte_map_flags { > + /** Changes of mapped memory are visible to other processes. */ > + RTE_MAP_SHARED = 1 << 0, > + /** Mapping is not backed by a regular file. */ > + RTE_MAP_ANONYMOUS = 1 << 1, > + /** Copy-on-write mapping, changes are invisible to other processes. */ > + RTE_MAP_PRIVATE = 1 << 2, > + /** Fail if requested address cannot be taken. */ > + RTE_MAP_FIXED = 1 << 3 > +}; > + > +/** > + * OS-independent implementation of POSIX mmap(3) > + * with MAP_ANONYMOUS Linux/FreeBSD extension. > + */ > +__rte_experimental > +void *rte_mem_map(void *requested_addr, size_t size, enum rte_mem_prot prot, > + enum rte_map_flags flags, int fd, size_t offset); > + > +/** > + * OS-independent implementation of POSIX munmap(3). > + */ > +__rte_experimental > +int rte_mem_unmap(void *virt, size_t size); > + > +/** > + * Get system page size. This function never failes. failes -> fails > + * > + * @return > + * Positive page size in bytes. > + */ > +__rte_experimental > +int rte_get_page_size(void); > + > +/** > + * Lock region in physical memory and prevent it from swapping. > + * > + * @param virt > + * The virtual address. > + * @param size > + * Size of the region. > + * @return > + * 0 on success, negative on error. > + * > + * @note Implementations may require @p virt and @p size to be multiples > + * of system page size. > + * @see rte_get_page_size() > + * @see rte_mem_lock_page() > + */ > +__rte_experimental > +int rte_mem_lock(const void *virt, size_t size); [...] > --- /dev/null > +++ b/lib/librte_eal/unix/eal_memory.c License and copyright missing. > @@ -0,0 +1,113 @@ > +#include <string.h> > +#include <sys/mman.h> > +#include <unistd.h> > + > +#include <rte_errno.h> > +#include <rte_log.h> > +#include <rte_memory.h> > + > +#include "eal_private.h" > + > +static void * > +mem_map(void *requested_addr, size_t size, int prot, int flags, > + int fd, size_t offset) > +{ > + void *virt = mmap(requested_addr, size, prot, flags, fd, offset); > + if (virt == MAP_FAILED) { > + RTE_LOG(ERR, EAL, Not sure it should be a log level so high. We could imagine checking a memory map. What about INFO level? The real error log will be made by the caller. > + "Cannot mmap(%p, 0x%zx, 0x%x, 0x%x, %d, 0x%zx): %s\n", > + requested_addr, size, prot, flags, fd, offset, > + strerror(errno)); > + rte_errno = errno; > + return NULL; [...] > +void * > +eal_mem_reserve(void *requested_addr, size_t size, > + enum eal_mem_reserve_flags flags) > +{ > + int sys_flags = MAP_PRIVATE | MAP_ANONYMOUS; > + > +#ifdef MAP_HUGETLB > + if (flags & EAL_RESERVE_HUGEPAGES) > + sys_flags |= MAP_HUGETLB; > +#endif If MAP_HUGETLB is false, and flags contain EAL_RESERVE_HUGEPAGES, I think an error should be returned. > + if (flags & EAL_RESERVE_EXACT_ADDRESS) > + sys_flags |= MAP_FIXED; > + > + return mem_map(requested_addr, size, PROT_NONE, sys_flags, -1, 0); > +} [...] > +int > +rte_get_page_size(void) > +{ > + return getpagesize(); > +} > + > +int > +rte_mem_lock(const void *virt, size_t size) > +{ > + return mlock(virt, size); > +} Why don't you replace existing code with these new functions?