On 14-Apr-20 8:44 PM, Dmitry Kozlyuk wrote:
System memory management is implemented differently for POSIX and
Windows. Introduce wrapper functions for operations used across DPDK:
* rte_mem_map()
Create memory mapping for a regular file or a page file (swap).
This supports mapping to a reserved memory region even on Windows.
* rte_mem_unmap()
Remove mapping created with rte_mem_map().
* rte_get_page_size()
Obtain default system page size.
* rte_mem_lock()
Make arbitrary-sized memory region non-swappable.
Wrappers follow POSIX semantics limited to DPDK tasks, but their
signatures deliberately differ from POSIX ones to be safer and more
expressive.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozl...@gmail.com>
---
<snip>
+/**
+ * Memory reservation flags.
+ */
+enum eal_mem_reserve_flags {
+ /**< Reserve hugepages (support may be limited or missing). */
+ EAL_RESERVE_HUGEPAGES = 1 << 0,
+ /**< Fail if requested address is not available. */
+ EAL_RESERVE_EXACT_ADDRESS = 1 << 1
I *really* don't like this terminology.
In Linux et al., MAP_FIXED is not just "reserve at this exact address".
MAP_FIXED is actually fairly dangerous if you don't know what you're
doing, because it will unconditionally replace any existing mapping in
the requested range. Also, to my knowledge, an mmap() call with
MAP_FIXED cannot fail unless something went very wrong - it will *not*
"fail if requested address is not available". We basically use MAP_FIXED
because we have already mapped that area with MAP_ANONYMOUS previously,
so we can guarantee that it's safe to use MAP_FIXED there.
I would greatly prefer if this was named to better reflect the above.
EAL_FORCE_RESERVE perhaps? The comment also needs to be adjusted.
+};
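To make the MAP_FIXED point above concrete, here is a minimal POSIX-only
sketch of the reserve-then-remap pattern. This is not DPDK code: the
example_* names are hypothetical, and it assumes a system where mmap()
supports MAP_ANONYMOUS.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve address space without making it accessible. */
static void *
example_reserve(size_t size)
{
	void *va = mmap(NULL, size, PROT_NONE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return va == MAP_FAILED ? NULL : va;
}

/*
 * Hypothetical helper: map usable memory over a range we already own.
 * MAP_FIXED silently replaces whatever is mapped at 'va', so this is only
 * safe because 'va' is expected to come from example_reserve().
 */
static void *
example_remap(void *va, size_t size)
{
	void *p = mmap(va, size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

The second call does not check whether the address is "available"; it
just takes it. That is why the flag name and its comment should say
"force" rather than "exact".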
+
/**
* Get virtual area of specified size from the OS.
*
@@ -232,8 +243,8 @@ int rte_eal_check_module(const char *module_name);
#define EAL_VIRTUAL_AREA_UNMAP (1 << 2)
/**< immediately unmap reserved virtual area. */
void *
-eal_get_virtual_area(void *requested_addr, size_t *size,
- size_t page_sz, int flags, int mmap_flags);
+eal_get_virtual_area(void *requested_addr, size_t *size, size_t page_sz,
+ int flags, int mmap_flags);
/**
<snip>
+/**
+ * Reserve a region of virtual memory.
+ *
+ * Use eal_mem_free() to free reserved memory.
+ *
+ * @param requested_addr
+ * A desired reservation address. The system may not respect it.
+ * NULL means the address will be chosen by the system.
+ * @param size
+ * Reservation size. Must be a multiple of system page size.
+ * @param flags
+ * Reservation options.
+ * @returns
+ * Starting address of the reserved area on success, NULL on failure.
+ * Callers must not access this memory until remapping it.
+ */
+void *eal_mem_reserve(void *requested_addr, size_t size,
+ enum eal_mem_reserve_flags flags);
This seems fairly suspect to me. I know that technically enum is an int,
but semantically, IIRC an enum variable should always hold exactly one
of its enumerators - you can't use an enum type like a set of flags.
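One possible way around this, sketched with hypothetical EXAMPLE_* names
(an illustration only, not a proposal for the final API): keep the
enumerators as named bit constants, but pass the combined set as a plain
unsigned integer, since an OR of two flags matches none of the
enumerators.

```c
#include <stdbool.h>

/* Hypothetical flag constants, loosely mirroring the ones in the patch. */
enum {
	EXAMPLE_RESERVE_HUGEPAGES = 1 << 0,
	EXAMPLE_RESERVE_FORCE = 1 << 1,
};

/* A set of flags travels as a plain integer, not as the enum type. */
static bool
example_has_flag(unsigned int flags, unsigned int flag)
{
	return (flags & flag) != 0;
}
```

A caller would then write something like
example_has_flag(flags, EXAMPLE_RESERVE_FORCE), and the function
signature no longer claims that 'flags' is a single enumerator.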
+
+/**
+ * Free memory obtained by eal_mem_reserve() or eal_mem_alloc().
+ *
+ * If @code virt @endcode and @code size @endcode describe a part of the
+ * reserved region, only this part of the region is freed (accurately
+ * up to the system page size). If @code virt @endcode points to allocated
+ * memory, @code size @endcode must match the one specified on allocation.
+ * The behavior is undefined if the memory pointed to by @code virt @endcode
+ * is obtained from a source other than those listed above.
+ *
+ * @param virt
<snip>
+/**
+ * Memory mapping additional flags.
+ *
+ * In Linux and FreeBSD, each flag is semantically equivalent
+ * to OS-specific mmap(3) flag with the same or similar name.
+ * In Windows, POSIX and MAP_ANONYMOUS semantics are followed.
+ */
+enum rte_map_flags {
+ /** Changes of mapped memory are visible to other processes. */
+ RTE_MAP_SHARED = 1 << 0,
+ /** Mapping is not backed by a regular file. */
+ RTE_MAP_ANONYMOUS = 1 << 1,
+ /** Copy-on-write mapping, changes are invisible to other processes. */
+ RTE_MAP_PRIVATE = 1 << 2,
+ /** Fail if requested address cannot be taken. */
+ RTE_MAP_FIXED = 1 << 3
Again, MAP_FIXED does not behave the way you describe. See above comments.
+};
+
+/**
+ * OS-independent implementation of POSIX mmap(3)
+ * with MAP_ANONYMOUS Linux/FreeBSD extension.
+ */
+__rte_experimental
+void *rte_mem_map(void *requested_addr, size_t size, enum rte_mem_prot prot,
+ enum rte_map_flags flags, int fd, size_t offset);
+
+/**
+ * OS-independent implementation of POSIX munmap(3).
+ */
+__rte_experimental
+int rte_mem_unmap(void *virt, size_t size);
+
+/**
+ * Get system page size. This function never fails.
+ *
+ * @return
+ * Positive page size in bytes.
+ */
+__rte_experimental
+int rte_get_page_size(void);
Should the return type be uint32_t, or maybe uint64_t?
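For reference, a minimal POSIX-only sketch of such a getter with an
unsigned return type. The name example_get_page_size() is hypothetical,
and size_t is my assumption here, picked only because the other functions
in this header take sizes as size_t; a fixed-width type would look the
same.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical POSIX-only sketch; not the implementation from the patch. */
static size_t
example_get_page_size(void)
{
	static size_t cached;

	if (cached == 0) {
		/* sysconf() returns long, and -1 on error. */
		long sz = sysconf(_SC_PAGESIZE);

		if (sz > 0)
			cached = (size_t)sz;
	}
	/* 0 is returned only if sysconf() failed, which is not expected. */
	return cached;
}
```

With an unsigned type the "positive" guarantee in the doc comment is
carried by the signature instead of by convention.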
+
+/**
+ * Lock region in physical memory and prevent it from swapping.
+ *
+ * @param virt
+ * The virtual address.
+ * @param size
+ * Size of the region.
+ * @return
+ * 0 on success, negative on error.
+ *
+ * @note Implementations may require @p virt and @p size to be multiples
+ * of system page size.
+ * @see rte_get_page_size()
+ * @see rte_mem_lock_page()
+ */
+__rte_experimental
+int rte_mem_lock(const void *virt, size_t size);
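Given the @note about page-size multiples, callers with arbitrary
buffers would presumably need to round the region out to page boundaries
themselves. A rough POSIX-only sketch of what that could look like; the
example_* names are hypothetical, the page size is assumed to be a power
of two, and plain mlock() stands in for the wrapper:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: expand [virt, virt + size) to page boundaries. */
static void
example_page_span(const void *virt, size_t size, size_t page_sz,
		uintptr_t *start, size_t *len)
{
	uintptr_t mask = ~((uintptr_t)page_sz - 1);
	uintptr_t begin = (uintptr_t)virt & mask;
	uintptr_t end = ((uintptr_t)virt + size + page_sz - 1) & mask;

	*start = begin;
	*len = (size_t)(end - begin);
}

/* Hypothetical caller-side wrapper around POSIX mlock(). */
static int
example_lock(const void *virt, size_t size)
{
	uintptr_t start;
	size_t len;
	long page_sz = sysconf(_SC_PAGESIZE);

	if (page_sz <= 0)
		return -1;
	example_page_span(virt, size, (size_t)page_sz, &start, &len);
	/* May still fail, e.g. when RLIMIT_MEMLOCK is exceeded. */
	return mlock((const void *)start, len) == 0 ? 0 : -1;
}
```

Note that rounding out means neighbouring data sharing the first or last
page gets locked as well.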
+
/**
--
Thanks,
Anatoly