On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
> The general use-case of using external memory is well covered by
> existing external memory API's. However, certain use cases require
> manual management of externally allocated memory areas, so this
> memory should not be added to the heap. It should, however, be
> added to DPDK's internal structures, so that API's like
> ``rte_virt2memseg`` would work on such external memory segments.
>
> This commit adds such an API to DPDK. The new functions will allow
> to register and unregister externally allocated memory areas, as
> well as documentation for them.
>
> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> ---
>  .../prog_guide/env_abstraction_layer.rst | 60 ++++++++++++---
>  lib/librte_eal/common/eal_common_memory.c | 74 +++++++++++++++++++
>  lib/librte_eal/common/include/rte_memory.h | 63 ++++++++++++++++
>  lib/librte_eal/rte_eal_version.map | 2 +
>  4 files changed, 189 insertions(+), 10 deletions(-)
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 8b5d050c7..d7799b626 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
>  Support for Externally Allocated Memory
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> -It is possible to use externally allocated memory in DPDK, using a set of malloc
> -heap API's. Support for externally allocated memory is implemented through
> -overloading the socket ID - externally allocated heaps will have socket ID's
> -that would be considered invalid under normal circumstances. Requesting an
> -allocation to take place from a specified externally allocated memory is a
> -matter of supplying the correct socket ID to DPDK allocator, either directly
> -(e.g. through a call to ``rte_malloc``) or indirectly (through data
> -structure-specific allocation API's such as ``rte_ring_create``).
> +It is possible to use externally allocated memory in DPDK. There are two ways in
> +which using externally allocated memory can work: the malloc heap API's, and
> +manual memory management.
>
> -Since there is no way DPDK can verify whether memory are is available or valid,
> -this responsibility falls on the shoulders of the user. All multiprocess
> ++ Using heap API's for externally allocated memory
> +
> +Using using a set of malloc heap API's is the recommended way to use externally
> +allocated memory in DPDK. In this way, support for externally allocated memory
> +is implemented through overloading the socket ID - externally allocated heaps
> +will have socket ID's that would be considered invalid under normal
> +circumstances. Requesting an allocation to take place from a specified
> +externally allocated memory is a matter of supplying the correct socket ID to
> +DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
> +indirectly (through data structure-specific allocation API's such as
> +``rte_ring_create``). Using these API's also ensures that mapping of externally
> +allocated memory for DMA is also performed on any memory segment that is added
> +to a DPDK malloc heap.
> +
> +Since there is no way DPDK can verify whether memory is available or valid, this
> +responsibility falls on the shoulders of the user. All multiprocess
>  synchronization is also user's responsibility, as well as ensuring that all
>  calls to add/attach/detach/remove memory are done in the correct order. It is
>  not required to attach to a memory area in all processes - only attach to memory
> @@ -246,6 +255,37 @@ The expected workflow is as follows:
>  For more information, please refer to ``rte_malloc`` API documentation,
>  specifically the ``rte_malloc_heap_*`` family of function calls.
>
> ++ Using externally allocated memory without DPDK API's
> +
> +While using heap API's is the recommended method of using externally allocated
> +memory in DPDK, there are certain use cases where the overhead of DPDK heap API
> +is undesirable - for example, when manual memory management is performed on an
> +externally allocated area. To support use cases where externally allocated
> +memory will not be used as part of normal DPDK workflow, there is also another
> +set of API's under the ``rte_extmem_*`` namespace.
> +
> +These API's are (as their name implies) intended to allow registering or
> +unregistering externally allocated memory to/from DPDK's internal page table, to
> +allow API's like ``rte_virt2memseg`` etc. to work with externally allocated
> +memory. Memory added this way will not be available for any regular DPDK
> +allocators; DPDK will leave this memory for the user application to manage.
> +
> +The expected workflow is as follows:
> +
> +* Get a pointer to memory area
> +* Register memory within DPDK
> +    - If IOVA table is not specified, IOVA addresses will be assumed to be
> +      unavailable
> +* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
> +* Use the memory area in your application
> +* If memory area is no longer needed, it can be unregistered
> +    - If the area was mapped for DMA, unmapping must be performed before
> +      unregistering memory
> +
> +Since these externally allocated memory areas will not be managed by DPDK, it is
> +therefore up to the user application to decide how to use them and what to do
> +with them once they're registered.
> +
>  Per-lcore and Shared Variables
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index d47ea4938..a2e085ae8 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -24,6 +24,7 @@
>  #include "eal_memalloc.h"
>  #include "eal_private.h"
>  #include "eal_internal_cfg.h"
> +#include "malloc_heap.h"
>
>  /*
>   * Try to mmap *size bytes in /dev/zero. If it is successful, return the
> @@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>  	return ret;
>  }
>
> +int __rte_experimental
> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	unsigned int socket_id;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || page_sz == 0 || len == 0 ||
> +			!rte_is_power_of_2(page_sz) ||
> +			RTE_ALIGN(len, page_sz) != len) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}

Wouldn't it be better to have more sanity checks here? E.g., (len / page_sz ==
n_pages), like rte_malloc_heap_memory_add() does.

And what about the alignment of va_addr? If I'm not mistaken, it should be
page-aligned. rte_malloc_heap_memory_add() doesn't check that either...

Also, you might want to mention in the documentation that the granularity of
these registrations is a page.

Otherwise,
Acked-by: Yongseok Koh <ys...@mellanox.com>

Thanks

> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* make sure the segment doesn't already exist */
> +	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
> +		rte_errno = EEXIST;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* get next available socket ID */
> +	socket_id = mcfg->next_socket_id;
> +	if (socket_id > INT32_MAX) {
> +		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
> +		rte_errno = ENOSPC;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* we can create a new memseg */
> +	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
> +			page_sz, "extmem", socket_id) == NULL) {
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* memseg list successfully created - increment next socket ID */
> +	mcfg->next_socket_id++;
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
> +int __rte_experimental
> +rte_extmem_unregister(void *va_addr, size_t len)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	struct rte_memseg_list *msl;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || len == 0) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}
> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* find our segment */
> +	msl = malloc_heap_find_external_seg(va_addr, len);
> +	if (msl == NULL) {
> +		rte_errno = ENOENT;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	ret = malloc_heap_destroy_external_seg(msl);
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
>  /* init memory subsystem */
>  int
>  rte_eal_memory_init(void)
> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> index d970825df..4a43c1a9e 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -423,6 +423,69 @@ int __rte_experimental
>  rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>  		size_t *offset);
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Register external memory chunk with DPDK.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA mapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to register
> + * @param len
> + *   Length of virtual area to register
> + * @param iova_addrs
> + *   Array of page IOVA addresses corresponding to each page in this memory
> + *   area. Can be NULL, in which case page IOVA addresses will be set to
> + *   RTE_BAD_IOVA.
> + * @param n_pages
> + *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
> + *   is NULL.
> + * @param page_sz
> + *   Page size of the underlying memory
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     EEXIST - memory chunk is already registered
> + *     ENOSPC - no more space in internal config to store a new memory chunk
> + */
> +int __rte_experimental
> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Unregister external memory chunk with DPDK.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA unmapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to unregister
> + * @param len
> + *   Length of virtual area to unregister
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     ENOENT - memory chunk was not found
> + */
> +int __rte_experimental
> +rte_extmem_unregister(void *va_addr, size_t len);
> +
>  /**
>   * Dump the physical memory layout to a file.
>   *
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 3fe78260d..593691a14 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>  	rte_devargs_remove;
>  	rte_devargs_type_count;
>  	rte_eal_cleanup;
> +	rte_extmem_register;
> +	rte_extmem_unregister;
>  	rte_fbarray_attach;
>  	rte_fbarray_destroy;
>  	rte_fbarray_detach;
> --
> 2.17.1
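
To make the two suggestions above concrete (a page-aligned va_addr, and
len / page_sz == n_pages when an IOVA table is given), here is a minimal
standalone sketch. It is not code from the patch: extmem_check_args() is a
hypothetical helper name, uint64_t stands in for rte_iova_t, and plain bit
masks stand in for rte_is_power_of_2() and RTE_ALIGN() so that it builds
without DPDK headers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): validate the arguments of an
 * external memory registration. Returns 0 if they look sane, -EINVAL if not.
 */
static int
extmem_check_args(const void *va_addr, size_t len, const uint64_t *iova_addrs,
		unsigned int n_pages, size_t page_sz)
{
	/* same checks as in the patch: non-NULL, non-zero sizes */
	if (va_addr == NULL || len == 0 || page_sz == 0)
		return -EINVAL;
	/* same as in the patch: page size must be a power of two */
	if ((page_sz & (page_sz - 1)) != 0)
		return -EINVAL;
	/* same as in the patch: length must be a whole number of pages */
	if ((len & (page_sz - 1)) != 0)
		return -EINVAL;
	/* suggested: start of the area must be page-aligned */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0)
		return -EINVAL;
	/* suggested: an IOVA table, if given, must have one entry per page */
	if (iova_addrs != NULL && len / page_sz != n_pages)
		return -EINVAL;
	return 0;
}
```

In rte_extmem_register() itself these would presumably just extend the
existing EINVAL condition. The n_pages comparison is skipped for a NULL
iova_addrs because the patch documents n_pages as ignored in that case.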