Thanks Anatoly & David. The main patch provided by Anatoly is not compatible with DPDK 19.02, so I had to make the following patch, which I have verified to work with the DPDK 19.02 & SPDK 19.07 combination.
------------------------------------------------------------------------------------------------------------------------------------------------------------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 84aabe3..49934f5 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -35,6 +35,7 @@ struct rte_memseg_list {
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	unsigned int external; /**< 1 if this list points to external memory */
+	unsigned int heap; /**< 1 if this list points to a heap */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b39de3c..4d31bf7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -368,6 +368,7 @@ void rte_free(void *addr)
 	rte_spinlock_lock(&heap->lock);
 
 	ret = malloc_heap_add_external_memory(heap, msl);
+	msl->heap = 1; /* mark it as heap segment */
 	rte_spinlock_unlock(&heap->lock);
 
 unlock:
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 1b96b57..33dd923 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -836,6 +836,7 @@ void numa_error(char *where)
 	msl->page_sz = page_sz;
 	msl->socket_id = socket_id;
 	msl->base_va = NULL;
+	msl->heap = 1; /* mark it as a heap segment */
 
 	RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n",
 			(size_t)page_sz >> 10, socket_id);
@@ -1410,6 +1411,7 @@ void numa_error(char *where)
 	msl->page_sz = page_sz;
 	msl->socket_id = 0;
 	msl->len = internal_config.memory;
+	msl->heap = 1;
 
 	/* we're in single-file segments mode, so only the segment list
 	 * fd needs to be set up.
@@ -1682,6 +1684,7 @@ void numa_error(char *where)
 	mem_sz = msl->len;
 	munmap(msl->base_va, mem_sz);
 	msl->base_va = NULL;
+	msl->heap = 0;
 
 	/* destroy backing fbarray */
 	rte_fbarray_destroy(&msl->memseg_arr);
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 61b54f9..3bb3e6b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1238,7 +1238,16 @@ static int vfio_dma_mem_map(struct vfio_config *vfio_cfg, uint64_t vaddr,
 {
 	int *vfio_container_fd = arg;
 
-	if (msl->external & !internal_config.iso_cmem)
+	/* skip external memory that isn't a heap */
+	if (msl->external && !msl->heap)
+		return 0;
+
+	/* skip any segments with invalid IOVA addresses */
+	if (ms->iova == RTE_BAD_IOVA)
+		return 0;
+
+	/* if IOVA mode is VA, we've already mapped the internal segments */
+	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return 0;
 
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
@@ -1362,9 +1371,19 @@ static int vfio_dma_mem_map(struct vfio_config *vfio_cfg, uint64_t vaddr,
 {
 	int *vfio_container_fd = arg;
 
-	if (msl->external)
+	/* skip external memory that isn't a heap */
+	if (msl->external && !msl->heap)
 		return 0;
 
+	/* skip any segments with invalid IOVA addresses */
+	if (ms->iova == RTE_BAD_IOVA)
+		return 0;
+
+#if 0
+	if (ms->addr_64 == param->addr_64)
+		return 0;
+#endif
+
 	return vfio_spapr_dma_do_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1381,6 +1400,12 @@ struct spapr_walk_param {
 	uint64_t max = ms->iova + ms->len;
 
-	if (msl->external)
+	/* skip external memory that isn't a heap */
+	if (msl->external && !msl->heap)
+		return 0;
+
+	/* skip any segments with invalid IOVA addresses */
+	if (ms->iova == RTE_BAD_IOVA)
 		return 0;
 
 	if (max > param->window_size) {
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regards,
Rajesh

On Wed, Nov 6, 2019 at 7:28 PM David Marchand <david.march...@redhat.com> wrote:
>
> On Tue, Nov 5, 2019 at 6:15 PM Burakov, Anatoly
> <anatoly.bura...@intel.com> wrote:
> >
> > On 05-Nov-19 3:15 PM, Anatoly Burakov wrote:
> > > Currently, externally created heaps are supposed to be automatically
> > > mapped for VFIO DMA by EAL, however they only do so if, at the time of
> > > heap creation, VFIO is initialized and has at least one device
> > > available. If no devices are available at the time of heap creation (or
> > > if devices were available, but were since hot-unplugged, thus dropping
> > > all VFIO container mappings), then VFIO mapping code would have skipped
> > > over externally allocated heaps.
> > >
> > > The fix is two-fold. First, we allow externally allocated memory
> > > segments to be marked as "heap" segments. This allows us to distinguish
> > > between external memory segments that were created via heap API, from
> > > those that were created via rte_extmem_register() API.
> > >
> > > Then, we fix the VFIO code to only skip non-heap external segments.
> > > Also, since external heaps are not guaranteed to have valid IOVA
> > > addresses, we will skip those which have invalid IOVA addresses as well.
> > >
> > > Fixes: 0f526d674f8e ("malloc: separate creating memseg list and malloc heap")
> > >
> > > Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> > > ---
> > >
> > > Notes:
> > >     This cannot be backported to older releases as it breaks the
> > >     API and ABI. A separate fix is in the works for stable.
> > >
> >
> > Alternative, non-breaking implementation available (which will be slower
> > due to O(N) memseg list heaps lookups):
> >
> > http://patches.dpdk.org/patch/62486/
> >
> > I'm fine with either option being merged.
> >
> > The more perfect solution would've been to rename "msl->external" into
> > "msl->flags" and have various flags for memseg lists, but it's too late
> > to break the API now.
>
> Either way is fine for me (between the 18.11 and the master patches you sent).
> Breaking the ABI is acceptable, but I agree the API is another story.
>
> If the code seems better to you with the additional heap flag, let's go
> for it.
>
> I would still like to hear from Rajesh though, since he seems to be
> the first to hit this issue.
>
>
> --
> David Marchand
>

--
Regards,
Rajesh