On Thu, Sep 27, 2018 at 11:40:59AM +0100, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
>
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes. This also breaks a few
> internal assumptions about memory contiguousness, so adjust
> malloc code in a few places.
>
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
>
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
>
> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> Acked-by: Andrew Rybchenko <arybche...@solarflare.com>
> ---
>
> Notes:
>     v3:
>     - Add comment to explain the process of picking up minimum
>       page sizes for mempool
>
>     v2:
>     - Add documentation changes and ABI break
>
>     v1:
>     - Adjust all calls to memseg walk functions to ignore external
>       segments where it made sense to do so
>
>  doc/guides/rel_notes/deprecation.rst          | 15 --------
>  doc/guides/rel_notes/release_18_11.rst        | 13 ++++++-
>  drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
>  drivers/net/mlx4/mlx4_mr.c                    |  3 ++
>  drivers/net/mlx5/mlx5.c                       |  5 ++-
>  drivers/net/mlx5/mlx5_mr.c                    |  3 ++
>  drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
>  lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
>  lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
>  lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
>  lib/librte_eal/common/eal_common_memory.c     |  3 ++
>  .../common/include/rte_eal_memconfig.h        |  1 +
>  lib/librte_eal/common/include/rte_memory.h    |  9 +++++
>  lib/librte_eal/common/malloc_elem.c           | 10 ++++--
>  lib/librte_eal/common/malloc_heap.c           |  9 +++--
>  lib/librte_eal/common/rte_malloc.c            |  2 +-
>  lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
>  lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
>  lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
>  lib/librte_eal/meson.build                    |  2 +-
>  lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
>  test/test/test_malloc.c                       |  3 ++
>  test/test/test_memzone.c                      |  3 ++
>  24 files changed, 134 insertions(+), 44 deletions(-)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 138335dfb..d2aec64d1 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
>
> -* eal: certain structures will change in EAL on account of upcoming external
> -  memory support. Aside from internal changes leading to an ABI break, the
> -  following externally visible changes will also be implemented:
> -
> -  - ``rte_memseg_list`` will change to include a boolean flag indicating
> -    whether a particular memseg list is externally allocated. This will have
> -    implications for any users of memseg-walk-related functions, as they will
> -    now have to skip externally allocated segments in most cases if the intent
> -    is to only iterate over internal DPDK memory.
> -  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
> -    as some socket ID's will now be representing externally allocated memory. No
> -    changes will be required for existing code as backwards compatibility will
> -    be kept, and those who do not use this feature will not see these extra
> -    socket ID's.
> -
>  * eal: both declaring and identifying devices will be streamlined in v18.11.
>    New functions will appear to query a specific port from buses, classes of
>    device and device drivers. Device declaration will be made coherent with the
> diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
> index bc9b74ec4..5fc71e208 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -91,6 +91,13 @@ API Changes
>    flag the MAC can be properly configured in any case. This is particularly
>    important for bonding.
>
> +* eal: The following API changes were made in 18.11:
> +
> +  - ``rte_memseg_list`` structure now has an additional flag indicating whether
> +    the memseg list is externally allocated. This will have implications for any
> +    users of memseg-walk-related functions, as they will now have to skip
> +    externally allocated segments in most cases if the intent is to only iterate
> +    over internal DPDK memory.
>
>  ABI Changes
>  -----------
> @@ -107,6 +114,10 @@ ABI Changes
>     =========================================================
>
>
> +* eal: EAL library ABI version was changed due to previously announced work on
> +  supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
> +  a new flag indicating whether the memseg list refers to external memory.
> +
>  Removed Items
>  -------------
>
> @@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
>       librte_compressdev.so.1
>       librte_cryptodev.so.5
>       librte_distributor.so.1
> -     librte_eal.so.8
> +   + librte_eal.so.9
>       librte_ethdev.so.10
>       librte_eventdev.so.4
>       librte_flow_classify.so.1
> diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
> index 4c2cd2a87..2e9244fb7 100644
> --- a/drivers/bus/fslmc/fslmc_vfio.c
> +++ b/drivers/bus/fslmc/fslmc_vfio.c
> @@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
>  }
>
>  static int
> -fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
> -		const struct rte_memseg *ms, void *arg)
> +fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> +		void *arg)
>  {
>  	int *n_segs = arg;
>  	int ret;
>
> +	if (msl->external)
> +		return 0;
> +
>  	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
>  	if (ret)
>  		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
> diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
> index d23d3c613..9f5d790b6 100644
> --- a/drivers/net/mlx4/mlx4_mr.c
> +++ b/drivers/net/mlx4/mlx4_mr.c
> @@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
>  {
>  	struct mr_find_contig_memsegs_data *data = arg;
>
> +	if (msl->external)
> +		return 0;
> +

Because a memory free event is available for external memory, the current
design of mlx4/mlx5 memory management can already accommodate the new external
memory support. So please remove this check, so that the PMD can traverse
external memory as well.

>  	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
>  		return 0;
>  	/* Found, save it and stop walking. */
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 30d4e70a7..c90e1d8ce 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
>  static void *uar_base;
>
>  static int
> -find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
> +find_lower_va_bound(const struct rte_memseg_list *msl,
>  		const struct rte_memseg *ms, void *arg)
>  {
>  	void **addr = arg;
>
> +	if (msl->external)
> +		return 0;
> +

This one is fine. But can you please remove the blank line? That's a rule
from the former maintainers. :-)

>  	if (*addr == NULL)
>  		*addr = ms->addr;
>  	else
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index 1d1bcb5fe..fd4345f9c 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
>  {
>  	struct mr_find_contig_memsegs_data *data = arg;
>
> +	if (msl->external)
> +		return 0;
> +

As I mentioned above, please remove it.

If those two changes in mlx4/5_mr.c are removed, then for the whole patch,

Acked-by: Yongseok Koh <ys...@mellanox.com>

Thanks
Yongseok
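
The two walk rules discussed in this thread can be illustrated with a small, self-contained C sketch: a walk callback that should only see internal DPDK memory skips lists flagged as external, and a mempool's minimum page size is computed only over lists on the requested socket. This does not use DPDK at all. `struct memseg_list_mock`, `msl_walk()`, `count_internal_lists()`, `struct min_pagesz_arg` and `find_min_pagesz_mock()` are hypothetical stand-ins written for illustration, loosely modeled on the new `external` flag in `rte_memseg_list` and on the quoted callbacks, where returning 0 moves on to the next entry. It is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_memseg_list, reduced to the
 * fields this sketch needs. */
struct memseg_list_mock {
	int socket_id;
	uint64_t page_sz;
	bool external; /* true if the list describes externally allocated memory */
};

/* Callback type: return 0 to continue the walk, nonzero to stop it. */
typedef int (*msl_walk_cb)(const struct memseg_list_mock *msl, void *arg);

/* Hypothetical walker: calls cb on each list in order and stops early,
 * returning the callback's value, as soon as cb returns nonzero. */
static int
msl_walk(const struct memseg_list_mock *lists, size_t n, msl_walk_cb cb,
		void *arg)
{
	assert(lists != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int ret = cb(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback meant to see internal DPDK memory only: external lists are
 * skipped by returning 0, which just moves on to the next list. */
static int
count_internal_lists(const struct memseg_list_mock *msl, void *arg)
{
	size_t *count = arg;

	if (msl->external)
		return 0;
	(*count)++;
	return 0;
}

struct min_pagesz_arg {
	int socket_id; /* socket the mempool is requested on */
	uint64_t min;  /* smallest page size seen so far */
};

/* Minimum page size for one socket: lists on any other socket ID are
 * ignored, so their page sizes cannot lower the result. */
static int
find_min_pagesz_mock(const struct memseg_list_mock *msl, void *arg)
{
	struct min_pagesz_arg *wa = arg;

	if (msl->socket_id != wa->socket_id)
		return 0;
	if (msl->page_sz < wa->min)
		wa->min = msl->page_sz;
	return 0;
}
```

For example, with 2 MB and 1 GB lists on socket 0, a 4 KB list on socket 1, and an external 4 KB list under an arbitrary placeholder socket ID, the socket-0 minimum stays at 2 MB. Whether a particular callback should skip external lists is a per-caller decision; the review above argues that the mlx4/mlx5 `mr_find_contig_memsegs_cb()` callbacks should not.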