17/12/2018 10:40, Burakov, Anatoly:
> On 14-Dec-18 5:13 PM, Jim Harris wrote:
> > SPDK uses the rte_mem_event_callback_register API to
> > create RDMA memory regions (MRs) for newly allocated regions
> > of memory. This is used in both the SPDK NVMe-oF target
> > and the NVMe-oF host driver.
> > 
> > DPDK creates internal malloc_elem structures for these
> > allocated regions. As users malloc and free memory, DPDK
> > will sometimes merge malloc_elems that originated from
> > different allocations that were notified through the
> > registered mem_event callback routine. This results
> > in subsequent allocations that can span across multiple
> > RDMA MRs. SPDK would then have to check each DPDK buffer to
> > see if it crosses an MR boundary and, if so, add
> > considerable logic and complexity to describe that
> > buffer before it can be accessed by the RNIC. It is somewhat
> > analogous to rte_malloc returning a buffer that is not
> > IOVA-contiguous.
> > 
> > As a malloc_elem gets split and some of these elements
> > get freed, it can also result in DPDK sending an
> > RTE_MEM_EVENT_FREE notification for a subset of the
> > original RTE_MEM_EVENT_ALLOC notification. This is also
> > problematic for RDMA memory regions, since unregistering
> > the memory region is all-or-nothing. It is not possible
> > to unregister part of a memory region.
> > 
> > To support these types of applications, this patch adds
> > a new --match-allocations EAL init flag. When this
> > flag is specified, malloc elements from different
> > hugepage allocations will never be merged. Memory will
> > also only be freed back to the system (with the requisite
> > memory event callback) exactly as it was originally
> > allocated.
> > 
> > Since part of this patch is extending the size of struct
> > malloc_elem, we also fix up the malloc autotests so they
> > do not assume its size exactly fits in one cacheline.
> > 
> > Signed-off-by: Jim Harris <james.r.har...@intel.com>
> 
> Reviewed-by: Anatoly Burakov <anatoly.bura...@intel.com>

Applied, thanks
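
For reference, the per-buffer check that the commit message wants to avoid can be sketched roughly as below. This is only an illustration, not SPDK or DPDK code: struct mr_range, mr_find() and buf_in_single_mr() are hypothetical names, and the MRs are assumed to be tracked as a flat array of non-overlapping address ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one registered RDMA memory region
 * (not an SPDK or DPDK type). */
struct mr_range {
	uintptr_t start; /* first byte covered by the MR */
	size_t len;      /* length of the MR in bytes */
};

/* Return the index of the MR that contains addr, or -1 if none does. */
int mr_find(const struct mr_range *mrs, size_t n, uintptr_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= mrs[i].start && addr - mrs[i].start < mrs[i].len)
			return (int)i;
	}
	return -1;
}

/* True when [buf, buf + len) lies entirely inside a single MR.
 * A buffer carved from merged malloc_elems can fail this check
 * even though it is virtually contiguous. */
bool buf_in_single_mr(const struct mr_range *mrs, size_t n,
		      uintptr_t buf, size_t len)
{
	assert(len > 0);
	int first = mr_find(mrs, n, buf);
	int last = mr_find(mrs, n, buf + len - 1);

	return first >= 0 && first == last;
}
```

With two adjacent MRs, a buffer that starts near the end of the first and runs into the second fails the check. That is the case --match-allocations is meant to rule out.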

This is one more example of how bad the DPDK initialization is.
This new option addresses an application concern, so it should be
exposed as an API through the init functions, not as a user option.

I think we really need to refactor initialization APIs.
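
To illustrate the concern: with the behaviour exposed only as a command-line flag, an application that always needs it has to inject the flag into the argument vector before calling rte_eal_init(argc, argv). A minimal sketch of that workaround follows. The helper name eal_argv_with_match_allocations() is hypothetical, and the sketch assumes argv holds only EAL arguments (no "--" separator followed by application arguments).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a copy of argv with "--match-allocations"
 * appended and a terminating NULL. Returns NULL on allocation failure.
 * The caller frees the returned array, but not the strings in it. */
char **eal_argv_with_match_allocations(int argc, char **argv, int *out_argc)
{
	static char flag[] = "--match-allocations";

	assert(argc >= 1 && argv != NULL && out_argc != NULL);

	/* Room for the original entries, the extra flag and the NULL. */
	char **out = malloc(((size_t)argc + 2) * sizeof(*out));
	if (out == NULL)
		return NULL;

	memcpy(out, argv, (size_t)argc * sizeof(*out));
	out[argc] = flag;
	out[argc + 1] = NULL;
	*out_argc = argc + 1;

	/* The result would then be passed on as rte_eal_init(*out_argc, out). */
	return out;
}
```

An init-time API would make this kind of argv rewriting unnecessary.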

