Re: [dpdk-dev] Aligned rte_mempool for storage applications

Harris, James R Mon, 25 Mar 2019 14:15:05 -0700


On 3/25/19, 2:06 PM, "Howell, Seth" <seth.how...@intel.com> wrote:


    Hello,
    
    In SPDK, we use the rte_mempool struct for many internal structure 
collections. The per-thread cache and ease of allocation of mempools are very 
useful features.
    Some of the collections we store in SPDK are pools of I/O buffers. 
Typically, these pools contain elements of at least 4096 bytes, and we would 
like them to be aligned to 4k for performance reasons.

[Jim] Just to clarify Seth's point - the performance reasons are specifically 
to avoid wasteful memory copies.  The vast majority of NVMe SSDs on the market 
today do not have full scatter/gather support - rather, they only support 
PRPs (Physical Region Pages), which require all scatter/gather 
elements except the first to be 4KB aligned.  There are other storage 
interfaces, such as Linux AIO, that also impose alignment restrictions.

-Jim
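
A minimal sketch of the alignment rule Jim describes, assuming a 4 KiB page size. The `sg_elem` struct and the function name are hypothetical and not part of SPDK or DPDK. Only the rule stated above is checked (every element after the first must start on a 4 KiB boundary); any further constraints a real NVMe driver enforces are out of scope here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PRP_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Hypothetical scatter/gather element: a buffer address and its length. */
struct sg_elem {
    void  *addr;
    size_t len;
};

/* Return true if every element except the first starts on a 4 KiB boundary.
 * The first element may start at any offset within its page. */
static bool sg_list_meets_prp_alignment(const struct sg_elem *sg, size_t n)
{
    assert(sg != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        if ((uintptr_t)sg[i].addr % PRP_PAGE_SIZE != 0)
            return false;
    }
    return true;
}
```

When a check like this fails, the data typically has to be copied into an aligned bounce buffer first, which is the memory copy that aligned pool elements would avoid.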


    Currently, the rte_mempool API doesn't support aligned mempool objects. 
This means that when we allocate a 4k buffer and want it aligned to 4k, we 
actually need to allocate an 8k buffer and calculate an offset into it each 
time we want to use it.
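
The offset calculation mentioned above could look roughly like the following. This is a sketch only: `align_up` and `aligned_payload` are illustrative names, not SPDK or DPDK functions, and the element is assumed to be at least 8 KiB so that a full 4 KiB aligned region always fits inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_ALIGN 4096u   /* required alignment; must be a power of two */

/* Round addr up to the next multiple of align (align is a power of two). */
static inline uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Given the start of an oversized (assumed 8 KiB) pool element, return the
 * first 4 KiB-aligned address inside it. */
static inline void *aligned_payload(void *elem)
{
    return (void *)align_up((uintptr_t)elem, BUF_ALIGN);
}
```

The cost is that roughly half of every element is unused, and the caller must keep the original element pointer around to return it to the pool.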
    We recently did a proof of concept using the rte_mempool_ops hook where we 
allocated a mempool and populated it with aligned entries. This allowed us to 
retrieve aligned addresses directly from rte_mempool_get(), but didn't help 
with the allocation size.
    Because the rte_mempool struct assumes that each element has a header 
attached to it, we still need to live up to that assumption for each object we 
create in a mempool. This means that the actual size of a buffer becomes 4k + 
24 bytes. In order to get to our next aligned address, we need to add about 4k 
of padding to each element.
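
The arithmetic behind the "about 4k of padding" figure can be sketched as below. The 24-byte header size is the figure quoted above; the layout is a simplification that assumes each element's total footprint is rounded up to a multiple of the alignment, and the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Per-element footprint when header + payload must be rounded up to a
 * multiple of align (a power of two) so that every payload stays aligned. */
static size_t elem_stride(size_t payload, size_t header, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (header + payload + align - 1) & ~(align - 1);
}

/* Bytes wasted per element under the same assumption. */
static size_t elem_padding(size_t payload, size_t header, size_t align)
{
    return elem_stride(payload, header, align) - (header + payload);
}
```

With a 4096-byte payload, a 24-byte header and 4096-byte alignment, the stride is 8192 bytes and 4072 of them are padding. With no header, the stride is exactly 4096 bytes and nothing is wasted.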
    Modifying the current rte_mempool struct to allow entries without headers 
seems impossible since it would break rte_mempool_obj_iter and 
rte_mempool_from_obj. However, I still think there is a lot of benefit to be 
gained from a mempool structure that supports aligned objects without headers.
    I am wondering if DPDK would be open to us introducing an 
rte_mempool_aligned structure. This structure would essentially be a wrapper 
around a regular mempool struct. However, it would not require headers or 
trailers for each object in the pool.
    
    This structure would only be applicable to a subset of mempools with the 
following characteristics:
        1. mempools for which the following flags are set: 
MEMPOOL_F_NO_CACHE_ALIGN, MEMPOOL_F_NO_IOVA_CONTIG, MEMPOOL_F_NO_SPREAD
        2. mempools that do not require the use of the following functions: 
rte_mempool_from_obj (requires a pointer to the mp in the header of each obj), 
rte_mempool_obj_iter.
        3. Any attempt to create this object when RTE_LIBRTE_MEMPOOL_DEBUG is 
enabled would necessarily fail since we can't check the header cookies.
    
    My thought would be that we could implement this data structure in a header 
and it would look something like this:
    
    struct rte_mempool_aligned {
        struct rte_mempool mp;
        size_t obj_alignment;
    };
    
    The rest of the functions in the header would primarily be wrappers around 
the original functions. Most functions (rte_mempool_alloc, rte_mempool_free, 
rte_mempool_enqueue/dequeue, rte_mempool_get_count, etc.) could be implemented 
directly as wrappers, and others such as rte_mempool_create and the populate 
functions would have to be re-implemented to some degree in the new header. The 
remaining functions (the cookie checks, rte_mempool_obj_iter) would not be 
implemented in the rte_mempool_aligned.h file.
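
To illustrate the general idea of a pool of aligned objects with no per-object header, here is a toy sketch in standard C11. It is not the proposed rte_mempool_aligned and uses no DPDK APIs: every name is hypothetical, it is not thread-safe, and it has no per-thread cache. The point is only that the free-object bookkeeping lives in a separate array, so objects can be packed back to back at their natural alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy headerless pool: all bookkeeping lives outside the objects. */
struct aligned_pool {
    void  *base;        /* one contiguous, aligned allocation */
    void **free_objs;   /* stack of pointers to free objects */
    size_t free_count;
    size_t capacity;
};

/* obj_size must be a non-zero multiple of align; align a power of two. */
static int aligned_pool_init(struct aligned_pool *p, size_t n,
                             size_t obj_size, size_t align)
{
    if (n == 0 || align == 0 || (align & (align - 1)) != 0 ||
        obj_size == 0 || obj_size % align != 0)
        return -1;
    if (n > SIZE_MAX / obj_size || n > SIZE_MAX / sizeof(void *))
        return -1;

    p->base = aligned_alloc(align, n * obj_size);
    if (p->base == NULL)
        return -1;
    p->free_objs = malloc(n * sizeof(*p->free_objs));
    if (p->free_objs == NULL) {
        free(p->base);
        return -1;
    }
    /* Objects are packed back to back, so each one inherits the alignment. */
    for (size_t i = 0; i < n; i++)
        p->free_objs[i] = (char *)p->base + i * obj_size;
    p->free_count = n;
    p->capacity = n;
    return 0;
}

/* Pop a free object, or return NULL if the pool is empty. */
static void *aligned_pool_get(struct aligned_pool *p)
{
    return p->free_count ? p->free_objs[--p->free_count] : NULL;
}

/* Push an object back; returns -1 if the pool is already full. */
static int aligned_pool_put(struct aligned_pool *p, void *obj)
{
    assert(obj != NULL);
    if (p->free_count == p->capacity)
        return -1;
    p->free_objs[p->free_count++] = obj;
    return 0;
}

static void aligned_pool_destroy(struct aligned_pool *p)
{
    free(p->free_objs);
    free(p->base);
    p->free_objs = NULL;
    p->base = NULL;
    p->free_count = p->capacity = 0;
}
```

Because nothing is stored next to an object, there is no way to get from an object pointer back to its pool and no cookies to check, which matches the restrictions listed above.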
    
    Would the community be welcoming of a new rte_mempool_aligned struct? If 
you don't feel like this would be the way to go, are there other options in 
DPDK for creating a pool of pre-allocated aligned objects? 
    
    Thank you,
    
    Seth Howell
