On Monday 10 July 2017 06:45 PM, Olivier Matz wrote:
> On Wed, 5 Jul 2017 13:05:57 +0530, santosh
> <santosh.shu...@caviumnetworks.com> wrote:
>> Hi Olivier,
>>
>> On Monday 03 July 2017 10:07 PM, Olivier Matz wrote:
>>
>>> On Wed, 21 Jun 2017 17:32:47 +0000, Santosh Shukla
>>> <santosh.shu...@caviumnetworks.com> wrote:
>>>> Some mempool hw like octeontx/fpa block, demands block size aligned
>>>> buffer address.
>>>>
>>> What is the meaning of block size aligned?
>> block size is total_elem_sz.
>>
>>> Does it mean that the address has to be a multiple of total_elt_size?
>> yes.
>>
>>> Is this constraint on the virtual address only?
>>>
>> both.
> You mean virtual address and physical address must be a multiple of
> total_elt_size? How is it possible?
>
> For instance, I have this on my dpdk instance:
> Segment 0: phys:0x52c00000, len:4194304, virt:0x7fed26000000,
> socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 1: phys:0x53400000, len:163577856, virt:0x7fed1c200000,
> socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 2: phys:0x5d400000, len:20971520, virt:0x7fed1ac00000,
> socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> ...
>
> I assume total_elt_size is 2176.
>
> In segment 0, I need to use (virt = 0x7fed26000000 + 1536) to be
> multiple of 2176. But the corresponding physical address (0x52c00000 +
> 1536) is not multiple of 2176.
>
> Please clarify if only the virtual address has to be aligned.
Yes. It's a virtual address.

>>>> Introducing an MEMPOOL_F_POOL_BLK_SZ_ALIGNED flag.
>>>> If this flag is set:
>>>> 1) adjust 'off' value to block size aligned value.
>>>> 2) Allocate one additional buffer. This buffer is used to make sure that
>>>> requested 'n' buffers get correctly populated to mempool.
>>>> Example:
>>>> elem_sz = 2432   // total element size.
>>>> n = 2111         // requested number of buffers.
>>>> off = 2304       // new buf_offset value after step 1)
>>>> vaddr = 0x0      // actual start address of pool
>>>> pool_len = 5133952  // total pool length, i.e. (elem_sz * n)
>>>>
>>>> Since 'off' is a non-zero value, the below condition would fail for
>>>> the block size align case:
>>>>
>>>> (((vaddr + off) + (elem_sz * n)) <= (vaddr + pool_len))
>>>>
>>>> which is incorrect behavior. The additional buffer solves this
>>>> problem and correctly populates 'n' buffers to the mempool in the
>>>> aligned mode.
>>> Sorry, but the example is not very clear.
>>>
>> which part?
>>
>> I'll try to reword.
>>
>> The problem statement is:
>> - We want the start of the buffer address aligned to block_sz aka
>>   total_elt_sz.
>>
>> Proposed solution in this patch:
>> - Let's say that we get 'x' size of memory chunk from memzone.
>> - Ideally we start using buffers at address 0 to (x - block_sz).
>> - The first buffer address, i.e. 0, is not necessarily aligned to block_sz.
>> - So we derive an offset value for block_sz alignment purposes, i.e. 'off'.
>> - That 'off' makes sure that the first va/pa address of a buffer is
>>   blk_sz aligned.
>> - Calculating 'off' may end up sacrificing the first buffer of the pool,
>>   so the total number of buffers in the pool is n-1, which is incorrect
>>   behavior. That's why we add 1 additional buffer: we request memzone to
>>   allocate a ((n+1) * total_elt_sz) pool area when the F_BLK_SZ_ALIGNED
>>   flag is set.
>>
>>>> Signed-off-by: Santosh Shukla <santosh.shu...@caviumnetworks.com>
>>>> Signed-off-by: Jerin Jacob <jerin.ja...@caviumnetworks.com>
>>>> ---
>>>>  lib/librte_mempool/rte_mempool.c | 19 ++++++++++++++++---
>>>>  lib/librte_mempool/rte_mempool.h |  1 +
>>>>  2 files changed, 17 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
>>>> index 7dec2f51d..2010857f0 100644
>>>> --- a/lib/librte_mempool/rte_mempool.c
>>>> +++ b/lib/librte_mempool/rte_mempool.c
>>>> @@ -350,7 +350,7 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>>>>  {
>>>>  	unsigned total_elt_sz;
>>>>  	unsigned i = 0;
>>>> -	size_t off;
>>>> +	size_t off, delta;
>>>>  	struct rte_mempool_memhdr *memhdr;
>>>>  	int ret;
>>>>
>>>> @@ -387,7 +387,15 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>>>>  	memhdr->free_cb = free_cb;
>>>>  	memhdr->opaque = opaque;
>>>>
>>>> -	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>>>> +	if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED) {
>>>> +		delta = (uintptr_t)vaddr % total_elt_sz;
>>>> +		off = total_elt_sz - delta;
>>>> +		/* Validate alignment */
>>>> +		if (((uintptr_t)vaddr + off) % total_elt_sz) {
>>>> +			RTE_LOG(ERR, MEMPOOL, "vaddr(%p) not aligned to total_elt_sz(%u)\n",
>>>> +				(vaddr + off), total_elt_sz);
>>>> +			return -EINVAL;
>>>> +		}
>>>> +	} else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>>>>  		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
>>>>  	else
>>>>  		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
>>> What is the purpose of this test? Can it fail?
>> Purpose is to sanity check blk_sz alignment. No, it won't fail.
>> I thought it better to keep the sanity check, but if you see no value
>> then I will remove it in v2?
> yes please
>
>
>>> Not sure having the delta variable is helpful.
>>> However, adding a small comment like this could help:
>>>
>>> /* align object start address to a multiple of total_elt_sz */
>>> off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
>>>
>>> About style, please don't mix brackets and no-bracket blocks in the
>>> same if/elseif/else.
>> ok, in v2.
>>
>>>> @@ -555,8 +563,13 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>>>  	}
>>>>
>>>>  	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
>>>> +
>>>>  	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
>>>> -		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>>>> +		if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED)
>>>> +			size = rte_mempool_xmem_size(n + 1, total_elt_sz,
>>>> +						     pg_shift);
>>>> +		else
>>>> +			size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>>>>
>>>>  		ret = snprintf(mz_name, sizeof(mz_name),
>>>>  			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
>>> One issue I see here is that this new flag breaks the function
>>> rte_mempool_xmem_size(), which calculates the maximum amount of memory
>>> required to store a given number of objects.
>>>
>>> It also probably breaks rte_mempool_xmem_usage().
>>>
>>> I don't have any good solution for now. A possibility is to change
>>> the behavior of these functions for everyone, meaning that we will
>>> always reserve more memory than really required. If this is done on
>>> every memory chunk (struct rte_mempool_memhdr), it can eat a lot
>>> of memory.
>>>
>>> Another approach would be to change the API of these functions to
>>> pass the capability flags, or the mempool pointer... but there is
>>> a problem because these functions are usually called before the
>>> mempool is instantiated.
>>>
>> Per my description on [1/4]: if we agree to call
>> _ops_get_capability() at the very beginning, i.e. at _populate_default(),
>> then 'mp->flag' has the capability flag, and we could add one more
>> argument in _xmem_size(, flag)/_xmem_usage(, flag).
>> - xmem_size / xmem_usage() to check for that capability bit in 'flag'.
>> - if set, then increase 'elt_num' by num.
>>
>> That way your approach 2) makes sense to me and it will fit very well
>> in the design. It won't waste memory like you mentioned in approach 1).
>>
>> Does that make sense?
> The use case of rte_mempool_xmem_size()/rte_mempool_xmem_usage()
> is to determine how much memory is needed to instantiate a mempool:
>
>   sz = rte_mempool_xmem_size(...);
>   ptr = allocate(sz);
>   paddr_table = get_phys_map(ptr);
>   mp = rte_mempool_xmem_create(..., ptr, ..., paddr_table, ...);
>
> If we want to transform the code to use another mempool_ops, it is
> not possible:
>
>   mp = rte_mempool_create_empty();
>   rte_mempool_set_ops_byname(mp, "my-handler");
>   sz = rte_mempool_xmem_size(...); /* <<< the mp pointer is not passed */
>   ptr = allocate(sz);
>   paddr_table = get_phys_map(ptr);
>   mp = rte_mempool_xmem_create(..., ptr, ..., paddr_table, ...);
>
> So, yes, this approach would work but it needs to change the API.
> I think it is possible to keep compat with previous versions.
>

Ok. I'm planning to send out a deprecation notice for the xmem_size/usage
APIs. Does that make sense to you?

>>>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>>>> index fd8722e69..99a20263d 100644
>>>> --- a/lib/librte_mempool/rte_mempool.h
>>>> +++ b/lib/librte_mempool/rte_mempool.h
>>>> @@ -267,6 +267,7 @@ struct rte_mempool {
>>>>  #define MEMPOOL_F_POOL_CREATED 0x0010 /**< Internal: pool is created. */
>>>>  #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
>>>>  #define MEMPOOL_F_POOL_CONTIG 0x0040 /**< Detect physically contiguous objs */
>>>> +#define MEMPOOL_F_POOL_BLK_SZ_ALIGNED 0x0080 /**< Align buffer address to block size */
>>>>
>>>>  /**
>>>>   * @internal When debug is enabled, store some statistics.
>>> Same comment as for patch 3: the explanation should really be clarified.
>>> It's a hw-specific limitation, which won't be obvious to the people
>>> that will read that code, so we must document it as clearly as possible.
>>>
>> I don't see this as a HW limitation. As mentioned in [1/4], even an
>> application can request block alignment, right?
> What would be the reason for an application to request this block
> alignment? The total size of the element is usually not known by the
> application, because the mempool adds its header and footer.
>
There were patches for custom alignment in the past [1], so this is not a
new initiative. The application doesn't have to know the internal
block_size (total_elem_sz), but my point is: an application can very much
request a start address aligned to block_size.

Besides that, we have enough space in the MEMPOOL_F_ area, so keeping an
alignment flag is not a problem.

Thanks.

[1] http://dpdk.org/dev/patchwork/patch/23885/