On Monday 10 July 2017 06:45 PM, Olivier Matz wrote:
> On Wed, 5 Jul 2017 13:05:57 +0530, santosh
> <santosh.shu...@caviumnetworks.com> wrote:
>> Hi Olivier,
>>
>> On Monday 03 July 2017 10:07 PM, Olivier Matz wrote:
>>
>>> On Wed, 21 Jun 2017 17:32:47 +0000, Santosh Shukla
>>> <santosh.shu...@caviumnetworks.com> wrote:
>>>> Some mempool hw like octeontx/fpa block, demands block size aligned
>>>> buffer address.
>>>>
>>> What is the meaning of block size aligned?
>> block size is total_elem_sz.
>>
>>> Does it mean that the address has to be a multiple of total_elt_size?
>> yes.
>>
>>> Is this constraint on the virtual address only?
>>>
>> both.
> You mean virtual address and physical address must be a multiple of
> total_elt_size? How is it possible?
>
> For instance, I have this on my dpdk instance:
> Segment 0: phys:0x52c00000, len:4194304, virt:0x7fed26000000,
> socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 1: phys:0x53400000, len:163577856, virt:0x7fed1c200000,
> socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 2: phys:0x5d400000, len:20971520, virt:0x7fed1ac00000,
> socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> ...
>
> I assume total_elt_size is 2176.
>
> In segment 0, I need to use (virt = 0x7fed26000000 + 1536) to be
> multiple of 2176. But the corresponding physical address (0x52c00000 +
> 1536) is not multiple of 2176.
>
> Please clarify if only the virtual address has to be aligned.
Yes. It's a virtual address.

>>>> Introducing an MEMPOOL_F_POOL_BLK_SZ_ALIGNED flag.
>>>> If this flag is set:
>>>> 1) adjust 'off' value to block size aligned value.
>>>> 2) Allocate one additional buffer. This buffer is used to make sure that
>>>> requested 'n' buffers get correctly populated to mempool.
>>>> Example:
>>>> elem_sz = 2432   // total element size.
>>>> n = 2111         // requested number of buffers.
>>>> off = 2304       // new buf_offset value after step 1)
>>>> vaddr = 0x0      // actual start address of pool
>>>> pool_len = 5133952  // total pool length, i.e. (elem_sz * n)
>>>>
>>>> Since 'off' is a non-zero value, the below condition would fail for
>>>> the block size align case:
>>>>
>>>> (((vaddr + off) + (elem_sz * n)) <= (vaddr + pool_len))
>>>>
>>>> which is incorrect behavior. The additional buffer solves this
>>>> problem and correctly populates 'n' buffers to the mempool in the
>>>> aligned mode.
>>> Sorry, but the example is not very clear.
>>>
>> which part?
>>
>> I'll try to reword.
>>
>> The problem statement is:
>> - We want the start of the buffer address aligned to block_sz aka
>>   total_elt_sz.
>>
>> Proposed solution in this patch:
>> - Let's say that we get 'x' size of memory chunk from memzone.
>> - Ideally we start using buffers at address 0 to (x - block_sz).
>> - The first buffer address, i.e. 0, is not necessarily aligned to block_sz.
>> - So we derive an offset value for block_sz alignment purposes, i.e. 'off'.
>> - That 'off' makes sure that the first va/pa address of a buffer is
>>   blk_sz aligned.
>> - Calculating 'off' may end up sacrificing the first buffer of the pool,
>>   so the total number of buffers in the pool is n-1, which is incorrect
>>   behavior. That's why we add 1 additional buffer: we request memzone to
>>   allocate a ((n+1) * total_elt_sz) pool area when the F_BLK_SZ_ALIGNED
>>   flag is set.
>>
>>>> Signed-off-by: Santosh Shukla <santosh.shu...@caviumnetworks.com>
>>>> Signed-off-by: Jerin Jacob <jerin.ja...@caviumnetworks.com>
>>>> ---
>>>>  lib/librte_mempool/rte_mempool.c | 19 ++++++++++++++++---
>>>>  lib/librte_mempool/rte_mempool.h |  1 +
>>>>  2 files changed, 17 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
>>>> index 7dec2f51d..2010857f0 100644
>>>> --- a/lib/librte_mempool/rte_mempool.c
>>>> +++ b/lib/librte_mempool/rte_mempool.c
>>>> @@ -350,7 +350,7 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>>>>  {
>>>>  	unsigned total_elt_sz;
>>>>  	unsigned i = 0;
>>>> -	size_t off;
>>>> +	size_t off, delta;
>>>>  	struct rte_mempool_memhdr *memhdr;
>>>>  	int ret;
>>>>
>>>> @@ -387,7 +387,15 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>>>>  	memhdr->free_cb = free_cb;
>>>>  	memhdr->opaque = opaque;
>>>>
>>>> -	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>>>> +	if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED) {
>>>> +		delta = (uintptr_t)vaddr % total_elt_sz;
>>>> +		off = total_elt_sz - delta;
>>>> +		/* Validate alignment */
>>>> +		if (((uintptr_t)vaddr + off) % total_elt_sz) {
>>>> +			RTE_LOG(ERR, MEMPOOL, "vaddr(%p) not aligned to total_elt_sz(%u)\n",
>>>> +				(vaddr + off), total_elt_sz);
>>>> +			return -EINVAL;
>>>> +		}
>>>> +	} else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>>>>  		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
>>>>  	else
>>>>  		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
>>> What is the purpose of this test? Can it fail?
>> Purpose is to sanity check blk_sz alignment. No, it won't fail.
>> I thought it better to keep the sanity check, but if you see no value
>> then I will remove it in v2?
> yes please
>
>
>>> Not sure having the delta variable is helpful.
>>> However, adding a small comment like this could help:
>>>
>>> /* align object start address to a multiple of total_elt_sz */
>>> off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
>>>
>>> About style, please don't mix brackets and no-bracket blocks in the
>>> same if/elseif/else.
>> ok, in v2.
>>
>>>> @@ -555,8 +563,13 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>>>  	}
>>>>
>>>>  	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
>>>> +
>>>>  	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
>>>> -		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>>>> +		if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED)
>>>> +			size = rte_mempool_xmem_size(n + 1, total_elt_sz,
>>>> +						     pg_shift);
>>>> +		else
>>>> +			size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>>>>
>>>>  		ret = snprintf(mz_name, sizeof(mz_name),
>>>>  			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
>>> One issue I see here is that this new flag breaks the function
>>> rte_mempool_xmem_size(), which calculates the maximum amount of memory
>>> required to store a given number of objects.
>>>
>>> It also probably breaks rte_mempool_xmem_usage().
>>>
>>> I don't have any good solution for now. A possibility is to change
>>> the behavior of these functions for everyone, meaning that we will
>>> always reserve more memory than really required. If this is done on
>>> every memory chunk (struct rte_mempool_memhdr), it can eat a lot
>>> of memory.
>>>
>>> Another approach would be to change the API of these functions to
>>> pass the capability flags, or the mempool pointer... but there is
>>> a problem because these functions are usually called before the
>>> mempool is instantiated.
>>>
>> Per my description on [1/4]: if we agree to call
>> _ops_get_capability() at the very beginning, i.e. at _populate_default(),
>> then 'mp->flag' has the capability flag, and we could add one more
>> argument in _xmem_size(, flag)/_xmem_usage(, flag).
>> - xmem_size / xmem_usage() to check for that capability bit in 'flag'.
>> - if set, then increase 'elt_num' by num.
>>
>> That way your approach 2) makes sense to me and it will fit very well
>> in the design. It won't waste memory like you mentioned in approach 1).
>>
>> Does that make sense?
> The use case of rte_mempool_xmem_size()/rte_mempool_xmem_usage()
> is to determine how much memory is needed to instantiate a mempool:
>
>   sz = rte_mempool_xmem_size(...);
>   ptr = allocate(sz);
>   paddr_table = get_phys_map(ptr);
>   mp = rte_mempool_xmem_create(..., ptr, ..., paddr_table, ...);
>
> If we want to transform the code to use another mempool_ops, it is
> not possible:
>
>   mp = rte_mempool_create_empty();
>   rte_mempool_set_ops_byname(mp, "my-handler");
>   sz = rte_mempool_xmem_size(...); /* <<< the mp pointer is not passed */
>   ptr = allocate(sz);
>   paddr_table = get_phys_map(ptr);
>   mp = rte_mempool_xmem_create(..., ptr, ..., paddr_table, ...);
>
> So, yes, this approach would work but it needs to change the API.
> I think it is possible to keep compat with previous versions.
>

Ok. I'm planning to send out a deprecation notice for the xmem_size/usage
APIs. Does that make sense to you?

>>>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>>>> index fd8722e69..99a20263d 100644
>>>> --- a/lib/librte_mempool/rte_mempool.h
>>>> +++ b/lib/librte_mempool/rte_mempool.h
>>>> @@ -267,6 +267,7 @@ struct rte_mempool {
>>>>  #define MEMPOOL_F_POOL_CREATED 0x0010 /**< Internal: pool is created. */
>>>>  #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
>>>>  #define MEMPOOL_F_POOL_CONTIG 0x0040 /**< Detect physically contiguous objs */
>>>> +#define MEMPOOL_F_POOL_BLK_SZ_ALIGNED 0x0080 /**< Align buffer address to block size */
>>>>
>>>>  /**
>>>>   * @internal When debug is enabled, store some statistics.
>>> Same comment as for patch 3: the explanation should really be clarified.
>>> It's a hw-specific limitation, which won't be obvious to the people
>>> that will read that code, so we must document it as clearly as possible.
>>>
>> I don't see this as a HW limitation. As mentioned in [1/4], even an
>> application can request block alignment, right?
> What would be the reason for an application to request this block
> alignment? The total size of the element is usually not known by the
> application, because the mempool adds its header and footer.
>
There were patches for custom alignment in the past [1], so this is not a
new initiative. The application doesn't have to know the internal
block_size (total_elem_sz), but my point is: an application can very much
request a start address aligned to block_size.

Besides that, we have enough space in the MEMPOOL_F_ area, so keeping an
alignment flag is not a problem.

Thanks.

[1] http://dpdk.org/dev/patchwork/patch/23885/