> -----Original Message-----
> From: Andrew Rybchenko <arybche...@solarflare.com>
> Sent: Wednesday, July 17, 2019 7:07 PM
> To: Vamsi Krishna Attunuru <vattun...@marvell.com>; dev@dpdk.org
> Cc: tho...@monjalon.net; Jerin Jacob Kollanukkaran <jer...@marvell.com>;
> olivier.m...@6wind.com; ferruh.yi...@intel.com;
> anatoly.bura...@intel.com; Kiran Kumar Kokkilagadda
> <kirankum...@marvell.com>
> Subject: Re: [dpdk-dev] [PATCH v7 1/4] mempool: modify mempool
> populate() to skip objects from page boundaries
> 
> On 7/17/19 12:04 PM, vattun...@marvell.com wrote:
> > From: Vamsi Attunuru <vattun...@marvell.com>
> >
> > Currently the phys address of a mempool object populated by the
> > mempool populate default() routine may not be contiguous with in that
> mbuf range.
> >
> > Patch ensures that each object's phys address is contiguous by
> > modifying default behaviour of mempool populate() to prevent objects
> > from being across 2 pages, expect if the size of object is bigger than size 
> > of
> page.
> >
> > Since the overhead after this modification will be very minimal
> > considering the hugepage sizes of 512M & 1G, default behaviour is
> > modified except for the object sizes bigger than the page size.
> >
> > Signed-off-by: Vamsi Attunuru <vattun...@marvell.com>
> > Signed-off-by: Kiran Kumar K <kirankum...@marvell.com>
> 
> NACK
> 
> Looking at MEMPOOL_F_NO_IOVA_CONTIG description I don't understand
> why the patch is necessary at all. So, I'd like to know more. Exact 
> conditions,
> IOVA mode, hugepage sizes, mempool flags and how it is populated.
> 

I presume the commit log clarifies the  changes in the patch, pls correct me if 
it's not clear. The requirement is to create mempool of objects that each 
object's phys address in contiguous with in it's range,  having flexibility to 
create such mempools helpful to the applications like KNI to operate in IOVA=VA 
mode where KNI kernel mode can safely translate IOVA addresses to PA. 

Regarding the exact conditions driven this approach are, when IOVA mode is set 
to VA and huge page size of 2MB/512MB used, since KNI application creates 
mempool without any flags set, mempool populate routine tends to reserve iova 
contiguous memzones and finally it ends up populating mempool with some objects 
that might being across two pages(over the page boundary), those mbuf's phys 
address might not be contiguous and these mempool are not suitable for 
operating KNI in IOVA=VA mode.

> > ---
> >   lib/librte_mempool/rte_mempool.c             |  2 +-
> >   lib/librte_mempool/rte_mempool_ops_default.c | 33
> ++++++++++++++++++++++++++--
> >   2 files changed, 32 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_mempool/rte_mempool.c
> > b/lib/librte_mempool/rte_mempool.c
> > index 7260ce0..1c48325 100644
> > --- a/lib/librte_mempool/rte_mempool.c
> > +++ b/lib/librte_mempool/rte_mempool.c
> > @@ -339,7 +339,7 @@ rte_mempool_populate_iova(struct rte_mempool
> *mp, char *vaddr,
> >     i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size,
> >             (char *)vaddr + off,
> >             (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off),
> > -           len - off, mempool_add_elem, NULL);
> > +           len - off, mempool_add_elem, opaque);
> 
> The last argument is the callback opaque value. mempool_add_elem() does
> not use the opaque. But it is incorrect to use it for other purposes and
> require it to be memzone.

To avoid multiple changes in the mempool APIs for adding new variable, opaque 
has been leveraged here. I think there is no harm in having this approach since 
it carries the relevant info and quite suitable between the 
(*mempool_populate_t) calls.  

> 
> >     /* not enough room to store one object */
> >     if (i == 0) {
> > diff --git a/lib/librte_mempool/rte_mempool_ops_default.c
> > b/lib/librte_mempool/rte_mempool_ops_default.c
> > index 4e2bfc8..85da264 100644
> > --- a/lib/librte_mempool/rte_mempool_ops_default.c
> > +++ b/lib/librte_mempool/rte_mempool_ops_default.c
> > @@ -45,19 +45,48 @@ rte_mempool_op_calc_mem_size_default(const
> struct rte_mempool *mp,
> >     return mem_size;
> >   }
> >
> > +/* Returns -1 if object falls on a page boundary, else returns 0 */
> > +static inline int mempool_check_obj_bounds(void *obj, uint64_t
> > +hugepage_sz, size_t elt_sz) {
> > +   uintptr_t page_end, elt_addr = (uintptr_t)obj;
> > +   uint32_t pg_shift = rte_bsf32(hugepage_sz);
> > +   uint64_t page_mask;
> > +
> > +   page_mask =  ~((1ull << pg_shift) - 1);
> > +   page_end = (elt_addr & page_mask) + hugepage_sz;
> > +
> > +   if (elt_addr + elt_sz > page_end)
> > +           return -1;
> > +
> > +   return 0;
> > +}
> > +
> >   int
> >   rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned
> int max_objs,
> >             void *vaddr, rte_iova_t iova, size_t len,
> >             rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
> >   {
> > -   size_t total_elt_sz;
> > -   size_t off;
> > +   struct rte_memzone *mz = obj_cb_arg;
> > +   size_t total_elt_sz, off;
> 
> Why two variables are combined into one here? It is unrelated change.
> 
> >     unsigned int i;
> >     void *obj;
> >
> >     total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
> >
> >     for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs;
> > i++) {
> > +
> > +           /* Skip page boundary check if element is bigger than page */
> > +           if (mz->hugepage_sz >= total_elt_sz) {
> > +                   if (mempool_check_obj_bounds((char *)vaddr + off,
> > +                                               mz->hugepage_sz,
> > +                                               total_elt_sz) < 0) {
> > +                           i--; /* Decrement count & skip this obj */
> > +                           off += total_elt_sz;
> > +                           continue;
> > +                   }
> > +           }
> > +
> 
> What I don't like here is that it makes one memory chunk insufficient to
> populate all objects. I.e. we calculated memory chunk size required, but
> skipped some object. May be it is not a problem, but breaks the logic existing
> in the code.
> 
> >             off += mp->header_size;
> >             obj = (char *)vaddr + off;
> >             obj_cb(mp, obj_cb_arg, obj,
> 

Reply via email to