Hello Anatoly,

On Wed, Mar 7, 2018 at 10:26 PM, Anatoly Burakov <anatoly.bura...@intel.com> wrote:
> If a user has specified that the zone should have contiguous memory,
> use the new _contig allocation API's instead of normal ones.
> Otherwise, account for the fact that unless we're in IOVA_AS_VA
> mode, we cannot guarantee that the pages would be physically
> contiguous, so we calculate the memzone size and alignments as if
> we were getting the smallest page size available.
>
> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> ---
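A quick aside to restate my understanding of the sizing idea in the last paragraph
above, as a standalone sketch. The helper name and the arithmetic below are mine
(not the actual mempool code), and it assumes objects are no larger than a page:
the zone is sized as if every object had to fit within a page of the smallest page
size available, so the estimate stays valid whichever page size the allocation
actually comes from.

#include <stddef.h>
#include <stdint.h>

/*
 * Rough sketch of my understanding (not the actual mempool code): when
 * objects must be IOVA-contiguous but the whole zone need not be, size
 * the reservation as if every object had to fit inside a page of the
 * smallest page size on the system (2^pg_shift bytes).
 * Assumes 0 < total_elt_sz <= page size.
 */
size_t
mempool_size_estimate(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift)
{
        if (pg_shift == 0) /* no page-boundary constraint at all */
                return (size_t)elt_num * total_elt_sz;

        size_t pg_sz = (size_t)1 << pg_shift;
        /* objects may not straddle a page boundary, so count whole pages */
        size_t objs_per_page = pg_sz / total_elt_sz;
        size_t pages = (elt_num + objs_per_page - 1) / objs_per_page;

        return pages * pg_sz;
}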
[...]

>  static void
>  mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
>  {
> @@ -549,6 +570,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  	unsigned mz_id, n;
>  	unsigned int mp_flags;
>  	int ret;
> +	bool force_contig, no_contig;
>
>  	/* mempool must not be populated */
>  	if (mp->nb_mem_chunks != 0)
> @@ -563,10 +585,46 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  	/* update mempool capabilities */
>  	mp->flags |= mp_flags;
>
> -	if (rte_eal_has_hugepages()) {
> -		pg_shift = 0; /* not needed, zone is physically contiguous */
> +	no_contig = mp->flags & MEMPOOL_F_NO_PHYS_CONTIG;
> +	force_contig = mp->flags & MEMPOOL_F_CAPA_PHYS_CONTIG;
> +
> +	/*
> +	 * there are several considerations for page size and page shift here.
> +	 *
> +	 * if we don't need our mempools to have physically contiguous objects,
> +	 * then just set page shift and page size to 0, because the user has
> +	 * indicated that there's no need to care about anything.

I think the above case is not handled properly here; reason below...

> +	 *
> +	 * if we do need contiguous objects, there is also an option to reserve
> +	 * the entire mempool memory as one contiguous block of memory, in
> +	 * which case the page shift and alignment wouldn't matter as well.
> +	 *
> +	 * if we require contiguous objects, but not necessarily the entire
> +	 * mempool reserved space to be contiguous, then there are two options.
> +	 *
> +	 * if our IO addresses are virtual, not actual physical (IOVA as VA
> +	 * case), then no page shift needed - our memory allocation will give us
> +	 * contiguous physical memory as far as the hardware is concerned, so
> +	 * act as if we're getting contiguous memory.
> +	 *
> +	 * if our IO addresses are physical, we may get memory from bigger
> +	 * pages, or we might get memory from smaller pages, and how much of it
> +	 * we require depends on whether we want bigger or smaller pages.
> +	 * However, requesting each and every memory size is too much work, so
> +	 * what we'll do instead is walk through the page sizes available, pick
> +	 * the smallest one and set up page shift to match that one. We will be
> +	 * wasting some space this way, but it's much nicer than looping around
> +	 * trying to reserve each and every page size.
> +	 */
> +
> +	if (no_contig || force_contig || rte_eal_iova_mode() == RTE_IOVA_VA) {
>  		pg_sz = 0;
> +		pg_shift = 0;
>  		align = RTE_CACHE_LINE_SIZE;

So, taking dpaa2 as an example, I ran testpmd. IOVA=VA is the mode, so pg_sz = 0 is
set here - the same as before applying the hotplug patchset, except that earlier
this decision was based purely on the availability of hugepages
(rte_eal_has_hugepages()). Moving on...

> +	} else if (rte_eal_has_hugepages()) {
> +		pg_sz = get_min_page_size();
> +		pg_shift = rte_bsf32(pg_sz);
> +		align = pg_sz;
>  	} else {
>  		pg_sz = getpagesize();
>  		pg_shift = rte_bsf32(pg_sz);
> @@ -585,23 +643,34 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  			goto fail;
>  		}
>
> -		mz = rte_memzone_reserve_aligned(mz_name, size,
> -			mp->socket_id, mz_flags, align);
> -		/* not enough memory, retry with the biggest zone we have */
> -		if (mz == NULL)
> -			mz = rte_memzone_reserve_aligned(mz_name, 0,
> +		if (force_contig) {
> +			/*
> +			 * if contiguous memory for entire mempool memory was
> +			 * requested, don't try reserving again if we fail.
> +			 */
> +			mz = rte_memzone_reserve_aligned_contig(mz_name, size,
> +				mp->socket_id, mz_flags, align);
> +		} else {
> +			mz = rte_memzone_reserve_aligned(mz_name, size,
>  				mp->socket_id, mz_flags, align);
> +			/* not enough memory, retry with the biggest zone we
> +			 * have
> +			 */
> +			if (mz == NULL)
> +				mz = rte_memzone_reserve_aligned(mz_name, 0,
> +					mp->socket_id, mz_flags, align);
> +		}
>  		if (mz == NULL) {
>  			ret = -rte_errno;
>  			goto fail;
>  		}
>
> -		if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
> +		if (no_contig)
>  			iova = RTE_BAD_IOVA;
>  		else
>  			iova = mz->iova;
>
> -		if (rte_eal_has_hugepages())
> +		if (rte_eal_has_hugepages() && force_contig)

So, before the hotplug patchset the call used to enter rte_mempool_populate_iova().
But with 'force_contig' not set while calling rte_pktmbuf_pool_create() (as in
app/test-pmd/testpmd.c:521), rte_mempool_populate_virt() is called instead.

>  			ret = rte_mempool_populate_iova(mp, mz->addr,
>  						iova, mz->len,
>  						rte_mempool_memchunk_mz_free,
> --
> 2.7.4

This is called with pg_sz = 0:

678                 else
679                         ret = rte_mempool_populate_virt(mp, mz->addr,
680                                 mz->len, pg_sz,
681                                 rte_mempool_memchunk_mz_free,
682                                 (void *)(uintptr_t)mz);

In this function:

512         /* address and len must be page-aligned */
513         if (RTE_PTR_ALIGN_CEIL(addr, pg_sz) != addr)
514                 return -EINVAL;

This is where the error is returned. I don't think RTE_PTR_ALIGN_CEIL is designed to
handle pg_sz = 0: it is roughly equivalent to
RTE_PTR_ALIGN_FLOOR(((uintptr_t)addr - 1), pg_sz), which returns NULL when pg_sz is 0.

Basically, this ends up failing rte_mempool_populate_default(). I think the
assumption that rte_mempool_populate_virt() can handle a page size of 0 is wrong -
there would be issues besides the RTE_PTR_ALIGN_CEIL check above as well, such as
the for loop that iterates over off + pg_sz. It needs a valid (non-zero) page-size
value to work with.

So, basically, DPAA2 is stuck with this patch because of the above issue, if I am
comprehending it correctly.

Regards,
Shreyansh
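P.S. To double-check my reading of the macro, here is a minimal standalone sketch.
The macro copies below are simplified local versions written by me (not the actual
rte_common.h definitions), and the address value is arbitrary; it just illustrates
why rounding with pg_sz = 0 collapses any address to NULL, so the page-alignment
check in rte_mempool_populate_virt() can never pass:

#include <stdint.h>
#include <stdio.h>

/* simplified local stand-ins for the RTE alignment macros */
#define ALIGN_FLOOR(val, align) \
        ((uintptr_t)(val) & ~((uintptr_t)(align) - 1))
#define PTR_ALIGN_CEIL(ptr, align) \
        ((void *)ALIGN_FLOOR((uintptr_t)(ptr) + ((uintptr_t)(align) - 1), (align)))

int main(void)
{
        void *addr = (void *)0x7f0000400000UL; /* arbitrary address */
        size_t pg_sz = 0; /* what rte_mempool_populate_default() passes here */

        /*
         * with align == 0, (align - 1) is all-ones, so ~(align - 1) == 0 and
         * the masking collapses any address to NULL
         */
        void *aligned = PTR_ALIGN_CEIL(addr, pg_sz);

        /* mirrors the "address and len must be page-aligned" check */
        if (aligned != addr)
                printf("check fails: aligned=%p, addr=%p -> -EINVAL\n",
                       aligned, addr);
        return 0;
}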