Hello Anatoly,

On Wed, Mar 7, 2018 at 10:26 PM, Anatoly Burakov <anatoly.bura...@intel.com> wrote:
> If a user has specified that the zone should have contiguous memory,
> use the new _contig allocation API's instead of normal ones.
> Otherwise, account for the fact that unless we're in IOVA_AS_VA
> mode, we cannot guarantee that the pages would be physically
> contiguous, so we calculate the memzone size and alignments as if
> we were getting the smallest page size available.
>
> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> ---
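A quick aside to restate my understanding of the sizing idea in the last paragraph
above, as a standalone sketch. The helper name and the arithmetic below are mine
(not the actual mempool code), and it assumes objects are no larger than a page:
the zone is sized as if every object had to fit within a page of the smallest page
size available, so the estimate stays valid whichever page size the allocation
actually comes from.

#include <stddef.h>
#include <stdint.h>

/*
 * Rough sketch of my understanding (not the actual mempool code): when
 * objects must be IOVA-contiguous but the whole zone need not be, size
 * the reservation as if every object had to fit inside a page of the
 * smallest page size on the system (2^pg_shift bytes).
 * Assumes 0 < total_elt_sz <= page size.
 */
size_t
mempool_size_estimate(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift)
{
        if (pg_shift == 0) /* no page-boundary constraint at all */
                return (size_t)elt_num * total_elt_sz;

        size_t pg_sz = (size_t)1 << pg_shift;
        /* objects may not straddle a page boundary, so count whole pages */
        size_t objs_per_page = pg_sz / total_elt_sz;
        size_t pages = (elt_num + objs_per_page - 1) / objs_per_page;

        return pages * pg_sz;
}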
[...]

>  static void
>  mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
>  {
> @@ -549,6 +570,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  	unsigned mz_id, n;
>  	unsigned int mp_flags;
>  	int ret;
> +	bool force_contig, no_contig;
>
>  	/* mempool must not be populated */
>  	if (mp->nb_mem_chunks != 0)
> @@ -563,10 +585,46 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  	/* update mempool capabilities */
>  	mp->flags |= mp_flags;
>
> -	if (rte_eal_has_hugepages()) {
> -		pg_shift = 0; /* not needed, zone is physically contiguous */
> +	no_contig = mp->flags & MEMPOOL_F_NO_PHYS_CONTIG;
> +	force_contig = mp->flags & MEMPOOL_F_CAPA_PHYS_CONTIG;
> +
> +	/*
> +	 * there are several considerations for page size and page shift here.
> +	 *
> +	 * if we don't need our mempools to have physically contiguous objects,
> +	 * then just set page shift and page size to 0, because the user has
> +	 * indicated that there's no need to care about anything.

I think the above case is not handled properly here; reason below...

> +	 *
> +	 * if we do need contiguous objects, there is also an option to reserve
> +	 * the entire mempool memory as one contiguous block of memory, in
> +	 * which case the page shift and alignment wouldn't matter as well.
> +	 *
> +	 * if we require contiguous objects, but not necessarily the entire
> +	 * mempool reserved space to be contiguous, then there are two options.
> +	 *
> +	 * if our IO addresses are virtual, not actual physical (IOVA as VA
> +	 * case), then no page shift needed - our memory allocation will give us
> +	 * contiguous physical memory as far as the hardware is concerned, so
> +	 * act as if we're getting contiguous memory.
> +	 *
> +	 * if our IO addresses are physical, we may get memory from bigger
> +	 * pages, or we might get memory from smaller pages, and how much of it
> +	 * we require depends on whether we want bigger or smaller pages.
> +	 * However, requesting each and every memory size is too much work, so
> +	 * what we'll do instead is walk through the page sizes available, pick
> +	 * the smallest one and set up page shift to match that one. We will be
> +	 * wasting some space this way, but it's much nicer than looping around
> +	 * trying to reserve each and every page size.
> +	 */
> +
> +	if (no_contig || force_contig || rte_eal_iova_mode() == RTE_IOVA_VA) {
>  		pg_sz = 0;
> +		pg_shift = 0;
>  		align = RTE_CACHE_LINE_SIZE;

So, taking dpaa2 as an example, I ran testpmd. IOVA=VA is the mode, so pg_sz = 0 is
set here - the same as before applying the hotplug patchset, except that earlier
this decision was based purely on the availability of hugepages
(rte_eal_has_hugepages()). Moving on...

> +	} else if (rte_eal_has_hugepages()) {
> +		pg_sz = get_min_page_size();
> +		pg_shift = rte_bsf32(pg_sz);
> +		align = pg_sz;
>  	} else {
>  		pg_sz = getpagesize();
>  		pg_shift = rte_bsf32(pg_sz);
> @@ -585,23 +643,34 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  			goto fail;
>  		}
>
> -		mz = rte_memzone_reserve_aligned(mz_name, size,
> -			mp->socket_id, mz_flags, align);
> -		/* not enough memory, retry with the biggest zone we have */
> -		if (mz == NULL)
> -			mz = rte_memzone_reserve_aligned(mz_name, 0,
> +		if (force_contig) {
> +			/*
> +			 * if contiguous memory for entire mempool memory was
> +			 * requested, don't try reserving again if we fail.
> +			 */
> +			mz = rte_memzone_reserve_aligned_contig(mz_name, size,
> +				mp->socket_id, mz_flags, align);
> +		} else {
> +			mz = rte_memzone_reserve_aligned(mz_name, size,
>  				mp->socket_id, mz_flags, align);
> +			/* not enough memory, retry with the biggest zone we
> +			 * have
> +			 */
> +			if (mz == NULL)
> +				mz = rte_memzone_reserve_aligned(mz_name, 0,
> +					mp->socket_id, mz_flags, align);
> +		}
>  		if (mz == NULL) {
>  			ret = -rte_errno;
>  			goto fail;
>  		}
>
> -		if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
> +		if (no_contig)
>  			iova = RTE_BAD_IOVA;
>  		else
>  			iova = mz->iova;
>
> -		if (rte_eal_has_hugepages())
> +		if (rte_eal_has_hugepages() && force_contig)

So, before the hotplug patchset the call used to enter rte_mempool_populate_iova().
But with 'force_contig' not set while calling rte_pktmbuf_pool_create() (as in
app/test-pmd/testpmd.c:521), rte_mempool_populate_virt() is called instead.

>  			ret = rte_mempool_populate_iova(mp, mz->addr,
>  						iova, mz->len,
>  						rte_mempool_memchunk_mz_free,
> --
> 2.7.4

This is called with pg_sz = 0:

678                 else
679                         ret = rte_mempool_populate_virt(mp, mz->addr,
680                                 mz->len, pg_sz,
681                                 rte_mempool_memchunk_mz_free,
682                                 (void *)(uintptr_t)mz);

In this function:

512         /* address and len must be page-aligned */
513         if (RTE_PTR_ALIGN_CEIL(addr, pg_sz) != addr)
514                 return -EINVAL;

This is where the error is returned. I don't think RTE_PTR_ALIGN_CEIL is designed to
handle pg_sz = 0: it is roughly equivalent to
RTE_PTR_ALIGN_FLOOR(((uintptr_t)addr - 1), pg_sz), which returns NULL when pg_sz is 0.

Basically, this ends up failing rte_mempool_populate_default(). I think the
assumption that rte_mempool_populate_virt() can handle a page size of 0 is wrong -
there would be issues besides the RTE_PTR_ALIGN_CEIL check above as well, such as
the for loop that iterates over off + pg_sz. It needs a valid (non-zero) page-size
value to work with.

So, basically, DPAA2 is stuck with this patch because of the above issue, if I am
comprehending it correctly.

Regards,
Shreyansh
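P.S. To double-check my reading of the macro, here is a minimal standalone sketch.
The macro copies below are simplified local versions written by me (not the actual
rte_common.h definitions), and the address value is arbitrary; it just illustrates
why rounding with pg_sz = 0 collapses any address to NULL, so the page-alignment
check in rte_mempool_populate_virt() can never pass:

#include <stdint.h>
#include <stdio.h>

/* simplified local stand-ins for the RTE alignment macros */
#define ALIGN_FLOOR(val, align) \
        ((uintptr_t)(val) & ~((uintptr_t)(align) - 1))
#define PTR_ALIGN_CEIL(ptr, align) \
        ((void *)ALIGN_FLOOR((uintptr_t)(ptr) + ((uintptr_t)(align) - 1), (align)))

int main(void)
{
        void *addr = (void *)0x7f0000400000UL; /* arbitrary address */
        size_t pg_sz = 0; /* what rte_mempool_populate_default() passes here */

        /*
         * with align == 0, (align - 1) is all-ones, so ~(align - 1) == 0 and
         * the masking collapses any address to NULL
         */
        void *aligned = PTR_ALIGN_CEIL(addr, pg_sz);

        /* mirrors the "address and len must be page-aligned" check */
        if (aligned != addr)
                printf("check fails: aligned=%p, addr=%p -> -EINVAL\n",
                       aligned, addr);
        return 0;
}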