Hi Jerin, Hi Vamsi, On Fri, Oct 25, 2019 at 09:20:20AM +0000, Vamsi Krishna Attunuru wrote: > > > > -----Original Message----- > > From: Jerin Jacob <jerinjac...@gmail.com> > > Sent: Friday, October 25, 2019 1:01 AM > > To: Olivier Matz <olivier.m...@6wind.com> > > Cc: Vamsi Krishna Attunuru <vattun...@marvell.com>; Andrew Rybchenko > > <arybche...@solarflare.com>; Ferruh Yigit <ferruh.yi...@intel.com>; > > tho...@monjalon.net; Jerin Jacob Kollanukkaran <jer...@marvell.com>; Kiran > > Kumar Kokkilagadda <kirankum...@marvell.com>; anatoly.bura...@intel.com; > > step...@networkplumber.org; dev@dpdk.org > > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option > > > > On Thu, Oct 24, 2019 at 11:05 PM Olivier Matz <olivier.m...@6wind.com> > > wrote: > > > > > > Hi, > > > > > > On Wed, Oct 23, 2019 at 08:32:08PM +0530, Jerin Jacob wrote: > > > > On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz <olivier.m...@6wind.com> > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote: > > > > > > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru > > > > > > <vattun...@marvell.com> wrote: > > > > > > > > > > > > > > Hi Ferruh, > > > > > > > > > > > > > > Can you please explain the problems in using kni dedicated mbuf > > > > > > > alloc > > routines while enabling kni iova=va mode. Please see the below discussion > > with > > Andrew. He wanted to know the problems in having newer APIs. > > > > > > > > > > > > > > > > > > While waiting for the Ferruh reply, I would like to summarise > > > > > > the current status > > > > > > > > > > > > # In order to make KNI work with IOVA as VA, We need to make > > > > > > sure mempool pool _object_ should not span across two huge pages > > > > > > > > > > > > # This problem can be fixed by, either of: > > > > > > > > > > > > a) Introduce a flag in mempool to define this constraint, so > > > > > > that, when only needed, this constraint enforced and this is in > > > > > > line with existing semantics of addressing such problems in > > > > > > mempool > > > > > > > > > > > > b) Instead of creating a flag, Make this behavior by default in > > > > > > mempool for IOVA as VA case > > > > > > > > > > > > Upside: > > > > > > b1) There is no need for specific mempool_create for KNI. > > > > > > > > > > > > Downside: > > > > > > b2) Not align with existing mempool API semantics > > > > > > b3) There will be a trivial amount of memory waste as we can not > > > > > > allocate from the edge. Considering the normal huge page memory > > > > > > size is 1G or 512MB this not a real issue. > > > > > > > > > > > > c) Make IOVA as PA when KNI kernel module is loaded > > > > > > > > > > > > Upside: > > > > > > c1) Doing option (a) would call for new KNI specific mempool > > > > > > create API i.e existing KNI applications need a one-line change > > > > > > in application to make it work with release 19.11 or later. > > > > > > > > > > > > Downslide: > > > > > > c2) Driver which needs RTE_PCI_DRV_NEED_IOVA_AS_VA can not work > > > > > > with KNI > > > > > > c3) Need root privilege to run KNI as IOVA as PA need root > > > > > > privilege > > > > > > > > > > > > For the next year, we expect applications to work 19.11 without > > > > > > any code change. My personal opinion to make go with option (a) > > > > > > and update the release notes to document the change any it > > > > > > simple one-line change. > > > > > > > > > > > > The selection of (a) vs (b) is between KNI and Mempool maintainers. > > > > > > Could we please reach a consensus? Or can we discuss this TB > > > > > > meeting? > > > > > > > > > > > > We are going back and forth on this feature on for the last 3 > > > > > > releases. Now that, we solved all the technical problems, please > > > > > > help us to decide (a) vs (b) to make forward progress. > > > > > > > > > > Thank you for the summary. > > > > > What is not clear to me is if (a) or (b) may break an existing > > > > > application, and if yes, in which case. > > > > > > > > Thanks for the reply. > > > > > > > > To be clear we are talking about out of tree KNI tree application. > > > > Which they don't want to > > > > change rte_pktmbuf_pool_create() to rte_kni_pktmbuf_pool_create() > > > > and build for v19.11 > > > > > > > > So in case (b) there is no issue as It will be using > > > > rte_pktmbuf_pool_create (). > > > > But in case of (a) it will create an issue if out of tree KNI > > > > application is using rte_pktmbuf_pool_create() which is not using > > > > the NEW flag. > > > > > > Following yesterday's discussion at techboard, I looked at the mempool > > > code and at my previous RFC patch. It took some time to remind me what > > > was my worries. > > > > Thanks for the review Olivier. > > > > Just to make sure the correct one is reviewed. > > > > 1) v7 had similar issue mentioned > > https://urldefense.proofpoint.com/v2/url?u=http- > > 3A__patches.dpdk.org_patch_56585_&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtf > > Q&r=WllrYaumVkxaWjgKto6E_rtDQshhIhik2jkvzFyRhW8&m=MMwAZe76YMVHe > > 8UcHjL4IBnfX5YvtbocwICAZGBY97A&s=mfN_afnyFm65sQYzaAg_- > > uM9o22A5j392TdBZY-bKK4&e=
The v7 has the problem I described below: the iova-contiguous allocation may fail because the calculated size is too small, and remaining objects will be added in another chunk. This can happen if a fully iova-contiguous mempool is requested (it happens for octeontx). Also, the way page size is retrieved in rte_mempool_op_populate_default() assume that memzones are used, which is not correct. > > 2) v11 addressed the review comments and you have given the Acked-by for > > mempool change https://urldefense.proofpoint.com/v2/url?u=http- > > 3A__patches.dpdk.org_patch_61559_&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtf > > Q&r=WllrYaumVkxaWjgKto6E_rtDQshhIhik2jkvzFyRhW8&m=MMwAZe76YMVHe > > 8UcHjL4IBnfX5YvtbocwICAZGBY97A&s=frFvKOHFDRhTam6jDZZc6omK2gb1RU62 > > xzAiiBMnf0I&e= The v11 looked fine to me, because it does not impact any default behavior, and the plan was to remove this new function when my mempool patchset was in. > > > > My thought process in the TB meeting was, since > > rte_mempool_populate_from_pg_sz_chunks() reviwed replace > > rte_pktmbuf_pool_create's rte_mempool_populate_default() with > > rte_mempool_populate_from_pg_sz_chunks() > > in IOVA == VA case to avoid a new KNI mempool_create API. > > > > > > > > Currently, in rte_mempool_populate_default(), when the mempool is > > > populated, we first try to allocate one iova-contiguous block of (n * > > > elt_size). On success, we use this memory to fully populate the > > > mempool without taking care of crossing page boundaries. > > > > > > If we change the behavior to prevent objects from crossing pages, the > > > assumption that allocating (n * elt_size) is always enough becomes > > > wrong. By luck, there is no real impact, because if the mempool is > > > not fully populated after this first iteration, it will allocate a new > > > chunk. > > > > > > To be rigorous, we need to better calculate the amount of memory to > > > allocate, according to page size. > > Hi Olivier, > > Thanks for the review, I think the below mentioned problems exist with > current mempool_populate_default() api and will there be high chances of > hitting > those problems when we precalculate the memory size(after preventing objs > from > pg boundary and fit complete mempool memory in single mem chunk) and if > mempool size goes beyond page size as below example. ?, Yes, the problem described below (alloc a mempool of 1.1GB resulting in 2GB reserved) exists in the current version. It will be fixed in the new version of my "mempool: avoid objects allocations across pages" patchset. FYI, a reworked patchset is alsmost ready, I'll send it monday. > > Regards, > Vamsi > > > > > > > Looking at the code, I found another problem in the same area: let's > > > say we populate a mempool that requires 1.1GB (and we use 1G huge pages): > > > > > > 1/ mempool code will first tries to allocate an iova-contiguous zone > > > of 1.1G -> fail > > > 2/ it then tries to allocate a page-aligned non iova-contiguous > > > zone of 1.1G, which is 2G. On success, a lot of memory is wasted. > > > 3/ on error, we try to allocate the biggest zone, it can still return > > > a zone between 1.1G and 2G, which can also waste memory. > > > > > > I will rework my mempool patchset to properly address these issues, > > > hopefully tomorrow. > > > > OK. > > > > > > > > > > Also, I thought about another idea to solve your issue, not sure it is > > > better but it would not imply to change the mempool behavior. If I > > > understood the problem, when a mbuf is accross 2 pages, the copy of > > > the data can fail in kni because the mbuf is not virtually contiguous > > > in the > > > > For KNI use case, we would need _physically_ contiguous to make sure that > > using, get_user_pages_remote() we get physically contiguous memory map, so > > that both KNI kernel thread and KNI kernel context and DPDK userspace can > > use > > the same memory in different contexts. > > > > > > > > > kernel. So why not in this case splitting the memcpy() into several, > > > each of them being on a single page (and calling phys2virt() for each > > > page)? The same would have to be done when accessing the fields of the > > > mbuf structure if it crosses a page boundary. Would that work? This > > > > If the above is the requirement, Does this logic need to be in slow path or > > fast > > path? In fast path. But I don't think the performance impact would be significative. Vamsi, do you confirm this approach could also solve the issue without changing the mempool? > > > > > could be a B plan. > > > > OK. > > > > > > > > Olivier