Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option

Olivier Matz Mon, 28 Oct 2019 07:06:23 -0700

On Sat, Oct 26, 2019 at 02:09:51PM +0000, Vamsi Krishna Attunuru wrote:
> Hi Olivier,
> 
> > -----Original Message-----
> > From: Olivier Matz <olivier.m...@6wind.com>
> > Sent: Saturday, October 26, 2019 5:55 PM
> > To: Vamsi Krishna Attunuru <vattun...@marvell.com>
> > Cc: Jerin Jacob <jerinjac...@gmail.com>; Andrew Rybchenko
> > <arybche...@solarflare.com>; Ferruh Yigit <ferruh.yi...@intel.com>;
> > tho...@monjalon.net; Jerin Jacob Kollanukkaran <jer...@marvell.com>;
> > Kiran Kumar Kokkilagadda <kirankum...@marvell.com>;
> > anatoly.bura...@intel.com; step...@networkplumber.org; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option
> > 
> > Hi Jerin, Hi Vamsi,
> > 
> > On Fri, Oct 25, 2019 at 09:20:20AM +0000, Vamsi Krishna Attunuru wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > Sent: Friday, October 25, 2019 1:01 AM
> > > > To: Olivier Matz <olivier.m...@6wind.com>
> > > > Cc: Vamsi Krishna Attunuru <vattun...@marvell.com>; Andrew
> > Rybchenko
> > > > <arybche...@solarflare.com>; Ferruh Yigit <ferruh.yi...@intel.com>;
> > > > tho...@monjalon.net; Jerin Jacob Kollanukkaran <jer...@marvell.com>;
> > > > Kiran Kumar Kokkilagadda <kirankum...@marvell.com>;
> > > > anatoly.bura...@intel.com; step...@networkplumber.org;
> > dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy
> > > > kni option
> > > >
> > > > On Thu, Oct 24, 2019 at 11:05 PM Olivier Matz
> > > > <olivier.m...@6wind.com>
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Wed, Oct 23, 2019 at 08:32:08PM +0530, Jerin Jacob wrote:
> > > > > > On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz
> > > > > > <olivier.m...@6wind.com>
> > > > wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> > > > > > > > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru
> > > > > > > > <vattun...@marvell.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Ferruh,
> > > > > > > > >
> > > > > > > > > Can you please explain the problems in using kni dedicated
> > > > > > > > > mbuf alloc
> > > > routines while enabling kni iova=va mode. Please see the below
> > > > discussion with Andrew. He wanted to know the problems in having
> > newer APIs.
> > > > > > > >
> > > > > > > >
> > > > > > > > While waiting  for the Ferruh reply, I would like to
> > > > > > > > summarise the current status
> > > > > > > >
> > > > > > > > # In order to make KNI work with IOVA as VA, We need to make
> > > > > > > > sure mempool pool _object_ should not span across two huge
> > > > > > > > pages
> > > > > > > >
> > > > > > > > # This problem can be fixed by, either of:
> > > > > > > >
> > > > > > > > a) Introduce a flag in mempool to define this constraint, so
> > > > > > > > that, when only needed, this constraint enforced and this is
> > > > > > > > in line with existing semantics of addressing such problems
> > > > > > > > in mempool
> > > > > > > >
> > > > > > > > b) Instead of creating a flag, Make this behavior by default
> > > > > > > > in mempool for IOVA as VA case
> > > > > > > >
> > > > > > > > Upside:
> > > > > > > > b1) There is no need for specific mempool_create for KNI.
> > > > > > > >
> > > > > > > > Downside:
> > > > > > > > b2) Not align with existing mempool API semantics
> > > > > > > > b3) There will be a trivial amount of memory waste as we can
> > > > > > > > not allocate from the edge. Considering the normal huge page
> > > > > > > > memory size is 1G or 512MB this not a real issue.
> > > > > > > >
> > > > > > > > c) Make IOVA as PA when KNI kernel module is loaded
> > > > > > > >
> > > > > > > > Upside:
> > > > > > > > c1) Doing option (a) would call for new KNI specific mempool
> > > > > > > > create API i.e existing KNI applications need a one-line
> > > > > > > > change in application to make it work with release 19.11 or 
> > > > > > > > later.
> > > > > > > >
> > > > > > > > Downslide:
> > > > > > > > c2) Driver which needs RTE_PCI_DRV_NEED_IOVA_AS_VA can not
> > > > > > > > work with KNI
> > > > > > > > c3) Need root privilege to run KNI as IOVA as PA need root
> > > > > > > > privilege
> > > > > > > >
> > > > > > > > For the next year, we expect applications to work 19.11
> > > > > > > > without any code change. My personal opinion to make go with
> > > > > > > > option (a) and update the release notes to document the
> > > > > > > > change any it simple one-line change.
> > > > > > > >
> > > > > > > > The selection of (a) vs (b) is between KNI and Mempool
> > maintainers.
> > > > > > > > Could we please reach a consensus? Or can we discuss this TB
> > meeting?
> > > > > > > >
> > > > > > > > We are going back and forth on this feature on for the last
> > > > > > > > 3 releases. Now that, we solved all the technical problems,
> > > > > > > > please help us to decide (a) vs (b) to make forward progress.
> > > > > > >
> > > > > > > Thank you for the summary.
> > > > > > > What is not clear to me is if (a) or (b) may break an existing
> > > > > > > application, and if yes, in which case.
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > To be clear we are talking about out of tree KNI tree application.
> > > > > > Which they don't want to
> > > > > > change rte_pktmbuf_pool_create() to
> > > > > > rte_kni_pktmbuf_pool_create() and build for v19.11
> > > > > >
> > > > > > So in case (b) there is no issue as It will be using
> > rte_pktmbuf_pool_create ().
> > > > > > But in case of (a) it will create an issue if out of tree KNI
> > > > > > application is using rte_pktmbuf_pool_create() which is not
> > > > > > using the NEW flag.
> > > > >
> > > > > Following yesterday's discussion at techboard, I looked at the
> > > > > mempool code and at my previous RFC patch. It took some time to
> > > > > remind me what was my worries.
> > > >
> > > > Thanks for the review Olivier.
> > > >
> > > > Just to make sure the correct one is reviewed.
> > > >
> > > > 1) v7 had similar issue mentioned
> > > > https://urldefense.proofpoint.com/v2/url?u=http-
> > > >
> > 3A__patches.dpdk.org_patch_56585_&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7
> > xtf
> > > >
> > Q&r=WllrYaumVkxaWjgKto6E_rtDQshhIhik2jkvzFyRhW8&m=MMwAZe76YM
> > VHe
> > > > 8UcHjL4IBnfX5YvtbocwICAZGBY97A&s=mfN_afnyFm65sQYzaAg_-
> > > > uM9o22A5j392TdBZY-bKK4&e=
> > 
> > The v7 has the problem I described below: the iova-contiguous allocation
> > may fail because the calculated size is too small, and remaining objects 
> > will
> > be added in another chunk. This can happen if a fully iova-contiguous
> > mempool is requested (it happens for octeontx).
> > 
> > Also, the way page size is retrieved in
> > rte_mempool_op_populate_default() assume that memzones are used,
> > which is not correct.
> > 
> > > > 2) v11 addressed the review comments and you have given the Acked-by
> > > > for mempool change https://urldefense.proofpoint.com/v2/url?u=http-
> > > >
> > 3A__patches.dpdk.org_patch_61559_&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7
> > xtf
> > > >
> > Q&r=WllrYaumVkxaWjgKto6E_rtDQshhIhik2jkvzFyRhW8&m=MMwAZe76YM
> > VHe
> > > >
> > 8UcHjL4IBnfX5YvtbocwICAZGBY97A&s=frFvKOHFDRhTam6jDZZc6omK2gb1RU
> > 62
> > > > xzAiiBMnf0I&e=
> > 
> > The v11 looked fine to me, because it does not impact any default behavior,
> > and the plan was to remove this new function when my mempool patchset
> > was in.
> > 
> > 
> > > >
> > > > My thought process in the TB meeting was, since
> > > > rte_mempool_populate_from_pg_sz_chunks() reviwed replace
> > > > rte_pktmbuf_pool_create's  rte_mempool_populate_default() with
> > > > rte_mempool_populate_from_pg_sz_chunks()
> > > > in IOVA == VA case to avoid a new KNI mempool_create API.
> > > >
> > > > >
> > > > > Currently, in rte_mempool_populate_default(), when the mempool is
> > > > > populated, we first try to allocate one iova-contiguous block of
> > > > > (n * elt_size). On success, we use this memory to fully populate
> > > > > the mempool without taking care of crossing page boundaries.
> > > > >
> > > > > If we change the behavior to prevent objects from crossing pages,
> > > > > the assumption that allocating (n * elt_size) is always enough
> > > > > becomes wrong.  By luck, there is no real impact, because if the
> > > > > mempool is not fully populated after this first iteration, it will
> > > > > allocate a new chunk.
> > > > >
> > > > > To be rigorous, we need to better calculate the amount of memory
> > > > > to allocate, according to page size.
> > >
> > > Hi Olivier,
> > >
> > > Thanks for the review, I think the below mentioned problems exist with
> > > current mempool_populate_default() api and will there be high chances
> > > of hitting those problems when we precalculate the memory size(after
> > > preventing objs from pg boundary and fit complete mempool memory in
> > > single mem chunk) and if mempool size goes beyond page size as below
> > > example. ?,
> > 
> > Yes, the problem described below (alloc a mempool of 1.1GB resulting in 2GB
> > reserved) exists in the current version. It will be fixed in the new 
> > version of
> > my "mempool: avoid objects allocations across pages"
> > patchset.
> > 
> > FYI, a reworked patchset is alsmost ready, I'll send it monday.
> 
> Thanks a lot Olivier.
> 
> > 
> > >
> > > Regards,
> > > Vamsi
> > >
> > > > >
> > > > > Looking at the code, I found another problem in the same area:
> > > > > let's say we populate a mempool that requires 1.1GB (and we use 1G
> > huge pages):
> > > > >
> > > > > 1/ mempool code will first tries to allocate an iova-contiguous zone
> > > > >    of 1.1G -> fail
> > > > > 2/ it then tries to allocate a page-aligned non iova-contiguous
> > > > >    zone of 1.1G, which is 2G. On success, a lot of memory is wasted.
> > > > > 3/ on error, we try to allocate the biggest zone, it can still return
> > > > >    a zone between 1.1G and 2G, which can also waste memory.
> > > > >
> > > > > I will rework my mempool patchset to properly address these
> > > > > issues, hopefully tomorrow.
> > > >
> > > > OK.
> > > >
> > > >
> > > > >
> > > > > Also, I thought about another idea to solve your issue, not sure
> > > > > it is better but it would not imply to change the mempool
> > > > > behavior. If I understood the problem, when a mbuf is accross 2
> > > > > pages, the copy of the data can fail in kni because the mbuf is
> > > > > not virtually contiguous in the
> > > >
> > > > For KNI use case, we would need _physically_ contiguous to make sure
> > > > that using, get_user_pages_remote() we get  physically contiguous
> > > > memory map, so that both KNI kernel thread and KNI kernel context
> > > > and DPDK userspace can use the same memory in different contexts.
> > > >
> > > >
> > > >
> > > > > kernel. So why not in this case splitting the memcpy() into
> > > > > several, each of them being on a single page (and calling
> > > > > phys2virt() for each page)? The same would have to be done when
> > > > > accessing the fields of the mbuf structure if it crosses a page
> > > > > boundary. Would that work? This
> > > >
> > > > If the above is the requirement, Does this logic need to be in slow
> > > > path or fast path?
> > 
> > In fast path. But I don't think the performance impact would be 
> > significative.
> > 
> > Vamsi, do you confirm this approach could also solve the issue without
> > changing the mempool?
> 
> So far there is no performance impact observed when these feature(kni iova as 
> va) is enabled with this patch set. Not sure of impact if when memcpy split 
> and extra address translations are introduced, this approach might solve but 
> I feel it's not a clearer way of solving the issue.


Agree it's not the clearer way, but maybe less risky.

As you can see, the reworked patchset is not that short:
http://inbox.dpdk.org/dev/20190719133845.32432-1-olivier.m...@6wind.com/T/#m6249311baae8469f5727f07c32e4e6e6844eae6a


> > 
> > > >
> > > > > could be a B plan.
> > > >
> > > > OK.
> > > >
> > > > >
> > > > > Olivier

Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option

Reply via email to