On Mon, Feb 08, 2021 at 08:35:31PM +0000, Song Bao Hua (Barry Song) wrote:
> > From: Jason Gunthorpe [mailto:j...@ziepe.ca]
> > Sent: Tuesday, February 9, 2021 7:34 AM
> > To: David Hildenbrand <da...@redhat.com>
> > Cc: Wangzhou (B) <wangzh...@hisilicon.com>; linux-kernel@vger.kernel.org;
> > io...@lists.linux-foundation.org; linux...@kvack.org;
> > linux-arm-ker...@lists.infradead.org; linux-...@vger.kernel.org;
> > Andrew Morton <a...@linux-foundation.org>;
> > Alexander Viro <v...@zeniv.linux.org.uk>;
> > gre...@linuxfoundation.org;
> > Song Bao Hua (Barry Song) <song.bao....@hisilicon.com>;
> > kevin.t...@intel.com; jean-phili...@linaro.org; eric.au...@redhat.com;
> > Liguozhu (Kenneth) <liguo...@hisilicon.com>; zhangfei....@linaro.org;
> > chensihang (A) <chensiha...@hisilicon.com>
> > Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin
> >
> > On Mon, Feb 08, 2021 at 09:14:28AM +0100, David Hildenbrand wrote:
> >
> > > People are constantly struggling with the effects of long term pinnings
> > > under user space control, like we already have with vfio and RDMA.
> > >
> > > And here we are, adding yet another, easier way to mess with core MM in
> > > the same way. This feels like a step backwards to me.
> >
> > Yes, this seems like a very poor candidate to be a system call in this
> > format. Much too narrow, poorly specified, and possibly security
> > implications to allow any process whatsoever to pin memory.
> >
> > I keep encouraging people to explore a standard shared SVA interface
> > that can cover all these topics (and no, uaccel is not that
> > interface), that seems much more natural.
> >
> > I still haven't seen an explanation why DMA is so special here,
> > migration and so forth jitter the CPU too, environments that care
> > about jitter have to turn this stuff off.
>
> This paper has a good explanation:
> https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7482091
>
> mainly because page fault can go directly to the CPU and we have
> many CPUs. But IO Page Faults go a different way, thus mean much
> higher latency 3-80x slower than page fault:
> events in hardware queue -> Interrupts -> cpu processing page fault
> -> return events to iommu/device -> continue I/O.

The justification for this was migration scenarios, and migration is
short. Only if you take a fault on what you are migrating does it
slow down the CPU.

Are you also working with HW where the IOMMU becomes invalidated after
a migration and doesn't reload?

i.e. not true SVA but the sort of emulated SVA we see in a lot of
places?

It would be much better to work on improving that to have closer sync
with the CPU page table than to use pinning.

Jason