On Sun, 3 Sep 2017 20:15:13 +0200 Frederic Barrat <fbar...@linux.vnet.ibm.com> wrote:
> The PSL and nMMU need to see all TLB invalidations for the memory
> contexts used on the adapter. For the hash memory model, it is done by
> making all TLBIs global as soon as the cxl driver is in use. For
> radix, we need something similar, but we can refine and only convert
> to global the invalidations for contexts actually used by the device.
>
> The new mm_context_add_copro() API increments the 'active_cpus' count
> for the contexts attached to the cxl adapter. As soon as there's more
> than 1 active cpu, the TLBIs for the context become global. Active cpu
> count must be decremented when detaching to restore locality if
> possible and to avoid overflowing the counter.
>
> The hash memory model support is somewhat limited, as we can't
> decrement the active cpus count when mm_context_remove_copro() is
> called, because we can't flush the TLB for a mm on hash. So TLBIs
> remain global on hash.

Sorry I didn't look at this earlier and am just wading in here a bit, but what do you think of using mmu notifiers for invalidating nMMU and coprocessor caches, rather than putting the details into the host MMU management? npu-dma.c already looks to have almost everything covered with its notifiers (in that it wouldn't have to rely on tlbie coming from host MMU code).

This change is not too bad today, but if we get to more complicated MMU/nMMU TLB management like directed invalidation of particular units, then putting more knowledge into the host code will end up being complex, I think.

I also want to do optimizations on the core code that assume we only have to take care of other CPUs, e.g.,

https://patchwork.ozlabs.org/patch/811068/

Or, another example, directed IPI invalidations from the mm_cpumask bitmap.

I realize you want to get something merged! For the merge window and backports this seems fine. I think it would be nice soon afterwards to get nMMU knowledge out of the core code...
Though I also realize that, with our tlbie instruction that does everything, it may be tricky to make a really optimal notifier.

Thanks,
Nick

>
> Signed-off-by: Frederic Barrat <fbar...@linux.vnet.ibm.com>
> Fixes: f24be42aab37 ("cxl: Add psl9 specific code")
> ---
> Changelog:
> v3: don't decrement active cpus count with hash, as we don't know how to flush
> v2: Replace flush_tlb_mm() by the new flush_all_mm() to flush the TLBs
> and PWCs (thanks to Ben)