On 21/08/2017 at 19:35, Benjamin Herrenschmidt wrote:
On Mon, 2017-08-21 at 19:27 +0200, Frederic Barrat wrote:
Hi Ben,
On 24/07/2017 at 06:28, Benjamin Herrenschmidt wrote:
Instead of comparing the whole CPU mask every time, let's
keep a counter of how many bits are set in the mask. Thus
testing for a local mm only requires testing if that counter
is 1 and the current CPU bit is set in the mask.
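For illustration, the test described above can be modelled in user space roughly as follows. This is only a sketch: struct mm_model, mm_model_mark_cpu() and mm_model_is_local() are hypothetical names, the mask is a plain 64-bit word (so cpu must be below 64), and there is none of the atomicity or memory ordering the real kernel code would need.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model, not the kernel's data structures. */
struct mm_model {
    uint64_t cpumask;   /* one bit per CPU that has used this mm */
    int active_cpus;    /* number of bits currently set in cpumask */
};

/* Record that 'cpu' uses the mm; count each CPU only once. */
static void mm_model_mark_cpu(struct mm_model *mm, unsigned int cpu)
{
    uint64_t bit = UINT64_C(1) << cpu;

    if (!(mm->cpumask & bit)) {
        mm->cpumask |= bit;
        mm->active_cpus++;
    }
}

/* Local means: exactly one user, and that user is the current CPU. */
static bool mm_model_is_local(const struct mm_model *mm, unsigned int cpu)
{
    return mm->active_cpus == 1 &&
           (mm->cpumask & (UINT64_C(1) << cpu)) != 0;
}
```

The point of the counter is that the common case becomes one integer compare plus one bit test, instead of a comparison over the whole mask.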
I'm trying to see if we could merge this patch with what I'm trying to
do to mark a context as requiring global TLBIs.
In http://patchwork.ozlabs.org/patch/796775/
I'm introducing a 'flags' per memory context, using one bit to say if
the context needs global TLBIs.
The two could co-exist, I'm just checking... Are you thinking of using the
actual active_cpus count down the road, or is it just a matter of
knowing whether there is more than one active CPU?
Or you could just increment my counter. Just make sure you increment
it at most once per CXL context and decrement when the context is gone.
The decrementing part is giving me trouble, and I think that makes sense:
if I decrement the counter when detaching the context from the capi
card, then the next TLBIs for the memory context may be back to local.
So when the process exits, the NPU wouldn't get the associated TLBIs,
which spells trouble the next time the same memory context ID is reused.
I believe this is the cause of the problem I'm seeing. As soon as I keep
the TLBIs global, even after I detach from the capi adapter, everything
is fine.
Does it sound right?
So, to keep the check in mm_is_thread_local() minimal, i.e. just testing
the active_cpus count, I'm thinking of introducing a "copro enabled" bit
on the context, so that we increment active_cpus only once and
never decrement it.
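
To make the idea concrete, here is a rough user-space sketch of what I have in mind. All names (struct ctx_model, CTX_COPRO_ENABLED, the ctx_model_*() helpers) are made up for illustration and are not from any existing patch; the real code would need atomic operations on both the flag and the counter.

```c
#include <stdbool.h>

#define CTX_COPRO_ENABLED 0x1u  /* hypothetical "copro enabled" bit */

/* Hypothetical model of a memory context, not the kernel's mm_context_t. */
struct ctx_model {
    unsigned int flags;
    int active_cpus;    /* CPUs using the context, plus 1 once a copro attached */
};

/* Called on every coprocessor attach; bumps the count only the first time. */
static void ctx_model_attach_copro(struct ctx_model *ctx)
{
    if (ctx->flags & CTX_COPRO_ENABLED)
        return;
    ctx->flags |= CTX_COPRO_ENABLED;
    ctx->active_cpus++;
}

/* Detach deliberately leaves flag and count alone, so TLBIs stay global. */
static void ctx_model_detach_copro(struct ctx_model *ctx)
{
    (void)ctx;
}

/* The hot-path check stays a single comparison on the counter. */
static bool ctx_model_needs_global_tlbi(const struct ctx_model *ctx)
{
    return ctx->active_cpus > 1;
}
```

With this, a context that has ever been attached to a coprocessor keeps active_cpus above 1 until it is destroyed, so the invalidations sent at process exit are still global.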
Fred