On Thu, Sep 6, 2018 at 5:54 PM, Roland Scheidegger <srol...@vmware.com> wrote:
> On 06.09.2018 at 22:56, Axel Davy wrote:
>> Yeah, by pinning to cores, I meant pinning to a group of cores.
>>
>> I think a reasonable policy would be for the kernel to put all threads
>> of a given process on the same L3, as long as the number of threads is
>> lower than the L3 group size. When there are more threads, I guess it'd
>> need heuristics to pick which threads to put together.
>>
>> I fear that if we begin to do the work manually, there won't be interest
>> in doing it in the kernel, and thus all applications will need to include
>> such core pinning code to have good performance when multithreaded.
>
> I think the problem here is also that not all cores are equal. Depending
> on what your threads do, it might be preferable to keep your 8 threads on
> 4 cores (as there are 8 logical cores) sharing the same L3 - but if they
> are just independent threads doing heavy math, it might well be preferable
> to spread them out to a different CCX (at least as long as there aren't
> any more threads which also need to run simultaneously).
> And then you also have things like CPUs with physical NUMA topologies,
> where only some cores have access to local memory and so on, which should
> also influence the placement of threads.
> I have no idea if the kernel does something reasonable here, but I don't
> think it will be able to find a (near) optimal solution without at least
> some help from userspace.
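(As an aside, the kind of manual core pinning being discussed would look roughly like the sketch below. It assumes a hypothetical Zen-style part where cores 0-3 share one L3; a real implementation would query the cache topology, e.g. from sysfs, instead of hard-coding the core IDs.)

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread to cores 0-3, assumed here to be one CCX
     * sharing an L3. The core list is a placeholder; query
     * /sys/devices/system/cpu/cpu*/cache/ for the real topology. */
    static int pin_to_first_ccx(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int i = 0; i < 4; i++)
            CPU_SET(i, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }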
The kernel puts Mesa threads on different CCXs 95% of the time. That kinda makes sense for independent workloads, but not when reference counting is involved, because atomics are really, really, REALLY slow between CCXs. Take that patch for pipe_reference where removing p_atomic_read from the asserts increased radeonsi performance by 40%. You can't make this up.

Marek
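(To illustrate the point, a hypothetical sketch rather than the actual pipe_reference change: the expensive part is that every access to the contended refcount cache line, including one buried in an assert, can force a transfer across the CCX fabric, whereas the cheaper variant touches the line only once, via the decrement itself.)

    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    struct object {
        atomic_int refcount;
    };

    /* Costly variant: the assert performs an extra read of the contended
     * counter before the decrement, i.e. two acquisitions of the cache
     * line when another CCX owns it. */
    static bool object_unref_checked(struct object *obj)
    {
        assert(atomic_load(&obj->refcount) > 0);
        return atomic_fetch_sub(&obj->refcount, 1) == 1; /* true => destroy */
    }

    /* Cheaper variant: only the decrement touches the counter; the sanity
     * check uses the value the decrement itself returns. */
    static bool object_unref(struct object *obj)
    {
        int old = atomic_fetch_sub(&obj->refcount, 1);
        assert(old > 0);
        return old == 1; /* last reference dropped => destroy */
    }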