On Wed, Nov 18, 2015 at 10:01:53PM -0200, Marcelo Tosatti wrote:
> On Wed, Nov 18, 2015 at 07:25:03PM +0100, Thomas Gleixner wrote:
> > Folks!
> > 
> > After rereading the mail flood on CAT and staring into the SDM for a
> > while, I think we all should sit back and look at it from scratch
> > again w/o our preconceptions - I certainly had to put my own away.
> > 
> > Let's look at the properties of CAT again:
> > 
> > - It's a per socket facility
> > 
> > - CAT slots can be associated to external hardware. This
> >   association is per socket as well, so different sockets can have
> >   different behaviour. I missed that detail when staring the first
> >   time, thanks for the pointer!
> > 
> > - The association itself is per cpu. The COS selection happens on a
> >   CPU while the set of masks which are selected via COS are shared
> >   by all CPUs on a socket.
> > 
> > There are restrictions which CAT imposes in terms of configurability:
> > 
> > - The bits which select a cache partition need to be consecutive
> > 
> > - The number of possible cache association masks is limited
> > 
> > Let's look at the configurations (CDP omitted and size restricted):
> > 
> > Default:   1 1 1 1 1 1 1 1
> >            1 1 1 1 1 1 1 1
> >            1 1 1 1 1 1 1 1
> >            1 1 1 1 1 1 1 1
> > 
> > Shared:    1 1 1 1 1 1 1 1
> >            0 0 1 1 1 1 1 1
> >            0 0 0 0 1 1 1 1
> >            0 0 0 0 0 0 1 1
> > 
> > Isolated:  1 1 1 1 0 0 0 0
> >            0 0 0 0 1 1 0 0
> >            0 0 0 0 0 0 1 0
> >            0 0 0 0 0 0 0 1
> > 
> > Or any combination thereof. Surely some combinations will not make
> > any sense, but we really should not make any restrictions on the
> > stupidity of a sysadmin. The worst outcome might be L3 disabled for
> > everything, so what?
> > 
> > Now that gets even more convoluted if CDP comes into play and we
> > really need to look at CDP right now. We might end up with something
> > which looks like this:
> > 
> >    1 1 1 1 0 0 0 0   Code
> >    1 1 1 1 0 0 0 0   Data
> >    0 0 0 0 0 0 1 0   Code
> >    0 0 0 0 1 1 0 0   Data
> >    0 0 0 0 0 0 0 1   Code
> >    0 0 0 0 1 1 0 0   Data
> > 
> > or
> > 
> >    0 0 0 0 0 0 0 1   Code
> >    0 0 0 0 1 1 0 0   Data
> >    0 0 0 0 0 0 0 1   Code
> >    0 0 0 0 0 1 1 0   Data
> > 
> > Let's look at partitioning itself. We have two options:
> > 
> >    1) Per task partitioning
> > 
> >    2) Per CPU partitioning
> > 
> > So far we only talked about #1, but I think that #2 has a value as
> > well. Let me give you a simple example.
> > 
> > Assume that you have isolated a CPU and run your important task on
> > it. You give that task a slice of cache. Now that task needs kernel
> > services which run in kernel threads on that CPU. We really don't
> > want to (and cannot) hunt down random kernel threads (think cpu
> > bound worker threads, softirq threads ...) and give them another
> > slice of cache. What we really want is:
> > 
> >    1 1 1 1 0 0 0 0   <- Default cache
> >    0 0 0 0 1 1 1 0   <- Cache for important task
> >    0 0 0 0 0 0 0 1   <- Cache for CPU of important task
> > 
> > It would even be sufficient for particular use cases to just
> > associate a piece of cache to a given CPU and not bother with tasks
> > at all.
> > 
> > We really need to make this as configurable as possible from
> > userspace without imposing random restrictions on it. I played
> > around with it on my new Intel toy and the restriction to 16 COS
> > ids (that's 8 with CDP enabled) makes it really useless if we force
> > the ids to have the same meaning on all sockets and restrict it to
> > per task partitioning.
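For reference, here is a minimal sketch of how the above maps to the
hardware, per the SDM: IA32_PQR_ASSOC (0xc8f) selects the COS id on the
local CPU, and the IA32_L3_QOS_MASK_n MSRs (0xc90 + n) hold the capacity
bitmasks shared by all CPUs of a socket. With CDP enabled the mask MSRs
pair up as data/code, which is why the 16 COS ids drop to 8. The helper
and macro names below are invented for illustration; wrmsrl_on_cpu() is
the stock kernel accessor.

        #include <linux/smp.h>
        #include <asm/msr.h>

        /* MSR addresses per the SDM; macro names invented here */
        #define MSR_IA32_PQR_ASSOC              0x0c8f
        #define MSR_IA32_L3_QOS_MASK(cos)       (0x0c90 + (cos))

        /* The CBM must be one non-empty block of consecutive bits */
        static bool cbm_is_contiguous(unsigned long cbm)
        {
                unsigned long low_bit = cbm & -cbm;

                return cbm && !((cbm + low_bit) & cbm);
        }

        /* Program one mask: affects every CPU on that CPU's socket */
        static int set_l3_mask(int cpu_on_socket, int cos, u64 cbm)
        {
                return wrmsrl_on_cpu(cpu_on_socket,
                                     MSR_IA32_L3_QOS_MASK(cos), cbm);
        }

        /*
         * Select the COS for one CPU. Bits 63:32 of PQR_ASSOC carry
         * the COS id; this sketch clobbers the RMID in the low bits,
         * which a real implementation would preserve.
         */
        static int set_cpu_cos(int cpu, u32 cos)
        {
                return wrmsrl_on_cpu(cpu, MSR_IA32_PQR_ASSOC,
                                     (u64)cos << 32);
        }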
> > Even if next generation systems will have more COS ids available,
> > there are not going to be enough to have a system wide consistent
> > view unless we have COS ids > nr_cpus.
> > 
> > Aside of that, I don't think that a system wide consistent view is
> > useful at all.
> > 
> > - If a task migrates between sockets, it's going to suffer anyway.
> >   Really sensitive applications will simply pin tasks on a socket
> >   to avoid that in the first place. If we make the whole thing
> >   configurable enough then the sysadmin can set it up to support
> >   even the nonsensical case of identical cache partitions on all
> >   sockets and let tasks use the corresponding partitions when
> >   migrating.
> > 
> > - The number of cache slices is going to be limited no matter what,
> >   so one still has to come up with a sensible partitioning scheme.
> > 
> > - Even if we have enough COS ids, the system wide view will not
> >   make the configuration problem any simpler as it remains per
> >   socket.
> > 
> > It's hard. Policies are hard by definition, but this one is harder
> > than most other policies due to the inherent limitations.
> > 
> > So now to the interface part. Unfortunately we need to expose this
> > very close to the hardware implementation as there are really no
> > abstractions which allow us to express the various bitmap
> > combinations. Any abstraction I tried to come up with renders that
> > thing completely useless.
> 
> No you don't.
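To make the per-socket argument concrete, here is a sketch of one
possible in-kernel representation. All names are invented, this is not
an existing API: the CBM tables live per socket, so a COS id only has
meaning relative to the socket it is used on, and a task without a COS
of its own falls back to the COS of the CPU it runs on, which also
covers the kernel thread case above.

        #include <linux/percpu.h>
        #include <linux/sched.h>
        #include <linux/numa.h>

        #define NR_COS  16              /* 8 with CDP enabled */

        struct cat_socket_config {
                u64 cbm[NR_COS];        /* one CBM per COS, per socket */
        };

        /* indexed by socket; sized by MAX_NUMNODES for simplicity */
        static struct cat_socket_config cat_config[MAX_NUMNODES];

        /* COS owned by the CPU itself */
        static DEFINE_PER_CPU(u32, cpu_closid);

        /* hypothetical accessor; a real patch would add a task_struct
         * field. 0 means "no private COS assigned". */
        static u32 task_closid(struct task_struct *p)
        {
                return 0;       /* stub for this sketch */
        }

        /*
         * At context switch: a task with its own COS uses it; anything
         * else (kernel threads, workers, softirq threads) inherits the
         * CPU's COS. The resulting id indexes the local socket's table
         * only, so the same id may select different partitions on
         * different sockets.
         */
        static u32 effective_closid(struct task_struct *next)
        {
                u32 cos = task_closid(next);

                return cos ? cos : this_cpu_read(cpu_closid);
        }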
Actually, there is a point that is useful: you might want the important
application to share an L3 portion with HW (one that the HW DMAs into),
and have only the application and the HW use that region. So it's a
good point that controlling the exact position of the reservation is
important.
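As an illustration, with the helpers sketched earlier and invented way
positions (which ways a given device's DMA lands in is platform
specific), that layout could look like:

        1 1 1 1 1 1 0 0   <- default cache, everybody else
        0 0 0 0 0 0 1 1   <- important task == ways the HW DMAs into

        set_l3_mask(cpu, 0, 0xfc);      /* default COS 0 */
        set_l3_mask(cpu, 1, 0x03);      /* shared app/HW region */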