On Wed, Sep 13, 2017 at 11:29:02PM +0200, Thomas Gleixner wrote:
> Sorry for the large CC list, but this is major surgery.
>
> The vector management in x86, including the surrounding code, is a
> conglomerate of ancient bits and pieces which have been subject to
> 'modernization' and featuritis over the years. The most obscure parts are
> the vector allocation mechanics, the cleanup vector handling and the CPU
> hotplug machinery. Replacing these pieces of art was on my todo list for
> a long time.
>
> Recent attempts to 'solve' CPU offline / hibernation issues which are
> partially caused by the current vector management implementation made me
> look at it for real. Further information in this thread:
>
>   http://lkml.kernel.org/r/cover.1504235838.git.yu.c.c...@intel.com
>
> Aside from drivers allocating gazillions of interrupts, there are quite a
> few things which can be addressed in the x86 vector management and in the
> core code:
>
> - Multi-CPU affinities:
>
>   A dubious property which is not available on all machines and causes
>   major complexity both in the allocator and in the cleanup/hotplug
>   management. See:
>
>     http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos
>
> - Priority level spreading:
>
>   An obscure and undocumented property which I think is sufficiently
>   argued to be not required in:
>
>     http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos
>
> - Allocation of vectors when interrupt descriptors are allocated:
>
>   This is a historical implementation detail which is not really
>   required when the vector allocation is delayed up to the point where
>   request_irq() is invoked. This might make request_irq() fail when the
>   vector space is exhausted, but drivers should handle request_irq()
>   failures anyway.
>
>   The upside of changing this is that the active vector space becomes
>   smaller, especially on hibernation/CPU offline, when drivers shut down
>   the queue interrupts of outgoing CPUs.
>
>   Some of this is already addressed with the managed interrupt facility,
>   but that was bolted on top of the existing vector management because
>   proper integration was not possible at that point. I take the blame
>   for this, but the tradeoff of not doing it would have been more broken
>   driver boilerplate code all over the place. So I went for the lesser
>   of two evils.
>
> - Allocation of vectors in the wrong place:
>
>   Even for managed interrupts, the vector allocation at descriptor
>   allocation time happens in the wrong place and gets fixed after the
>   fact with a call to set_affinity(). In the case of non-remapped
>   interrupts this results in at least one interrupt on the wrong CPU
>   before it is migrated to the desired target.
>
> - Lack of instrumentation:
>
>   All of this is a black box which allows no insight into the actual
>   vector usage.
>
> The series addresses these points and converts the x86 vector management
> to a bitmap-based allocator which provides proper reservation management
> for 'managed interrupts' and best-effort reservation for regular
> interrupts. The latter allows overcommitment, which 'fixes' some of the
> hotplug/hibernation problems in a clean way. It can't fix all of them;
> that depends on the driver involved.
>
> This rework is no excuse for driver writers to do exhaustive vector
> allocations instead of utilizing the managed interrupt infrastructure,
> but it addresses long-standing issues in this code with the side effect
> of mitigating some of the driver oddities.
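A note on the request_irq() point above, since that is the change drivers
will actually observe: once vector allocation is delayed to request_irq()
time, vector space exhaustion surfaces as an error return there. A minimal
sketch of the handling drivers need anyway; struct my_dev, my_handler and
my_setup_irq are made-up names, not anything from this series:

#include <linux/device.h>
#include <linux/interrupt.h>

/* Made-up device structure, for illustration only. */
struct my_dev {
	struct device *dev;
};

static irqreturn_t my_handler(int irq, void *data)
{
	return IRQ_HANDLED;
}

static int my_setup_irq(struct my_dev *md, unsigned int irq)
{
	int ret;

	/*
	 * With vectors allocated here instead of at descriptor
	 * allocation time, this can fail when the vector space is
	 * exhausted; propagate the error instead of assuming that
	 * descriptor allocation guaranteed a vector.
	 */
	ret = request_irq(irq, my_handler, 0, "my_dev", md);
	if (ret)
		dev_err(md->dev, "request_irq(%u) failed: %d\n", irq, ret);
	return ret;
}

Drivers that already check the return value need no change; the failure
just gains one more possible cause.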
> The proper solution for multi-queue management is 'managed interrupts',
> which have been proven in the block-mq work; they solve issues which are
> worked around in other drivers in creative ways, with lots of copied
> code and often enough broken attempts to handle interrupt affinity and
> CPU hotplug problems.
>
> The new bitmap allocator and the x86 vector management code are
> instrumented with tracepoints, and the irq domain debugfs files allow
> deep insight into the vector allocations and reservations.
>
> The patches work on machines with and without interrupt remapping and
> inside KVM guests of various flavours, though I have no idea what I
> broke on the way with other hypervisors, posted interrupts etc. So I
> kindly ask for your support in testing and review.
>
> The series applies on top of Linus' tree and is available as git branch:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/apic
>
> Note that this branch is Linus' tree plus scheduler and x86 fixes which
> I required to do proper testing. They have outstanding pull requests and
> might be merged already when you read this.
>
> Thanks,
>
>	tglx
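Regarding the managed interrupt recommendation: for reference, this is
roughly what the conversion looks like from the driver side; the PCI/MSI
core allocates and spreads the vectors instead of the driver hand-rolling
affinity and hotplug handling. A sketch only; my_queue_isr,
my_setup_queue_irqs and MY_MAX_QUEUES are illustrative names:

#include <linux/interrupt.h>
#include <linux/pci.h>

#define MY_MAX_QUEUES	8	/* illustrative queue count */

static irqreturn_t my_queue_isr(int irq, void *data)
{
	/* per-queue handling would go here */
	return IRQ_HANDLED;
}

static int my_setup_queue_irqs(struct pci_dev *pdev, void **queues)
{
	int nvec, i, ret;

	/*
	 * One vector per queue. PCI_IRQ_AFFINITY makes these managed
	 * interrupts, so the core spreads them over the CPUs and takes
	 * care of affinity and CPU hotplug for us.
	 */
	nvec = pci_alloc_irq_vectors(pdev, 1, MY_MAX_QUEUES,
				     PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
	if (nvec < 0)
		return nvec;

	for (i = 0; i < nvec; i++) {
		ret = request_irq(pci_irq_vector(pdev, i), my_queue_isr,
				  0, "my_queue", queues[i]);
		if (ret)
			goto err;
	}
	return 0;

err:
	while (--i >= 0)
		free_irq(pci_irq_vector(pdev, i), queues[i]);
	pci_free_irq_vectors(pdev);
	return ret;
}

With PCI_IRQ_AFFINITY the core manages each vector's affinity across CPU
hotplug, which replaces exactly the copied and often broken driver-side
workarounds mentioned above.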
---

Tested on top of:

commit e1b476ae32fcfa59fc6752b4b01988e759269dc3
Author: Thomas Gleixner <t...@linutronix.de>
Date:   Thu Sep 14 09:53:10 2017 +0200

    x86/vector: Exclude IRQ0 from reservation mode

from branch WIP.x86/apic, on a platform with 16 cores (32 CPUs): bootup
okay, cpu[1-31] offline/online okay.

Before offline:

name:   VECTOR
 size:   0
 mapped: 484
 flags:  0x00000041
Online bitmaps:       32
Global available:   6419
Global reserved:     407
Total allocated:      77
System: 41: 0-19,32,50,128,238-255
 | CPU | avl | man | act | vectors
     0   126    0   77  33-49,51-110
     1   203    0    0
     2   203    0    0
     3   203    0    0
     4   203    0    0
     5   203    0    0
     6   203    0    0
     7   203    0    0
     8   203    0    0
     9   203    0    0
    10   203    0    0
    11   203    0    0
    12   203    0    0
    13   203    0    0
    14   203    0    0
    15   203    0    0
    16   203    0    0
    17   203    0    0
    18   203    0    0
    19   203    0    0
    20   203    0    0
    21   203    0    0
    22   203    0    0
    23   203    0    0
    24   203    0    0
    25   203    0    0
    26   203    0    0
    27   203    0    0
    28   203    0    0
    29   203    0    0
    30   203    0    0
    31   203    0    0

After offline:

name:   VECTOR
 size:   0
 mapped: 484
 flags:  0x00000041
Online bitmaps:        1
Global available:    126
Global reserved:     407
Total allocated:      77
System: 41: 0-19,32,50,128,238-255
 | CPU | avl | man | act | vectors
     0   126    0   77  33-49,51-110

Thanks,
	Yu