> Due to some device driver issues, I built this iteration of
> the patch vs. 2.6.12.3.
>
> (Sorry about the attachment, but KMail is still word wrapping inserted
> files.)
>
> Background:
>
> Here's a patch that builds on Natalie Protasevich's IRQ
> compression patch and tries to work for MPS boots as well as
> ACPI. It is meant for a 4-node IBM x460 NUMA box, which was
> dying because it had interrupt pins with GSI numbers >
> NR_IRQS and thus overflowed irq_desc.
>
> The problem is that this system has 280 GSIs (which are 1:1
> mapped with I/O APIC RTEs) and an 8-node box would have 560.
> This is much bigger than NR_IRQS (224 for both i386 and
> x86_64). Also, there aren't enough vectors to go around.
> There are about 190 usable vectors, not counting the reserved
> ones and the unused vectors at 0x20 to 0x2F. So, my patch
> attempts to compress the GSI range and share vectors by
> sharing IRQs.

Hi James,

I tested your patch today (sorry it took a while, I was out of town), and
in general it worked just fine. It was a small system with 3 IO-APICs; I
will hopefully try it on a large partition with 64 of them tonight.

One thing I noticed: I think the patch starts sharing vectors well before
exhausting the available NR_IRQS, so I suggest a small modification to it,
in gsi_irq_sharing():

int gsi_irq_sharing(int gsi)
{
	int i, irq, vector;

	BUG_ON(gsi >= NR_IRQ_VECTORS);

	if (platform_legacy_irq(gsi)) {
		gsi_2_irq[gsi] = gsi;
		return gsi;
	}

	if (gsi_2_irq[gsi] != 0xFF)
		return (int)gsi_2_irq[gsi];

	vector = assign_irq_vector(gsi);

	/* this part here ========== */
	if (gsi < 16) {
		irq = gsi;
		gsi_2_irq[gsi] = irq;
	} else {
		irq = next_irq++;
		gsi_2_irq[gsi] = irq;
	}
	/* ========================= */

	IO_APIC_VECTOR(irq) = vector;
	printk(KERN_INFO "GSI %d assigned vector 0x%02X and IRQ %d\n",
		gsi, vector, irq);
	return irq;
}

(I took out the vector sharing part for clarity, just to concentrate on
compression, and I didn't do any boundary checks.)

The (gsi < 16) check takes care of the recent problem with my ACPI IRQ
compression patch breaking a VIA chipset that doesn't tolerate PCI IRQ
numbers above 15. I think this way we save more IRQs and pack them more
densely. Here is a back-to-back comparison of the IRQ distribution with
the original and the modified patch:

Original:

           CPU0       CPU1       CPU2       CPU3
  0:      18758      20011      20008      28294    IO-APIC-edge  timer
  1:         97         18         79         16    IO-APIC-edge  i8042
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          1    IO-APIC-edge  rtc
  9:          0          0          0          0    IO-APIC-edge  acpi
 12:          0        708          0        110    IO-APIC-edge  i8042
 15:          4          0          0         39    IO-APIC-edge  ide1
 16:          0          0          0          0   IO-APIC-level  uhci_hcd:usb1, uhci_hcd:usb4
 17:          0          0          0          3   IO-APIC-level  ohci1394
 18:        670       2253        836       1981   IO-APIC-level  libata, uhci_hcd:usb3
 19:          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
 23:          0          0          0          0   IO-APIC-level  ehci_hcd:usb5
 48:        212          0          0          4   IO-APIC-level  eth0   <== gap on the 3rd io-apic
NMI:        117         71         73         51
LOC:      87020      86997      86975      86952
ERR:          3
MIS:          0

<7>IRQ to pin mappings:
<7>IRQ0 -> 0:2
<7>IRQ1 -> 0:1
<7>IRQ3 -> 0:3
<7>IRQ4 -> 0:4
<7>IRQ5 -> 0:5
<7>IRQ6 -> 0:6
<7>IRQ7 -> 0:7
<7>IRQ8 -> 0:8
<7>IRQ9 -> 0:9
<7>IRQ10 -> 0:10
<7>IRQ11 -> 0:11
<7>IRQ12 -> 0:12
<7>IRQ14 -> 0:14
<7>IRQ15 -> 0:15
<7>IRQ16 -> 0:16
<7>IRQ17 -> 0:17
<7>IRQ18 -> 0:18
<7>IRQ19 -> 0:19
<7>IRQ20 -> 0:20
<7>IRQ23 -> 0:23
<7>IRQ26 -> 1:2   <======= jump on the 2nd io-apic
<7>IRQ27 -> 1:3
<7>IRQ28 -> 1:4
<7>IRQ29 -> 1:5
<7>IRQ48 -> 2:0   <======= jump on the 3rd io-apic
<7>IRQ49 -> 2:1
<7>IRQ50 -> 2:2
<7>IRQ51 -> 2:3
<7>IRQ52 -> 2:4
<7>IRQ53 -> 2:5
<7>IRQ54 -> 2:6
<7>IRQ55 -> 2:7
<7>IRQ56 -> 2:8

Modified:

           CPU0       CPU1       CPU2       CPU3
  0:      15125      17509      17507      25592    IO-APIC-edge  timer
  1:        187         66        280        140    IO-APIC-edge  i8042
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          1    IO-APIC-edge  rtc
  9:          0          0          0          0    IO-APIC-edge  acpi
 12:          0          0          0        110    IO-APIC-edge  i8042
 15:          4          0          0         39    IO-APIC-edge  ide1
 16:          0          0          0          0   IO-APIC-level  uhci_hcd:usb1, uhci_hcd:usb4
 17:          0          0          0          2   IO-APIC-level  ohci1394
 18:        753       2070        925       2035   IO-APIC-level  libata, uhci_hcd:usb3
 19:          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
 21:          0          0          0          0   IO-APIC-level  ehci_hcd:usb5
 26:        164          0          0          4   IO-APIC-level  eth0
NMI:        117         72         73         52
LOC:      75682      75659      75638      75615
ERR:          3
MIS:          0

<7>IRQ to pin mappings:
<7>IRQ0 -> 0:2
<7>IRQ1 -> 0:1
<7>IRQ3 -> 0:3
<7>IRQ4 -> 0:4
<7>IRQ5 -> 0:5
<7>IRQ6 -> 0:6
<7>IRQ7 -> 0:7
<7>IRQ8 -> 0:8
<7>IRQ9 -> 0:9
<7>IRQ10 -> 0:10
<7>IRQ11 -> 0:11
<7>IRQ12 -> 0:12
<7>IRQ14 -> 0:14
<7>IRQ15 -> 0:15
<7>IRQ16 -> 0:16
<7>IRQ17 -> 0:17
<7>IRQ18 -> 0:18
<7>IRQ19 -> 0:19
<7>IRQ20 -> 0:20
<7>IRQ21 -> 0:23
<7>IRQ22 -> 1:2
<7>IRQ23 -> 1:3
<7>IRQ24 -> 1:4
<7>IRQ25 -> 1:5
<7>IRQ26 -> 2:0
<7>IRQ27 -> 2:1
<7>IRQ28 -> 2:2
<7>IRQ29 -> 2:3
<7>IRQ30 -> 2:4
<7>IRQ31 -> 2:5
<7>IRQ32 -> 2:6
<7>IRQ33 -> 2:7
<7>IRQ34 -> 2:8

Unfortunately, I cannot test the vector sharing part properly, since on
our systems we are just about to use up all 224 interrupts, but not quite.

I should mention that, as far as I know, Zwane is about to release his
vector sharing mechanism. He has it implemented and working for i386 (I
tested it on ES7000 successfully, both by itself and combined with the
compression patch), and he was planning to implement it for x86_64. I am
officially volunteering to test it in its present state, for both i386 and
x86_64 (I can still do this on our systems by removing the IRQ compression
code :). I hope this will help Zwane and Andi release it as soon as
possible.
Regards,
--Natalie