On Thu, Sep 07, 2017 at 07:54:09AM +0200, Thomas Gleixner wrote:
> On Thu, 7 Sep 2017, Yu Chen wrote:
> > On Wed, Sep 06, 2017 at 10:03:58AM +0200, Thomas Gleixner wrote:
> > > Can you please apply the debug patch below, boot the machine and right
> > > after login provide the output of
> > >
> > > # cat /sys/kernel/debug/tracing/trace
> > >
> > kworker/0:2-303 [000] .... 9.135467: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 34
> > kworker/0:2-303 [000] .... 9.135476: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 35
> > kworker/0:2-303 [000] .... 9.135484: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 36
> > <SNIP>
> > kworker/0:2-303 [000] .... 9.762268: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 331
> > kworker/0:2-303 [000] .... 9.762278: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 332
> > kworker/0:2-303 [000] .... 9.762288: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 333
>
> That's 300 vectors.
>
> > bb:00.[0-3] Ethernet controller: Intel Corporation Device 37d0 (rev 03)
> >
> > -+-[0000:b2]-+-00.0-[b3-bc]----00.0-[b4-bc]--+-00.0-[b5-b6]----00.0
> >  |           |                               +-01.0-[b7-b8]----00.0
> >  |           |                               +-02.0-[b9-ba]----00.0
> >  |           |                               \-03.0-[bb-bc]--+-00.0
> >  |           |                                               +-00.1
> >  |           |                                               +-00.2
> >  |           |                                               \-00.3
> >
> > and they are using i40e driver, the vectors should be reserved by:
> > i40e_probe() ->
> > i40e_init_interrupt_scheme() ->
> > i40e_init_msix() ->
> > i40e_reserve_msix_vectors() ->
> > pci_enable_msix_range()
> >
> > # ls /sys/kernel/debug/irq/irqs
> > 0 10 11 13 142 184 217 259 292 31 33
> > 337 339 340 342 344 346 348 350 352 354 356
> > 358 360 362 364 366 368 370 372 374 376 378
> > 380 382 384 386 388 390 392 394 4 6 7 9
> > 1 109 12 14 15 2 24 26 3 32 335
> > 338 34 341 343 345 347 349 351 353 355 357
> > 359 361 363 365 367 369 371 373 375 377 379
> > 381 383 385 387 389 391 393 395 5 67 8
>
> Out of these 300 interrupts exactly 8 randomly selected ones are actively
> used.
> And the other 292 interrupts are just there because it might need
> them in the future when the 32 CPU machine gets magically upgraded to 4096
> cores at runtime?
>
Hmm, do the 292 vectors remain disabled because the network devices have
not been brought up (say, "ifconfig up" has not been run), so request_irq()
has not been called for these vectors? My impression is that when I
borrowed some fiber cables to connect the platform, the number of active
IRQs from i40e went up a lot, although I don't have those expensive cables
now...

> Can the i40e people @intel please fix this waste of resources and sanitize
> their interrupt allocation scheme?
>
> Please switch it over to managed interrupts so the affinity spreading
> happens in a sane way and the interrupts are properly managed on CPU
> hotplug.

Ok. I think the i40e driver currently reserves its vectors through
pci_enable_msix_range() and does not pass any affinity hint to the
low-level IRQ code, so managed interrupts are not enabled there (although
later on the driver uses irq_set_affinity_hint() to spread the IRQs).
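
For illustration only, a minimal sketch of what requesting managed
interrupts could look like, assuming the generic pci_alloc_irq_vectors()
interface with the PCI_IRQ_AFFINITY flag. The helper name and its min/max
parameters are hypothetical and not taken from i40e, and the fragment only
builds inside a kernel tree:

```c
#include <linux/pci.h>
#include <linux/interrupt.h>

/*
 * Hypothetical helper (not i40e code): allocate MSI-X vectors as managed
 * interrupts, so the IRQ core computes the affinity masks itself and keeps
 * them consistent across CPU hotplug.
 */
static int example_alloc_managed_vectors(struct pci_dev *pdev,
					 unsigned int min_vecs,
					 unsigned int max_vecs)
{
	int nvec;

	/* PCI_IRQ_AFFINITY asks the core to spread the vectors over the CPUs. */
	nvec = pci_alloc_irq_vectors(pdev, min_vecs, max_vecs,
				     PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
	if (nvec < 0)
		return nvec;	/* negative errno, nothing was allocated */

	/*
	 * The Linux IRQ number of vector i is pci_irq_vector(pdev, i); that
	 * is the value to hand to request_irq(). Release the vectors with
	 * pci_free_irq_vectors(pdev) on teardown.
	 */
	return nvec;
}
```

A driver that also needs a non-queue vector could presumably keep it out of
the spreading with pci_alloc_irq_vectors_affinity() and the pre_vectors
field of struct irq_affinity; whether that fits i40e is not verified here.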

Thanks,
	Yu
>
> Thanks,
>
> 	tglx