On Wed, 2016-05-11 at 06:08 -0700, Eric Dumazet wrote:
> On Wed, 2016-05-11 at 11:48 +0200, Paolo Abeni wrote:
> > Hi Eric,
> >
> > On Tue, 2016-05-10 at 15:51 -0700, Eric Dumazet wrote:
> > > On Wed, 2016-05-11 at 00:32 +0200, Hannes Frederic Sowa wrote:
> > > >
> > > > We did not want to present this solely as a bugfix, but also as a
> > > > performance enhancement in the virtio case (as you can see in the
> > > > cover letter). Given that a long time ago there was a tendency to
> > > > remove softirqs completely, we thought it might be very interesting
> > > > that a threaded napi in general seems to be absolutely viable
> > > > nowadays and might offer new features.
> > >
> > > Well, you did not fix the bug, you worked around it by adding yet
> > > another layer, with another sysctl that admins or programs have to
> > > manage.
> > >
> > > If you have a special need for virtio, do not hide it behind a 'bug
> > > fix' but add it as a feature request.
> > >
> > > This ksoftirqd issue is real and a fix looks very reasonable.
> > >
> > > Please try this patch, as I had very good success with it.
> >
> > Thank you for your time and your effort.
> >
> > I tested your patch on the bare metal "single core" scenario, disabling
> > the unneeded cores with:
> >
> >     CPUS=`nproc`
> >     for I in `seq 1 $CPUS`; do
> >         echo 0 > /sys/devices/system/node/node0/cpu$I/online
> >     done
> >
> > and I got a:
> >
> >     [   86.925249] Broke affinity for irq <num>
>
> Was it fatal, or simply a warning that you are removing the cpu that
> was the only allowed cpu in an affinity_mask ?

The above message is emitted with pr_notice() by the x86 version of
fixup_irqs(). It's not fatal: the host is alive and well after that. The
un-patched kernel does not emit it when disabling the CPUs.
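For reference, the notice seems to come from roughly this path when a CPU
goes offline (a heavily simplified sketch of fixup_irqs() in
arch/x86/kernel/irq.c, paraphrased rather than the verbatim code):

    /* Called while taking a CPU down: re-target IRQs whose allowed CPUs
     * are no longer online.
     */
    void fixup_irqs(void)
    {
            unsigned int irq;
            struct irq_desc *desc;

            for_each_irq_desc(irq, desc) {
                    struct irq_data *data = irq_desc_get_irq_data(desc);
                    const struct cpumask *affinity =
                            irq_data_get_affinity_mask(data);
                    bool break_affinity = false;

                    /* skip unused or per-cpu IRQs, and IRQs whose allowed
                     * CPUs are all still online
                     */
                    if (!irq_has_action(irq) || irqd_is_per_cpu(data) ||
                        cpumask_subset(affinity, cpu_online_mask))
                            continue;

                    /* no allowed CPU left online: fall back to all of them */
                    if (!cpumask_intersects(affinity, cpu_online_mask)) {
                            break_affinity = true;
                            affinity = cpu_online_mask;
                    }

                    /* ... re-program the irq chip with 'affinity' ... */

                    if (break_affinity)
                            pr_notice("Broke affinity for irq %i\n", irq);
            }
    }

i.e. AFAICS the IRQ is simply re-targeted to the remaining online CPUs.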
I'll try to look into this later.

> Looks another bug to fix then ? We disabled CPU hotplug here at Google
> for our production, as it was notoriously buggy. No time to fix dozens
> of issues added by a crowd of developers that do not even know a cpu
> can be unplugged.
>
> Maybe some caller of local_bh_disable()/local_bh_enable() expected that
> the current softirq would be processed. Obviously flaky even before the
> patches.
>
> > for each irq number generated by a network device.
> >
> > In this scenario, your patch solves the ksoftirqd issue, performing
> > comparably to the napi threaded patches (with a negative delta in the
> > noise range) and introducing a minor regression with a single flow,
> > also in the noise range (3%).
> >
> > As said in a previous mail, we actually experimented with something
> > similar, but it felt quite hackish.
>
> Right, we are networking guys, and we feel that messing with such core
> infra is not for us. So we feel comfortable adding a pure networking
> patch.
>
> > AFAICS this patch adds three more tests in the fast path and affects
> > all other softirq use cases. I'm not sure how to check for regressions
> > there.
>
> It is obvious to me that the ksoftirqd mechanism is not working as
> intended.
>
> Fixing it might uncover bugs in parts of the kernel relying on the bug,
> indirectly or directly. Is it a good thing ?
>
> I can not tell before trying.
>
> Just by looking at /proc/{ksoftirqd_pid}/sched you can see the problem:
> we normally schedule ksoftirqd under stress, but most of the time the
> softirq items were processed by other tasks, as you found out.
>
> > The napi thread patches are actually a new feature that also fixes the
> > ksoftirqd issue: hunting the ksoftirqd issue has been the initial
> > trigger for this work. I'm sorry for not being clear enough in the
> > cover letter.
> >
> > The napi thread patches offer additional benefits, i.e. an additional
> > relevant gain in the described test scenario, and they do not impact
> > other subsystems/kernel entities.
> >
> > I still think they are worthy, and I bet you would disagree, but could
> > you please articulate more which parts concern you most and/or seem
> > most bloated ?
>
> Just look at the added code. napi_threaded_poll() is very buggy, but
> honestly I do not want to fix the bugs you added there. If you have
> only one vcpu, how can jiffies ever change, since you block BH ?

Uh, we likely have the same issue in the net_rx_action() function, which
also executes with BH disabled and checks for jiffies changes even on
single-core hosts ?!?

Aren't jiffies updated by the timer interrupt, and thus advancing even
with BH disabled ?!?
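For reference, this is roughly what the relevant part of net_rx_action()
looks like (a simplified excerpt of net/core/dev.c, paraphrased from
memory with details elided, not the verbatim code):

    static void net_rx_action(struct softirq_action *h)
    {
            struct softnet_data *sd = this_cpu_ptr(&softnet_data);
            /* relies on jiffies advancing while we run with BH disabled */
            unsigned long time_limit = jiffies + 2;
            int budget = netdev_budget;
            LIST_HEAD(list);
            LIST_HEAD(repoll);

            /* ... splice sd->poll_list into 'list' ... */

            for (;;) {
                    struct napi_struct *n;

                    if (list_empty(&list))
                            break;

                    /* round-robin: poll each device for up to its NAPI
                     * weight (64 packets by default)
                     */
                    n = list_first_entry(&list, struct napi_struct,
                                         poll_list);
                    budget -= napi_poll(n, &repoll);

                    /* punt after ~2 jiffies or netdev_budget packets and
                     * let the next softirq run / ksoftirqd pick up the rest
                     */
                    if (unlikely(budget <= 0 ||
                                 time_after_eq(jiffies, time_limit))) {
                            sd->time_squeeze++;
                            break;
                    }
            }

            /* ... requeue what is left and re-raise NET_RX_SOFTIRQ ... */
    }

AFAICS the timer tick is a hard interrupt, so it is not blocked by
local_bh_disable(): jiffies keep advancing and the time_limit check above
works even on a single-CPU host; only softirq processing is deferred.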
> I was planning to remove cond_resched_softirq(), which we no longer use
> after my recent changes to the TCP stack, and you call it again (while
> it is obviously buggy, since it does not check if a BH is pending, only
> if a thread needs the cpu).

I missed that, thank you for pointing it out.

> I prefer fixing the existing code, really. It took us years to
> understand it and maybe fix it.
>
> Just think of what will happen if you have 10 devices (10 new threads
> in your model) and one cpu.
>
> Instead of the nice existing netif_rx() doing rounds of 64 packets per
> device, you'll now rely on process scheduler behavior that has no such
> granularity.
>
> Adding more threads is the natural answer of userland programmers, but
> in the kernel it is not the right answer. We already have mechanisms;
> just use them and fix them if they are broken.
>
> Sorry, I really do not think your patches are the way to go.
> But this thread is definitely interesting.

Oh, this is a far better comment than I would have expected ;-)

Cheers,

Paolo