Date: Wed, 27 Jan 2016 16:51:22 +0900
From: Ryota Ozaki <ozak...@netbsd.org>
Here it is: http://www.netbsd.org/~ozaki-r/softint-if_input-ifqueue.diff

Results of performance measurements of it are also added to
https://gist.github.com/ozaki-r/975b06216a54a084debc

The results are good, but one thing bothers me: it achieves better
performance than vanilla (and the 1st implementation) under high load
(IP forwarding). For fast forward, it also beats the 1st one.

I thought that holding splnet during ifp->if_input (splnet is needed
for the ifqueue operations, so the patch keeps holding it) might
affect the results. So I tried releasing it during ifp->if_input, but
the results didn't change much (the result of IP forwarding is still
better than vanilla).

Anyone have any ideas?

Here's a wild guess: with vanilla, each CPU does

  wm_rxeof loop iteration
  if_input processing
  wm_rxeof loop iteration
  if_input processing
  ...

back and forth. With softint-rx-ifq, each CPU does

  wm_rxeof loop iteration
  wm_rxeof loop iteration
  ...
  if_input processing
  if_input processing
  ...

because softint processing is blocked until the hardintr handler
completes. So vanilla might make less efficient use of the CPU cache,
and vanilla might leave the rxq full for longer, so that the device
cannot fill it as quickly with incoming packets.

Another experiment that might be worthwhile is to bind the interrupt
to a specific CPU, and then use splnet instead of WM_RX_LOCK to avoid
acquiring and releasing a lock for each packet. (On Intel >= Haswell,
we should use transactional memory to avoid bus traffic for that
anyway, and maybe invent an MD pcq(9) that does the same. But the
experiment with wm(4) is easier, and not everyone has transactional
memory.)
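For anyone following the thread without the diff in front of them,
here is roughly the shape of the approach under discussion: defer
ifp->if_input() from the hard interrupt handler to a softint via an
ifqueue. This is only a minimal sketch under my assumptions, not the
actual patch; the xx_* names, the sc_rxq/sc_rx_si softc fields, and
xx_rx_ring_next() are made up for illustration. It shows splnet held
only around the ifqueue operations, with ifp->if_input() itself run
with splnet released, i.e. the variant Ryota says he also tried.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/intr.h>

#include <net/if.h>
#include <net/if_ether.h>

/* Hypothetical softc; only the fields this sketch touches. */
struct xx_softc {
        struct ethercom sc_ethercom;    /* sc_ethercom.ec_if is the ifnet */
        struct ifqueue  sc_rxq;         /* staging queue, filled at hardintr time */
        void            *sc_rx_si;      /* cookie from softint_establish(SOFTINT_NET, ...) */
};

/* Hypothetical helper: pull the next completed mbuf off the RX ring. */
struct mbuf     *xx_rx_ring_next(struct xx_softc *);

/*
 * Hard interrupt context: drain the RX descriptor ring into the
 * ifqueue and kick the softint; no ifp->if_input() here.
 */
static void
xx_rxeof(struct xx_softc *sc)
{
        struct mbuf *m;
        int s;

        while ((m = xx_rx_ring_next(sc)) != NULL) {
                s = splnet();                   /* protect the ifqueue */
                if (IF_QFULL(&sc->sc_rxq)) {
                        IF_DROP(&sc->sc_rxq);
                        splx(s);
                        m_freem(m);
                        continue;
                }
                IF_ENQUEUE(&sc->sc_rxq, m);
                splx(s);
        }
        softint_schedule(sc->sc_rx_si);
}

/*
 * Soft interrupt context: dequeue and feed the stack.  splnet is held
 * only around the ifqueue operation, not across ifp->if_input().
 */
static void
xx_rx_softint(void *arg)
{
        struct xx_softc *sc = arg;
        struct ifnet *ifp = &sc->sc_ethercom.ec_if;
        struct mbuf *m;
        int s;

        for (;;) {
                s = splnet();
                IF_DEQUEUE(&sc->sc_rxq, m);
                splx(s);
                if (m == NULL)
                        break;
                ifp->if_input(ifp, m);
        }
}

With this split, the hardintr handler only refills the RX ring and the
ifqueue, which is consistent with the guess above about keeping the
ring drained and the cache behavior differing from vanilla.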