> Is the issue gone if you reduce VHOST_RX_BATCH to 1? And it would be
> also helpful to collect perf diff to see if anything interesting.
> (Consider 4.4 shows more obvious regression, please use 4.4).
>

Issue still exists when I force VHOST_RX_BATCH = 1.

Collected perf data, with 4.12 as the baseline, 4.13 as delta1 and
4.13+VHOST_RX_BATCH=1 as delta2. All guests running 4.4. Same scenario,
2 uperf client guests, 2 uperf slave guests - I collected perf data
against 1 uperf client process and 1 uperf slave process. Here are the
significant diffs:

uperf client:
    75.09%   +9.32%   +8.52%  [kernel.kallsyms]  [k] enabled_wait
     9.04%   -4.11%   -3.79%  [kernel.kallsyms]  [k] __copy_from_user
     2.30%   -0.79%   -0.71%  [kernel.kallsyms]  [k] arch_free_page
     2.17%   -0.65%   -0.58%  [kernel.kallsyms]  [k] arch_alloc_page
     0.69%   -0.25%   -0.24%  [kernel.kallsyms]  [k] get_page_from_freelist
     0.56%   +0.08%   +0.14%  [kernel.kallsyms]  [k] virtio_ccw_kvm_notify
     0.42%   -0.11%   -0.09%  [kernel.kallsyms]  [k] tcp_sendmsg
     0.31%   -0.15%   -0.14%  [kernel.kallsyms]  [k] tcp_write_xmit

uperf slave:
    72.44%   +8.99%   +8.85%  [kernel.kallsyms]  [k] enabled_wait
     8.99%   -3.67%   -3.51%  [kernel.kallsyms]  [k] __copy_to_user
     2.31%   -0.71%   -0.67%  [kernel.kallsyms]  [k] arch_free_page
     2.16%   -0.67%   -0.63%  [kernel.kallsyms]  [k] arch_alloc_page
     0.89%   -0.14%   -0.11%  [kernel.kallsyms]  [k] virtio_ccw_kvm_notify
     0.71%   -0.30%   -0.30%  [kernel.kallsyms]  [k] get_page_from_freelist
     0.70%   -0.25%   -0.29%  [kernel.kallsyms]  [k] __wake_up_sync_key
     0.61%   -0.22%   -0.22%  [kernel.kallsyms]  [k] virtqueue_add_inbuf

>
> May worth to try disable zerocopy or do the test form host to guest
> instead of guest to guest to exclude the possible issue of sender.
>

With zerocopy disabled, still seeing the regression. The provided perf
#s have zerocopy enabled.

I replaced 1 uperf guest and instead ran that uperf client as a host
process, pointing at a guest. All traffic still over the virtual
bridge. In this setup, it's still easy to see the regression for the
remaining guest1<->guest2 uperf run, but the host<->guest3 run does NOT
exhibit a reliable regression pattern.
The significant perf diffs from the host uperf process (baseline=4.12,
delta=4.13):

    59.96%   +5.03%  [kernel.kallsyms]  [k] enabled_wait
     6.47%   -2.27%  [kernel.kallsyms]  [k] raw_copy_to_user
     5.52%   -1.63%  [kernel.kallsyms]  [k] raw_copy_from_user
     0.87%   -0.30%  [kernel.kallsyms]  [k] get_page_from_freelist
     0.69%   +0.30%  [kernel.kallsyms]  [k] finish_task_switch
     0.66%   -0.15%  [kernel.kallsyms]  [k] swake_up
     0.58%   -0.00%  [vhost]            [k] vhost_get_vq_desc
     ...
     0.42%   +0.50%  [kernel.kallsyms]  [k] ckc_irq_pending

I also tried flipping the uperf stream around (a guest uperf client is
communicating to a slave uperf process on the host) and also cannot see
the regression pattern. So it seems to require a guest on both ends of
the connection.
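
For anyone who wants to pick the larger movers out of perf diff output
like the above, here is a minimal sketch. It is not how the rows in this
mail were selected; the row layout it expects (baseline, one or more
delta columns, DSO, symbol), the helper name significant_rows and the
0.25 threshold are all assumptions made for illustration. Rows with a
blank delta column are simply skipped.

```python
import re

# Assumed row layout, e.g.:
#   75.09%   +9.32%   +8.52%  [kernel.kallsyms]  [k] enabled_wait
ROW = re.compile(
    r"^\s*(?P<base>\d+\.\d+)%\s+"
    r"(?P<deltas>(?:[+-]\d+\.\d+%\s+)+)"
    r"\[(?P<dso>[^\]]+)\]\s+\[\w\]\s+(?P<sym>\S+)\s*$"
)


def significant_rows(text, threshold=0.25):
    """Return (symbol, baseline, deltas) for rows with any |delta| >= threshold.

    threshold is a hypothetical cut-off in percentage points.
    Results are sorted by the largest absolute delta, biggest first.
    """
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, "..." or a row without delta columns
        deltas = [float(d.rstrip("%")) for d in m.group("deltas").split()]
        if max(abs(d) for d in deltas) >= threshold:
            rows.append((m.group("sym"), float(m.group("base")), deltas))
    rows.sort(key=lambda r: max(abs(d) for d in r[2]), reverse=True)
    return rows


# Three rows taken from the host uperf process diff above.
sample = """\
    59.96%   +5.03%  [kernel.kallsyms]  [k] enabled_wait
     6.47%   -2.27%  [kernel.kallsyms]  [k] raw_copy_to_user
     0.58%   -0.00%  [vhost]            [k] vhost_get_vq_desc
"""

for sym, base, deltas in significant_rows(sample):
    print(sym, base, deltas)
```

The same function handles the two-delta tables (delta1 and delta2),
since it keeps a row when either delta crosses the threshold.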