On 03/03/14 13:27, Stefan Hajnoczi wrote:
> On Fri, Feb 28, 2014 at 08:28:11AM +0000, Anton Ivanov (antivano) wrote:
>> 3. Qemu to communicate with the local host, remote vms, network devices,
>> etc at speeds which for a number of use cases exceed the speed of the
>> legacy tap driver.
>
> This surprises me. It's odd that tap performs significantly worse.
Multipacket RX can go a very long way, and it does not work on tap's
emulation of a raw socket - at least not in 3.2 :)

I could have put multipacket TX in too, but that means breaking QoS for
my use cases, as well as moving to 2+ thread IO, improving the current
qemu timer API, etc. I have looked into it - it is doable, but I would
not try it in the first instance at this point.

Tap at present can beat l2tpv3 on use cases where offloads make a
significant contribution. Tap is slower on anything per-packet, and it
is slower end-to-end when combined with a bridge and/or OVS.

> I guess you have two options for using kernel L2TPv3 support:
>
> 1. Use a bridge and tap to forward between the L2TPv3 interface and the
> QEMU.

Correct. It ends up being slower on per-packet use cases. It also
introduces one more touch point into the system - the bridge - which
needs to be configured and kept up to date.

For our key use case (one which we will ship as a product) we have the
following topology:

[Customer LAN] <-> [Physical CPE] <-> ... network ... <-> [VM running a service]

An example of this would be putting a media server or a NAS on a VM and
joining it to a customer network. We could connect the VM via a switch;
nothing wrong with that, and it may have the same end-to-end
performance. However, it would introduce an extra touch point to deal
with in terms of control plane, orchestration and provisioning.

> 2. Use macvtap on top of the L2TPv3 interface (if possible).

I did not try that, so I cannot really say.

> Option #2 should be faster.
>
> Now about the tap userspace ABI, is the performance bottleneck that the
> read(2) system call only receives one packet at a time? The tap file
> descriptor is not a socket so recvmmsg(2) cannot be used on it directly.

If I read the kernel source correctly, the tap fd can emulate a socket
for some calls. However, when I try recvmmsg() on it I get ENOTSOCK.

> I have wondered in the past whether something like packet mmap would be
> possible on a tap device.
I have done it on raw. I have it approved for submission; mmap works
fine on RX (once again). Packet mmap does not work on TX - you end up
having to filter out your own frames, leading to an overall drop in
efficiency - so TX still has to be a write to the socket.

We will be contributing that one shortly, after I clean it up to the
required coding standards (in fact, parts of the source sneaked into
the original diff file by mistake). Theoretically it has very little
advantage compared to recvmmsg, as there is a copy involved in both
cases. I am happy to rewrite it to use recvmmsg instead of packet mmap,
so we can reuse the vector IO code across both drivers.

> At that point userspace just waits for
> notifications and the actual packets don't require any syscalls.

Indeed. We have done that one too :) That driver will be contributed
shortly.

A.

>> Our suggestion would be that this over time obsoletes the UDP variety of
>> the "socket" driver.
>
> Yes, thank you! L2TPv3 looks like a good replacement for the "socket"
> driver.
>
> Will review and respond to your patch in detail.
>
> Stefan