I think it would be nice if we added a command-line option (or maybe even an option that can be changed dynamically) for the net backend that enables or disables the use of the CORK flag. The device emulation could then set the flag unconditionally, but the backend would actually honor it (and so it would have real effects) only if the user asks for it.
In this way we leave standard QEMU setups unchanged and we enable special applications (e.g. high packet rate middleboxes) to use the CORK flag to get a higher packet rate, possibly sacrificing a bit of latency.

Cheers,
Vincenzo

2014-03-05 9:59 GMT+01:00 Stefan Hajnoczi <stefa...@gmail.com>:

> On Tue, Mar 04, 2014 at 09:47:09AM +0000, Anton Ivanov (antivano) wrote:
> > On 04/03/14 09:36, Stefan Hajnoczi wrote:
> > > On Mon, Mar 03, 2014 at 02:01:00PM +0000, Anton Ivanov (antivano) wrote:
> > >> On 03/03/14 13:27, Stefan Hajnoczi wrote:
> > >>> On Fri, Feb 28, 2014 at 08:28:11AM +0000, Anton Ivanov (antivano) wrote:
> > >>>> 3. Qemu to communicate with the local host, remote vms, network devices,
> > >>>> etc at speeds which for a number of use cases exceed the speed of the
> > >>>> legacy tap driver.
> > >>> This surprises me. It's odd that tap performs significantly worse.
> > >>
> > >> Multipacket RX can go a very long way and it does not work on tap's
> > >> emulation of a raw socket. At least in 3.2 :)
> > > Luigi and Vincenzo had ideas on making QEMU's net layer support
> > > multipacket tx using something like TCP_CORK. This would map to
> > > sendmmsg(2).
> > >
> > > Basically the net client gets multiple .receive() calls but is told to
> > > hold off on submitting the packets. Then, when it finally gets
> > > uncorked, it can sendmmsg(2). The only issue is we need to hold on to
> > > the tx buffers longer than normal.
> >
> > Cool, I will be happy to give a hand with that.
> >
> > My main problem so far trying to implement it has been the timers - the
> > qemu internal timer API has no relative timers, only absolute. So you
> > end up with a very high cost of setting and checking a delayed xmit timer.
>
> I'm thinking about something simpler that doesn't use a timer:
>
> Rely on the guest to submit a batch of packets for tx. When processing
> the descriptor ring in the device emulation code (virtio-net, etc), use
> the CORK flag on all packets except the final one.
>
> This essentially hands the contents of the tx ring to the netdev (tap,
> L2TPv3, etc) and then lets them submit the entire batch using
> sendmmsg(2).
>
> When we discussed this previously there was concern about the latency
> added by CORK.
>

-- 
Vincenzo Maffione
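
To make the idea above concrete, here is a rough, standalone sketch in C of a backend that honors a cork hint only when the user has opted in, and flushes the held packets with sendmmsg(2) when the final (uncorked) packet arrives. This is not QEMU code: BatchBackend, batch_receive(), batch_flush(), cork_enabled, BATCH_MAX and PKT_MAX are all hypothetical names made up for illustration. It also copies each packet into its own buffer, which sidesteps the "hold on to the tx buffers longer" problem rather than solving it. Linux only, since sendmmsg(2) is Linux-specific.

```c
#define _GNU_SOURCE             /* needed for sendmmsg(2) and struct mmsghdr */
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define BATCH_MAX 64            /* hypothetical cap on held-back packets */
#define PKT_MAX   2048          /* hypothetical per-packet buffer size */

/* Hypothetical backend state; none of these names come from QEMU. */
typedef struct BatchBackend {
    int fd;                     /* datagram socket used for transmission */
    bool cork_enabled;          /* would be set from the proposed user option */
    unsigned int queued;        /* number of packets currently held back */
    size_t len[BATCH_MAX];
    unsigned char buf[BATCH_MAX][PKT_MAX];
} BatchBackend;

/* Submit every held packet. Returns the number sent, or -1 on error
 * (in which case the packets not yet sent are dropped, for simplicity). */
int batch_flush(BatchBackend *b)
{
    struct mmsghdr msgs[BATCH_MAX];
    struct iovec iov[BATCH_MAX];
    unsigned int done = 0;

    memset(msgs, 0, sizeof(msgs));
    for (unsigned int i = 0; i < b->queued; i++) {
        iov[i].iov_base = b->buf[i];
        iov[i].iov_len = b->len[i];
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    /* sendmmsg(2) may send fewer messages than requested, so loop. */
    while (done < b->queued) {
        int n = sendmmsg(b->fd, msgs + done, b->queued - done, 0);
        if (n < 0) {
            b->queued = 0;
            return -1;
        }
        done += (unsigned int)n;
    }
    b->queued = 0;
    return (int)done;
}

/* Accept one packet from the (hypothetical) device emulation.
 * cork == true means "more packets follow, hold off if you can".
 * Returns the number of packets sent by this call, or -1 on error. */
int batch_receive(BatchBackend *b, const void *pkt, size_t size, bool cork)
{
    if (size > PKT_MAX) {
        return -1;
    }
    memcpy(b->buf[b->queued], pkt, size);
    b->len[b->queued] = size;
    b->queued++;

    /* Hold back only if the user opted in, the caller asked for it,
     * and there is still room; otherwise submit what we have now. */
    if (b->cork_enabled && cork && b->queued < BATCH_MAX) {
        return 0;
    }
    return batch_flush(b);
}
```

With cork_enabled left false, every call degenerates to one send per packet, so a default setup behaves as it does today; with it set to true, the device emulation would pass cork=true for every packet in the tx ring except the last one, and the whole batch would go out in a single sendmmsg(2) call.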