On Wed, Oct 10, 2012 at 09:20:19AM -0700, Ben Pfaff wrote:
> More than anything I'd like to see an analysis, based on at least
> simple measurements, probably including some conjecture, of the reason
> or reasons that the threaded datapath yields performance that is so
> much better.  If it comes down to something not fundamental to having
> threads, then we can talk about the trade-offs between working on
> those issues directly versus introducing threading.
>
> One issue that's been brought up is the fact that Open vSwitch throws
> away the information provided by "poll" regarding the file descriptors
> that are ready for read or write.  If that's a significant cost that
> contributes to slowness (I'm not convinced yet, but I could be
> convinced by measurements), then that's something that we can work on;

For measurements, see our Infocom paper at
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6195638
(a preliminary version and other data are available on the netmap page).

We saw 50Kpps tops with the single event loop, and about 300Kpps with
split threads, in both cases processing one packet per iteration, and
using bpf (FreeBSD) or PF_PACKET (?) on Linux.

Increasing the number of packets per iteration approximately doubles the
throughput for split threads (the bottleneck there is the system call
that does the packet I/O), whereas it gives a negligible improvement in
the single event loop case (suggesting that there is a ton of work done
at each iteration).

> I'm happy to do that.  I think that we can work on that without
> completely changing how the poll loop works.

I am sure you can get some speedup with some relatively simple
optimizations; I am just unclear on how much (increasing the batch
size, which was another problem, did not help in the single-thread
version).

Another problem, which I suspect is harder to fix, is that the code that
runs before select()/poll() to determine which descriptors we should
look at has a lot more work to do than simply marking the descriptors
associated with open interfaces/tunnels.

Not to mention that if the single thread is busy communicating with the
controller, logging, resolving names or whatnot, latency and throughput
become extremely hard to control.

I do buy the concerns about using a second thread as opposed to a
process.  However, using a thread at least avoids having to pass around
information (updates to the forwarding table, file descriptors for
interfaces, etc.), which would require additional complications.
True, this comes at the price of adding some locking around data
structures, but updates to these data structures should be relatively
infrequent, so even a crude locking scheme should work well and be
reasonably easy to audit.

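
To make "crude locking scheme" concrete, here is a minimal sketch in
Python rather than the project's C. It is not Open vSwitch or netmap
code: ForwardingTable, lookup_batch, datapath_loop and run_demo are
hypothetical names made up for illustration. It assumes one forwarding
thread that only does lookups and a control thread that updates the
table rarely, with a single coarse lock taken once per batch of packets.

```python
import threading


class ForwardingTable:
    """Hypothetical MAC -> port table shared by the two threads."""

    def __init__(self):
        # One coarse lock for the whole table: crude, but easy to audit.
        self._lock = threading.Lock()
        self._entries = {}

    def update(self, mac, port):
        # Control side: rare writes, e.g. after a controller message.
        with self._lock:
            self._entries[mac] = port

    def lookup_batch(self, macs):
        # Datapath side: take the lock once per batch, not once per packet.
        with self._lock:
            return [self._entries.get(mac) for mac in macs]


def datapath_loop(table, batches, results):
    # Forwarding thread: nothing but lookups, one batch per iteration.
    for batch in batches:
        results.append(table.lookup_batch(batch))


def run_demo():
    table = ForwardingTable()
    table.update("aa:aa", 1)
    batches = [["aa:aa", "bb:bb"]] * 1000
    results = []
    worker = threading.Thread(
        target=datapath_loop, args=(table, batches, results)
    )
    worker.start()
    # Infrequent update from the control thread while forwarding runs.
    table.update("cc:cc", 3)
    worker.join()
    return table, results
```

Whether one coarse lock is good enough depends on how often updates
really happen; if they turn out to be frequent, a reader/writer lock
would be the obvious next step.
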
And, to be perfectly frank, we have neither the time nor the motivation
to try to accelerate a single-threaded version, which we believe is
inherently limited for the reasons mentioned above.  We would rather
put some work into integrating the OpenFlow forwarding code within our
VALE switch

    http://info.iet.unipi.it/~luigi/vale/

which is based on netmap, works on both FreeBSD and Linux, and could be
used to speed up the in-kernel forwarding.

Anyway, let's see if we can find some agreement on how to proceed with
a face-to-face meeting.

cheers
luigi