On Sat, Jan 23, 2016 at 10:43 AM, Marcus Cenzatti <cenza...@hush.com> wrote:
>
> On 1/23/2016 at 1:31 PM, "Adrian Chadd" <adrian.ch...@gmail.com> wrote:
>>
>> For random src/dst ports and IPs and on the chelsio t5 40gig hardware,
>> I was getting what, uhm, 40mil tx pps and around 25ish mil rx pps?
>>
>> The chelsio rx path really wants to be coalescing rx buffers, which
>> the netmap API currently doesn't support. I've no idea if luigi has
>> plans to add that. So, it has the hilarious side effect of "adding
>> more RX queues" translates to "drops in RX performance." :(
>>
>> Thanks,
>
> hello,
>
> I am sorry, are you saying intel and chelsio distribute RX packet load
> differently? If I am not mistaken intel will distribute traffic among queues
> based on ip addresses, flow hashes or whatever; does chelsio make it per
> packet or something else?
>
I think there are several orthogonal issues here:

- traffic distribution has been discussed by Adrian, so please look at the
  email he just sent;

- when you use netmap on a single queue, i.e. netmap:ix0-X, the software side
  is as efficient as it can be, as it only needs to check the status of a
  single queue on poll() or ioctl(..RXSYNC..). On the contrary, when you
  access netmap:if0 (i.e. all queues on a single file descriptor) every
  system call has to check all the queues, so you are better off with a
  smaller number of queues (see the sketch at the bottom of this message);

- on the hardware side, distributing traffic to multiple RX queues also has
  a cost that increases with the number of queues, as the NIC needs to update
  the ring pointers and fetch buffers for each queue, and you can easily run
  out of PCIe bandwidth for these transactions. This affects all NICs.
  Some (ix ?) have parameters to configure how often to update the rings and
  fetch descriptors, mitigating the problem. Some (ixl) don't.

My opinion is that you should use multiple queues only if you want to rely
on hw-based traffic steering, and/or your workload is bottlenecked by the CPU
rather than by bus I/O bandwidth. Even so, use as few queues as possible.

Sometimes people use multiple queues to increase the number of receive
buffers and tolerate more latency on the software side, but this really
depends on the traffic distribution, so in the worst case you are still
dealing with a single ring.

Often you are better off using a single hw queue and having one process read
from it with netmap and demultiplex to different netmap pipes (zero copy).
That reduces bus transactions.

Another option, which I am experimenting with these days, is to forget about
individual packets once you are off the wire, and connect the various
processes in your pipeline with a stream (TCP or similar) where packets and
descriptors are back to back. CPUs and OSes are very efficient at dealing
with streams of data.

cheers
luigi

> how could this behavior you noticed affect single queue use (or
> applications on netmap:if0 and not netmap:if0-n) on chelsio?
>
> thanks

--
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, ri...@iet.unipi.it   . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/         . Universita` di Pisa
 TEL    +39-050-2217533                  . via Diotisalvi 2
 Mobile +39-338-6809875                  . 56122 PISA (Italy)
-----------------------------------------+-------------------------------
_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
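
To make the single-hw-queue suggestion above concrete, here is a minimal
sketch of a receive loop bound to one hardware ring, using the
nm_open()/nm_nextpkt() helpers from net/netmap_user.h. The interface name
(ix0) and the ring index are only examples, and error handling is kept to
the bare minimum:

    #define NETMAP_WITH_LIBS
    #include <net/netmap_user.h>

    #include <poll.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        struct nm_desc *d;
        struct nm_pkthdr h;
        unsigned char *buf;
        struct pollfd pfd;
        unsigned long count = 0;

        /* "-0" binds the descriptor to hardware ring 0 only, so each
         * poll()/RXSYNC has a single queue to check. */
        d = nm_open("netmap:ix0-0", NULL, 0, NULL);
        if (d == NULL) {
            fprintf(stderr, "nm_open failed\n");
            exit(1);
        }
        pfd.fd = NETMAP_FD(d);
        pfd.events = POLLIN;

        for (;;) {
            if (poll(&pfd, 1, 1000) <= 0)
                continue;               /* timeout or error */
            /* drain everything currently pending on this ring */
            while ((buf = nm_nextpkt(d, &h)) != NULL) {
                /* buf[0 .. h.len-1] is the frame; process it here,
                 * or hand the slot off to a netmap pipe */
                (void)buf;
                if ((++count % 1000000) == 0)
                    fprintf(stderr, "%lu packets\n", count);
            }
        }
        /* not reached */
        nm_close(d);
        return 0;
    }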
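
For the zero-copy demultiplexing into netmap pipes mentioned above, the usual
buffer-swap idiom is to exchange buffer indexes between the NIC RX slot and
the pipe's TX slot instead of copying the payload. A rough sketch follows;
it assumes both descriptors are bound to the same netmap memory region
(otherwise buffer indexes are not interchangeable and you have to fall back
to a copy, e.g. with nm_inject()). See netmap(4) for how pipe endpoints are
named and opened.

    #include <stdint.h>
    #include <net/netmap_user.h>

    /*
     * Hand one frame from a NIC RX ring to a pipe TX ring without copying:
     * swap the buffer indexes and flag both slots as changed so the kernel
     * reloads the ring entries on the next sync.
     */
    static inline void
    zc_forward(struct netmap_ring *rx, struct netmap_ring *tx)
    {
        struct netmap_slot *rs = &rx->slot[rx->cur];
        struct netmap_slot *ts = &tx->slot[tx->cur];
        uint32_t idx = ts->buf_idx;

        ts->buf_idx = rs->buf_idx;
        rs->buf_idx = idx;
        ts->len = rs->len;
        ts->flags |= NS_BUF_CHANGED;
        rs->flags |= NS_BUF_CHANGED;

        /* advance both rings; head and cur move together in this
         * simple one-slot-at-a-time case */
        rx->head = rx->cur = nm_ring_next(rx, rx->cur);
        tx->head = tx->cur = nm_ring_next(tx, tx->cur);
    }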