Hello,

> Scaling up to more CPUs and TCP-stream, Tariq[1] and I have showed the
> Linux kernel network stack scales to 94Gbit/s (linerate minus overhead).
> But when the drivers page-recycler fails, we hit bottlenecks in the
> page-allocator, that cause negative scaling to around 43Gbit/s.
>
> [1] http://lkml.kernel.org/r/cef85936-10b2-5d76-9f97-cb03b418f...@mellanox.com
>
> Linux have for a _long_ time been doing 10Gbit/s TCP-stream easily, on
> a SINGLE CPU. This is mostly thanks to TSO/GRO aggregating packets,
> but last couple of years the network stack have been optimized (with
> UDP workloads), and as a result we can do 10G without TSO/GRO on a
> single-CPU. This is "only" 812Kpps with MTU size frames.
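As a side note, the 812Kpps figure quoted above can be reproduced with a quick back-of-the-envelope calculation. The sketch below assumes a 1500-byte MTU and standard Ethernet per-frame overhead (header, FCS, preamble and inter-frame gap); those byte counts are my assumptions, they are not stated in the quoted mail, and the function name is purely illustrative.

```python
# Rough packets-per-second estimate for a saturated Ethernet link.
# All per-frame byte counts are assumptions (standard Ethernet framing),
# not values taken from the quoted mail.
MTU = 1500            # IP payload per frame, bytes
ETH_HEADER = 14       # dst MAC + src MAC + EtherType
FCS = 4               # frame check sequence
PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter
INTERFRAME_GAP = 12   # minimum idle time between frames, in byte times


def packets_per_second(link_bps, mtu=MTU):
    """Frames per second at line rate for MTU-sized frames."""
    wire_bytes = mtu + ETH_HEADER + FCS + PREAMBLE_SFD + INTERFRAME_GAP
    return link_bps / (wire_bytes * 8)


pps_10g = packets_per_second(10e9)
print(f"{pps_10g / 1e3:.1f} Kpps")
```

With these assumptions each frame occupies 1538 bytes on the wire, which gives roughly 812 thousand frames per second at 10Gbit/s.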
I cannot find the reference anymore, but you once held a workshop at some netdev where you stated that you were practically in constant exchange with NIC vendors about getting them to drastically increase the number of RX/TX rings (queues). You also said that there are hardly any limits to that number other than FPGA magic/physical HW; "up to millions is viable" was the figure coined back then.

May I ask where this ended up? Wouldn't that also be key for massive parallelization, with, at the extreme, one queue (producer) and one CPU (consumer), or vice versa, per flow? Did this end up in this SMART-NIC thingummy? The latter is rather targeted at XDP, no?

-- 
Best regards,
Matthias Tafelmeier