> The true value of sorting subtables will only materialize when having
> one sorted list per ingress port. Due to RSS and vhost-user
> multi-queue I am afraid that, when performance really matters, each
> port will be split over more than one PMD and every PMD will serve
> many ports. There is no reason why the assignment of port rx queues
> to PMD threads should in any way correlate to the decomposition of
> the megaflow cache.
>
> So we would have to add a sorted list of subtables per ingress port
> to the PMD. One way would be to periodically sort this list based on
> subtable hit counters. Another, simpler approach might be to always
> insert the last hit subtable at the front of the list (a most
> recently used list), but that has slightly higher cost per packet.
>
> /Jan

I have written a prototype patch that introduces 32 subtable vectors per datapath and hashes the ingress port to select the subtable vector. The patch also counts matches in 32 slots per vector (hashing the subtable pointer to obtain the slot) and sorts each vector by match frequency once per second.

The use case I have benchmarked is a cloud L3 pipeline with VXLAN encapsulation on the physical DPDK port. For details of the OVS configuration and DP flow entries see below. With pure tenant traffic the resulting DPIF datapath contains 4 subtables.

With the EMC disabled on master, I measured a baseline performance (in+out) of ~1.32 Mpps (64 bytes, 1000 L4 flows). The average number of subtable lookups per megaflow match is 2.5.

With the patch, the average number of subtable lookups per megaflow match goes down to 1.25. (Apparently there are still two ports of different nature hashed to the same vector; otherwise it should be exactly one.) Even so, the forwarding performance grows by ~30% to 1.72 Mpps.

As the number of subtables will often be higher in reality, I assume that this is at the lower end of the speed-up one can expect from such an optimization. Is there interest in upstreaming this to dpif-netdev?

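To make the scheme above concrete, here is a minimal standalone sketch. It is not the prototype patch and does not use the dpif-netdev data structures: all names (`subtable_vector`, `vector_for_port`, `slot_for_subtable`, `MAX_SUBTABLES`, `mix32`) are hypothetical, the hash is an arbitrary integer mixer chosen for illustration, and the counters are simply left to accumulate between sorts, which is an assumption. Only the two constants of 32 come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_VECTORS      32  /* subtable vectors per datapath (from the mail) */
#define N_SLOTS        32  /* hit-counter slots per vector (from the mail) */
#define MAX_SUBTABLES  64  /* assumption: arbitrary capacity for this sketch */

/* Hypothetical stand-in for a megaflow subtable. */
struct subtable { int placeholder; };

struct subtable_vector {
    struct subtable *subtables[MAX_SUBTABLES]; /* lookup order */
    size_t n;                                  /* entries in use */
    uint32_t hits[N_SLOTS];                    /* match counters per slot */
};

/* Arbitrary 32-bit integer mixer; any reasonable hash would do here. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x45d9f3bU;
    x ^= x >> 16;
    return x;
}

/* Select one of the N_VECTORS vectors from the ingress port number. */
static size_t vector_for_port(uint32_t in_port)
{
    return mix32(in_port) % N_VECTORS;
}

/* Map a subtable pointer to one of the N_SLOTS counters of a vector.
 * Two subtables may collide and then share a counter. */
static size_t slot_for_subtable(const struct subtable *st)
{
    return mix32((uint32_t) ((uintptr_t) st >> 4)) % N_SLOTS;
}

static void vector_add(struct subtable_vector *v, struct subtable *st)
{
    assert(v->n < MAX_SUBTABLES);
    v->subtables[v->n++] = st;
}

/* Per-packet step: count a megaflow match in the subtable's slot. */
static void vector_record_hit(struct subtable_vector *v,
                              const struct subtable *st)
{
    v->hits[slot_for_subtable(st)]++;
}

static uint32_t vector_hits(const struct subtable_vector *v,
                            const struct subtable *st)
{
    return v->hits[slot_for_subtable(st)];
}

/* Periodic step (e.g. once per second): stable insertion sort that moves
 * the most frequently hit subtables to the front of the vector. */
static void vector_sort(struct subtable_vector *v)
{
    for (size_t i = 1; i < v->n; i++) {
        struct subtable *cur = v->subtables[i];
        uint32_t cur_hits = vector_hits(v, cur);
        size_t j = i;

        while (j > 0 && vector_hits(v, v->subtables[j - 1]) < cur_hits) {
            v->subtables[j] = v->subtables[j - 1];
            j--;
        }
        v->subtables[j] = cur;
    }
}
```

In a packet path built on this sketch, the lookup would pick the vector with `vector_for_port(in_port)`, probe its subtables in order, and call `vector_record_hit()` on the one that matches, while a timer calls `vector_sort()` on each vector. Because the counters are indexed by a hash of the subtable pointer, two subtables that collide in a slot are ranked by their combined hit count.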
BR, Jan

Details of the measurement setup
-------------------------------------------

# ovs-vsctl show
    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
        Port "vhost811"
            Interface "vhost811"
                type: dpdkvhostuser
        Port "vhost812"
            Interface "vhost812"
                type: dpdkvhostuser
        Port "vhost813"
            Interface "vhost813"
                type: dpdkvhostuser
        Port "vhost814"
            Interface "vhost814"
                type: dpdkvhostuser
        Port "vhost815"
            Interface "vhost815"
                type: dpdkvhostuser
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {key=flow, remote_ip="10.1.2.9"}
    Bridge br-prv
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port br-prv
            Interface br-prv
                type: internal

# ovs-appctl dpif/show
netdev@ovs-netdev: hit:3793015111 missed:94
  br-prv:
    br-prv 65534/1: (tap)
    dpdk0 1/2: (dpdk: configured_rx_queues=1, configured_tx_queues=49, requested_rx_queues=1, requested_tx_queues=49)
  br-int:
    br-int 65534/12: (tap)
    vhost811 11/13: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=49)
    vhost812 12/7: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=49)
    vhost813 13/8: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=49)
    vhost814 14/6: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=49)
    vhost815 15/10: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=49)
    vxlan0 100/9: (vxlan: key=flow, remote_ip=10.1.2.9)

# ovs-appctl dpif/dump-flows br-prv
recirc_id(0),in_port(1),eth(src=8c:dc:d4:ab:5b:f0,dst=8c:dc:d4:ab:58:48),eth_type(0x0800),ipv4(frag=no), packets:90804691, bytes:11663496810, used:0.000s, actions:2
recirc_id(0),in_port(2),eth(src=8c:dc:d4:ab:58:48,dst=8c:dc:d4:ab:5b:f0),eth_type(0x0800),ipv4(dst=10.1.2.8,proto=17,frag=no),udp(dst=4789), packets:90804755, bytes:11663507946, used:0.000s, actions:tnl_pop(9)

# ovs-appctl dpif/dump-flows br-int
recirc_id(0),in_port(7),eth(src=52:54:00:a0:81:03),eth_type(0x0800),ipv4(dst=10.1.91.3,proto=6,tos=0/0x3,frag=no), packets:18118931, bytes:1419229172, used:0.001s, flags:., actions:tnl_push(tnl_port(9),header(size=50,type=4,eth(dst=8c:dc:d4:ab:58:48,src=8c:dc:d4:ab:5b:f0,dl_type=0x0800),ipv4(src=10.1.2.8,dst=10.1.2.9,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x912)),out_port(1))
recirc_id(0),in_port(10),eth(src=52:54:00:a0:81:06),eth_type(0x0800),ipv4(dst=10.1.91.6,proto=6,tos=0/0x3,frag=no), packets:39240780, bytes:3065208016, used:0.000s, flags:., actions:tnl_push(tnl_port(9),header(size=50,type=4,eth(dst=8c:dc:d4:ab:58:48,src=8c:dc:d4:ab:5b:f0,dl_type=0x0800),ipv4(src=10.1.2.8,dst=10.1.2.9,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x915)),out_port(1))
recirc_id(0),in_port(8),eth(src=52:54:00:a0:81:04),eth_type(0x0800),ipv4(dst=10.1.91.4,proto=6,tos=0/0x3,frag=no), packets:17863230, bytes:1530000904, used:0.000s, flags:., actions:tnl_push(tnl_port(9),header(size=50,type=4,eth(dst=8c:dc:d4:ab:58:48,src=8c:dc:d4:ab:5b:f0,dl_type=0x0800),ipv4(src=10.1.2.8,dst=10.1.2.9,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x913)),out_port(1))
recirc_id(0),in_port(6),eth(src=52:54:00:a0:81:05),eth_type(0x0800),ipv4(dst=10.1.91.5,proto=6,tos=0/0x3,frag=no), packets:18064313, bytes:1416662172, used:0.000s, flags:., actions:tnl_push(tnl_port(9),header(size=50,type=4,eth(dst=8c:dc:d4:ab:58:48,src=8c:dc:d4:ab:5b:f0,dl_type=0x0800),ipv4(src=10.1.2.8,dst=10.1.2.9,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x914)),out_port(1))
tunnel(tun_id=0x814,src=10.1.2.9,dst=10.1.2.8,flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(9),eth(dst=52:54:00:a0:91:05),eth_type(0x0800),ipv4(frag=no), packets:18064313, bytes:1416662172, used:0.000s, flags:., actions:set(eth(dst=52:54:00:a0:81:05)),6
tunnel(tun_id=0x813,src=10.1.2.9,dst=10.1.2.8,flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(9),eth(dst=52:54:00:a0:91:02),eth_type(0x0800),ipv4(frag=no), packets:17863261, bytes:1530004748, used:0.000s, flags:., actions:set(eth(dst=52:54:00:a0:81:04)),8
tunnel(tun_id=0x815,src=10.1.2.9,dst=10.1.2.8,flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(9),eth(dst=52:54:00:a0:91:04),eth_type(0x0800),ipv4(frag=no), packets:39240827, bytes:3065213844, used:0.000s, flags:., actions:set(eth(dst=52:54:00:a0:81:06)),10
tunnel(tun_id=0x812,src=10.1.2.9,dst=10.1.2.8,flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(9),eth(dst=52:54:00:a0:91:03),eth_type(0x0800),ipv4(frag=no), packets:18118931, bytes:1419229172, used:0.001s, flags:., actions:set(eth(dst=52:54:00:a0:81:03)),7