> The true value of sorting subtables will only materialize when having
> one sorted list per ingress port. Due to RSS and vhost-user
> multi-queue I am afraid that, when performance really matters, each
> port will be split over more than one PMD and every PMD will serve
> many ports. There is no reason why the assignment of port rx queues
> to PMD threads should in any way correlate to the decomposition of
> the megaflow cache.
> 
> So we would have to add a sorted list of subtables per ingress port
> to the PMD. One way would be to periodically sort this list based on
> subtable hit counters. Another, simpler approach might be to always
> insert the last hit subtable at the front of the list (a most
> recently used list), but that has slightly higher cost per packet.
> 
> /Jan
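
For reference, the move-to-front alternative mentioned in the quoted text
could look roughly like the sketch below. This is only an illustration, not
code from OVS or from the prototype: the function name and the plain pointer
array are hypothetical. The memmove() on every hit that is not already at the
front is the extra per-packet cost referred to above.

```c
#include <stddef.h>
#include <string.h>

struct subtable;  /* opaque; stands in for a datapath classifier subtable */

/* Move the subtable at index 'hit' to the front of the list, shifting the
 * entries that preceded it back by one position (most-recently-used order).
 * Hypothetical helper for illustration only. */
static void
mru_move_to_front(const struct subtable **tables, size_t n, size_t hit)
{
    if (hit == 0 || hit >= n) {
        return;  /* already at the front, or index out of range */
    }
    const struct subtable *st = tables[hit];
    /* Shift tables[0..hit-1] to tables[1..hit]. */
    memmove(&tables[1], &tables[0], hit * sizeof tables[0]);
    tables[0] = st;
}
```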

I have written a prototype patch that introduces 32 subtable vectors
per datapath and hashes the ingress port to select the subtable vector.
The patch also counts matches in 32 slots per vector (hashing the
subtable pointer to obtain the slot) and sorts each vector by match
frequency once per second.
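
To make the data structure concrete, here is a minimal sketch of the scheme
described above. It is not the patch itself: all names, the MAX_SUBTABLES
capacity, the mix64() hash and the separate counter reset after sorting are
assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define N_VECTORS     32  /* subtable vectors per datapath, as described */
#define N_SLOTS       32  /* hit-counter slots per vector, as described */
#define MAX_SUBTABLES 64  /* hypothetical capacity, for this sketch only */

struct subtable;          /* opaque; stands in for a classifier subtable */

struct subtable_vector {
    const struct subtable *tables[MAX_SUBTABLES];  /* lookup order */
    size_t n_tables;
    uint64_t hits[N_SLOTS];                        /* matches since reset */
};

static struct subtable_vector vectors[N_VECTORS];

/* Hypothetical 64-bit mixer; not the hash function OVS uses. */
static uint32_t
mix64(uint64_t x)
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return (uint32_t) x;
}

/* Select the subtable vector by hashing the ingress port. */
static struct subtable_vector *
vector_for_port(uint32_t in_port)
{
    return &vectors[mix64(in_port) % N_VECTORS];
}

/* Select the hit-counter slot by hashing the subtable pointer. */
static size_t
slot_for_subtable(const struct subtable *st)
{
    return mix64((uint64_t) (uintptr_t) st) % N_SLOTS;
}

/* Append a subtable to a vector; returns 0 on success, -1 if full. */
static int
vector_add(struct subtable_vector *v, const struct subtable *st)
{
    if (v->n_tables == MAX_SUBTABLES) {
        return -1;
    }
    v->tables[v->n_tables++] = st;
    return 0;
}

/* Called on every megaflow match. */
static void
vector_record_hit(struct subtable_vector *v, const struct subtable *st)
{
    v->hits[slot_for_subtable(st)]++;
}

/* Periodic step (e.g. once per second): stable insertion sort of the
 * lookup order by descending hit count of each subtable's slot. */
static void
vector_sort(struct subtable_vector *v)
{
    for (size_t i = 1; i < v->n_tables; i++) {
        const struct subtable *cur = v->tables[i];
        uint64_t cur_hits = v->hits[slot_for_subtable(cur)];
        size_t j = i;

        while (j > 0
               && v->hits[slot_for_subtable(v->tables[j - 1])] < cur_hits) {
            v->tables[j] = v->tables[j - 1];
            j--;
        }
        v->tables[j] = cur;
    }
}

/* Assumption: counters are cleared after each sort interval. */
static void
vector_reset_hits(struct subtable_vector *v)
{
    for (size_t i = 0; i < N_SLOTS; i++) {
        v->hits[i] = 0;
    }
}
```

Note that two subtables whose pointers hash to the same slot share one
counter in this sketch, so their relative order is not distinguished.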

The use case I have benchmarked is a cloud L3 pipeline with VXLAN
encapsulation on the physical DPDK port. For details of the OVS
configuration and DP flow entries see below. With pure tenant traffic
the resulting DPIF datapath contains 4 subtables. 

With the EMC disabled on master, I measured a baseline performance
(in+out) of ~1.32 Mpps (64-byte packets, 1000 L4 flows). The average
number of subtable lookups per megaflow match is 2.5.

With the patch, the average number of subtable lookups per megaflow match
goes down to 1.25 (apparently two ports of a different nature are still
hashed to the same vector; otherwise it should be exactly one).
Even so, the forwarding performance grows by ~30% to 1.72 Mpps.
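
As a sanity check on these numbers: if matches were spread evenly over n
subtables visited in a fixed order, the expected number of lookups per match
would be (n + 1) / 2, which is 2.5 for n = 4 and agrees with the baseline
figure. The even spread is an assumption, not something measured here. The
two helper functions below are hypothetical and only reproduce that
arithmetic and the ~30% figure.

```c
/* Expected subtable lookups per match when matches are spread evenly over
 * n subtables searched in a fixed order: (1 + 2 + ... + n) / n. */
static double
avg_lookups_uniform(unsigned int n)
{
    return (n + 1) / 2.0;
}

/* Relative throughput gain in percent, e.g. from 1.32 Mpps to 1.72 Mpps. */
static double
speedup_percent(double before_mpps, double after_mpps)
{
    return (after_mpps / before_mpps - 1.0) * 100.0;
}
```

Here avg_lookups_uniform(4) gives 2.5, and speedup_percent(1.32, 1.72)
gives slightly more than 30.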

As the number of subtables will often be higher in practice, I assume
that this is at the lower end of the speed-up one can expect from such
an optimization. Is there interest in upstreaming this to dpif-netdev?

BR, Jan


Details of the measurement setup
-------------------------------------------

# ovs-vsctl show
    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
        Port "vhost811"
            Interface "vhost811"
                type: dpdkvhostuser
        Port "vhost812"
            Interface "vhost812"
                type: dpdkvhostuser
        Port "vhost813"
            Interface "vhost813"
                type: dpdkvhostuser
        Port "vhost814"
            Interface "vhost814"
                type: dpdkvhostuser
        Port "vhost815"
            Interface "vhost815"
                type: dpdkvhostuser
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {key=flow, remote_ip="10.1.2.9"}
    Bridge br-prv
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port br-prv
            Interface br-prv
                type: internal

# ovs-appctl dpif/show
netdev@ovs-netdev: hit:3793015111 missed:94
        br-prv:
                br-prv 65534/1: (tap)
                dpdk0 1/2: (dpdk: configured_rx_queues=1, configured_tx_queues=49, requested_rx_queues=1, requested_tx_queues=49)
        br-int:
                br-int 65534/12: (tap)
                vhost811 11/13: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=49)
                vhost812 12/7: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=49)
                vhost813 13/8: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=49)
                vhost814 14/6: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=49)
                vhost815 15/10: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=49)
                vxlan0 100/9: (vxlan: key=flow, remote_ip=10.1.2.9)

# ovs-appctl dpif/dump-flows br-prv
recirc_id(0),in_port(1),eth(src=8c:dc:d4:ab:5b:f0,dst=8c:dc:d4:ab:58:48),eth_type(0x0800),ipv4(frag=no),
packets:90804691, bytes:11663496810, used:0.000s, actions:2
recirc_id(0),in_port(2),eth(src=8c:dc:d4:ab:58:48,dst=8c:dc:d4:ab:5b:f0),eth_type(0x0800),ipv4(dst=10.1.2.8,proto=17,frag=no),udp(dst=4789),
packets:90804755, bytes:11663507946, used:0.000s, actions:tnl_pop(9)

# ovs-appctl dpif/dump-flows br-int
recirc_id(0),in_port(7),eth(src=52:54:00:a0:81:03),eth_type(0x0800),ipv4(dst=10.1.91.3,proto=6,tos=0/0x3,frag=no),
packets:18118931, bytes:1419229172, used:0.001s, flags:.,
actions:tnl_push(tnl_port(9),header(size=50,type=4,eth(dst=8c:dc:d4:ab:58:48,src=8c:dc:d4:ab:5b:f0,dl_type=0x0800),ipv4(src=10.1.2.8,dst=10.1.2.9,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x912)),out_port(1))
recirc_id(0),in_port(10),eth(src=52:54:00:a0:81:06),eth_type(0x0800),ipv4(dst=10.1.91.6,proto=6,tos=0/0x3,frag=no),
packets:39240780, bytes:3065208016, used:0.000s, flags:.,
actions:tnl_push(tnl_port(9),header(size=50,type=4,eth(dst=8c:dc:d4:ab:58:48,src=8c:dc:d4:ab:5b:f0,dl_type=0x0800),ipv4(src=10.1.2.8,dst=10.1.2.9,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x915)),out_port(1))
recirc_id(0),in_port(8),eth(src=52:54:00:a0:81:04),eth_type(0x0800),ipv4(dst=10.1.91.4,proto=6,tos=0/0x3,frag=no),
packets:17863230, bytes:1530000904, used:0.000s, flags:.,
actions:tnl_push(tnl_port(9),header(size=50,type=4,eth(dst=8c:dc:d4:ab:58:48,src=8c:dc:d4:ab:5b:f0,dl_type=0x0800),ipv4(src=10.1.2.8,dst=10.1.2.9,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x913)),out_port(1))
recirc_id(0),in_port(6),eth(src=52:54:00:a0:81:05),eth_type(0x0800),ipv4(dst=10.1.91.5,proto=6,tos=0/0x3,frag=no),
packets:18064313, bytes:1416662172, used:0.000s, flags:.,
actions:tnl_push(tnl_port(9),header(size=50,type=4,eth(dst=8c:dc:d4:ab:58:48,src=8c:dc:d4:ab:5b:f0,dl_type=0x0800),ipv4(src=10.1.2.8,dst=10.1.2.9,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=4789,csum=0x0),vxlan(flags=0x8000000,vni=0x914)),out_port(1))
tunnel(tun_id=0x814,src=10.1.2.9,dst=10.1.2.8,flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(9),eth(dst=52:54:00:a0:91:05),eth_type(0x0800),ipv4(frag=no),
packets:18064313, bytes:1416662172, used:0.000s, flags:.,
actions:set(eth(dst=52:54:00:a0:81:05)),6
tunnel(tun_id=0x813,src=10.1.2.9,dst=10.1.2.8,flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(9),eth(dst=52:54:00:a0:91:02),eth_type(0x0800),ipv4(frag=no),
packets:17863261, bytes:1530004748, used:0.000s, flags:.,
actions:set(eth(dst=52:54:00:a0:81:04)),8
tunnel(tun_id=0x815,src=10.1.2.9,dst=10.1.2.8,flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(9),eth(dst=52:54:00:a0:91:04),eth_type(0x0800),ipv4(frag=no),
packets:39240827, bytes:3065213844, used:0.000s, flags:.,
actions:set(eth(dst=52:54:00:a0:81:06)),10
tunnel(tun_id=0x812,src=10.1.2.9,dst=10.1.2.8,flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(9),eth(dst=52:54:00:a0:91:03),eth_type(0x0800),ipv4(frag=no),
packets:18118931, bytes:1419229172, used:0.001s, flags:.,
actions:set(eth(dst=52:54:00:a0:81:03)),7

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
