I've been working on OpenConnect VPN performance. After fixing some local stupidities I am basically crypto-bound as I suck packets out of the tun device and feed them out over the public network as fast as the crypto library can encrypt them.
However, the tun device is dropping packets. I'm testing with an ESP setup that the kernel happens to support. If I do netperf UDP_STREAM testing with the kernel doing ESP, I get this:

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992    1400   10.00     1198093      0    1341.86
212992           10.00     1198044           1341.80

Change to doing it in userspace through the tun device, though, and it looks more like this:

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992    1400   10.00     8194693      0    9178.04
212992           10.00     1536155           1720.49

The discrepancy between sent and received packets is all seen as packet loss on the tun0 interface, where userspace is not reading the packets out fast enough:

$ netstat -i
Kernel Interface table
Iface   MTU     RX-OK RX-ERR  RX-DRP RX-OVR     TX-OK TX-ERR  TX-DRP TX-OVR Flg
eth0   9001  56790193      0 1718127      0 129849068      0       0      0 BMRU
lo    65536        42      0       0      0        42      0       0      0 LRU
tun0   1500         9      0       0      0   1546968      0 6647739      0 MOPRU

So... I threw together something to stop the queue when the tx_ring was full (which I know is incomplete but was enough for my test)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e9ca1c088d0b..a15fca23ef45 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1125,7 +1128,9 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (tfile->flags & TUN_FASYNC)
 		kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
 	tfile->socket.sk->sk_data_ready(tfile->socket.sk);
+	if (!ptr_ring_empty(&tfile->tx_ring))
+		netif_stop_queue(tun->dev);
 
 	rcu_read_unlock();
 	return NETDEV_TX_OK;
@@ -2237,7 +2239,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 		else
 			consume_skb(skb);
 	}
-
+	netif_wake_queue(tun->dev);
 	return ret;
 }

So now netperf doesn't send lots of packets that get dropped by the tun device. But it doesn't send anywhere near as many packets successfully, either...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992    1400   10.00     1250223      0    1400.25
212992           10.00     1245458           1394.91

That's actually dropped me back to the performance I was getting with the kernel's ESP implementation. Is that something we should expect?

I don't think it's purely the overhead I've added in the driver. If I leave the netif_wake_queue() and add '&& 0' to the condition for the netif_stop_queue(), which should still leave the locking in the ptr_ring_empty() to happen, the performance goes back up.

What's going on? Am I actually better off letting it drop packets silently?
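
To make the two policies in the question concrete, here is a minimal userspace sketch of "drop when the ring is full" versus "stop the queue and let the reader wake it". It is a toy model under stated assumptions, not the tun driver: RING_SIZE, struct result, simulate() and its parameters are all hypothetical names, the sender and reader take strict turns once per tick, and nothing here models crypto cost, scheduling or wakeup latency. A stop_at of 1 corresponds to the !ptr_ring_empty() condition in the diff above; a stop_at of RING_SIZE corresponds to stopping only when the ring is actually full.

```c
#include <stdio.h>

#define RING_SIZE 8	/* hypothetical ring depth, not the real tx_ring size */

struct result {
	long sent;	/* packets the sender handed to the "device" */
	long delivered;	/* packets the reader pulled out of the ring */
	long dropped;	/* packets discarded because the ring was full */
};

/*
 * Toy model: each tick the sender offers up to 'offered' packets, then the
 * reader drains up to 'drained' packets and wakes the queue.
 *
 * stop_at == 0: never stop the queue; packets offered to a full ring drop.
 * stop_at  > 0: stop the queue once occupancy reaches stop_at, so the
 *               sender is held off until the reader has run.
 */
static struct result simulate(int ticks, int offered, int drained, int stop_at)
{
	struct result r = { 0, 0, 0 };
	int occupancy = 0;
	int stopped = 0;

	for (int t = 0; t < ticks; t++) {
		/* Sender side: transmit until done or the queue is stopped. */
		for (int i = 0; i < offered && !stopped; i++) {
			r.sent++;
			if (occupancy == RING_SIZE)
				r.dropped++;
			else
				occupancy++;
			if (stop_at > 0 && occupancy >= stop_at)
				stopped = 1;
		}

		/* Reader side: drain what it can, then wake the queue. */
		int n = occupancy < drained ? occupancy : drained;
		occupancy -= n;
		r.delivered += n;
		stopped = 0;
	}
	return r;
}

/* Example call: simulate(1000, 10, 2, RING_SIZE) models stop-when-full. */
```

In this toy, with the sender offering more per tick than the reader drains, the drop policy and the stop-when-full policy deliver the same number of packets and differ only in how many were sent and thrown away, while stop_at of 1 delivers fewer because the ring never holds more than one packet per reader pass. That is a property of the model's strict turn-taking and should not be read as a measurement of the real driver.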