FYI - the Openfast path patches have been applied to several trees. I am running them on a c7 v2 right now and am able to hit close to stock firmware numbers.
The NAT acceleration stuff isn't needed at all with the open-fastpath patches.

Relevant thread:
https://forum.lede-project.org/t/qualcomm-fast-path-for-lede/4582

-Joel

On 29 January 2018 at 12:43, Florian Fainelli <f.faine...@gmail.com> wrote:
> (please don't top post).
>
> On 01/28/2018 02:00 PM, Rosen Penev wrote:
>> Compared to the Archer C7v2, the v4 has a single ethernet interface
>> switched between all 5 ports. The v2 has two ethernet interfaces with
>> 4 ports being switched.
>>
>> Now the disappointing performance has several reasons to it. The main
>> one being that the ag71xx driver in OpenWrt is not very optimized for
>> the hardware.
>
> The driver certainly contributes to that, but I don't think it is the
> main reason behind it. Each time you send or receive a packet, you need
> to invalidate your data cache for at least 1500 bytes, or whatever the
> nominal packet/buffer size has been allocated (e.g: 2KB). With very
> small I and D caches (typically 64KB) and no L2 cache, you do this
> thrashing very frequently and you keep hitting the DRAM as well, which
> hurts performance a lot. This is just something the networking stack
> does, and it is really hard to diverge from this because that is
> inherently how it is designed, and how drivers are designed as well.
> This is why software bypass in hardware is so effective for low power
> CPUs.
>
> I would be curious to see the use of XDP redirect and implementing a
> software NAT fast path, that is, for the most basic NAPT translation, do
> this in XDP as early as possible in the driver receive/transmit part and
> send directly to the outgoing interface. This should lower the pressure
> on the I and D caches by invalidating not the full packet length, but
> just the header portion. For more complex protocols, we would keep using
> the conntrack helpers to do the necessary operation (FTP, TFTP, SIP,
> etc.) on the packet.
> This might avoid doing a sk_buff allocation for
> each packet making it through, which is expensive.
>
>>
>> Qualcomm forked the driver (in 2013 I think) and added some really
>> nice features. Some of these need to be backported for ag71xx in
>> OpenWrt to be competitive.
>
> Is it possible to just drop their driver in OpenWrt and get a feeling of
> the performance gap?
>
>>
>> It's going to take quite a bit of work to get the driver up to par.
>> Biggest performance boost I imagine would be to add GRO support. It
>> turns out for good routing performance, GRO requires hardware
>> checksumming, which is not supported by ag71xx in OpenWrt at the
>> moment.
>
> Does the hardware actually support checksum offloads?
>
>>
>> On Sun, Jan 28, 2018 at 1:14 PM, Joel Wirāmu Pauling <j...@aenertia.net>
>> wrote:
>>> Hi as I also am using the archer c7's as my build targets (and c2600's) I
>>> am watching this keenly; is anyone else running openvswitch on these with
>>> the XDP patches?
>>>
>>> The c2600 which is arm a15 - currently really could do with optimization
>>> and probably is a much better choice for CPE. I would not be caught dead
>>> with the c7 as a 10Gbit CPE myself;
>>> the SoC even with the Openfast path patches just can't handle complex QoS
>>> scheduling (i.e Cake/PIE) beyond a couple of hundred Mbit.
>>>
>>>
>>>
>>> -Joel
>>> ---
>>> https://www.youtube.com/watch?v=0xSA0ljsnjc&t=1
>>>
>>> On 29 January 2018 at 09:43, Laurent GUERBY <laur...@guerby.net> wrote:
>>>>
>>>> On Wed, 2018-01-17 at 19:30 +0100, Pablo Neira Ayuso wrote:
>>>>> Hi Rafal,
>>>>>
>>>>> On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
>>>>>> Getting better network performance (mostly for NAT) using some kind
>>>>>> of acceleration was always a hot topic and people are still
>>>>>> looking/asking for it.
>>>>>> I'd like to write a short summary and share my
>>>>>> understanding of current state so that:
>>>>>> 1) People can understand it better
>>>>>> 2) We can have some rough plan
>>>>>>
>>>>>> First of all there are two possible ways of accelerating network
>>>>>> traffic: in software and in hardware. Software solution is
>>>>>> independent of architecture/device and is mostly just bypassing
>>>>>> in-kernel packets flow. It still uses device's CPU which can be a
>>>>>> bottleneck. Various software implementations are reported to be
>>>>>> from 2x to 5x faster.
>>>>>
>>>>> This is what I've been observing for the software acceleration here,
>>>>> see slide 19 at:
>>>>>
>>>>> https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf
>>>>>
>>>>> The flowtable representation, in software, is providing a faster
>>>>> forwarding path between two nics. So it's basically an alternative to
>>>>> the classic forwarding path, that is faster. Packets kick in at the
>>>>> Netfilter ingress hook (right at the same location as 'tc' ingress),
>>>>> if there is a hit in the software flowtable, ttl gets decremented,
>>>>> NATs are done and the packet is placed in the destination NIC via
>>>>> neigh_xmit() - through the neighbour layer.
>>>>
>>>> Hi Pablo,
>>>>
>>>> I tested today a few things on a brand new TP-Link Archer C7 v4.0,
>>>> LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
>>>> WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
>>>> (everything on the same table), IPv4 unless specified,
>>>> using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).
>>>>
>>>> With the TP-Link firmware:
>>>> - wired 930+ Mbit/s both ways
>>>> - wireless 5G 560+ Mbit/s down 440+ Mbit/s up
>>>> - wireless 2.4G 100+ Mbit/s both ways
>>>>
>>>> With OpenWRT/LEDE trunk 20180128 4.4 kernel:
>>>> - wired 350-400 Mbit/s both ways
>>>> - wired with firewall deactivated 550 Mbit/s
>>>> (just "iptables -t nat -A POSTROUTING -j MASQUERADE")
>>>> - wired IPv6 routing, no NAT, no firewall 250 Mbit/s
>>>> - wireless 5G 150-200 Mbit/s
>>>> - wireless 2.4G forgot to test
>>>>
>>>> top on the router shows sirq at 90%+ during network load, other load
>>>> indicators are under 5%.
>>>>
>>>> IPv6 performance without NAT being below IPv4 with NAT seems
>>>> to indicate there are potential gains in software :).
>>>>
>>>> I didn't test OpenWRT in bridge mode but I got with LEDE 17.01
>>>> on an Archer C7 v2 about 550-600 Mbit/s iperf3 so I think
>>>> radio is good on these ath10k routers.
>>>>
>>>> So if OpenWRT can do about x2 in software routing performance we're
>>>> good against our TP-Link firmware friends :).
>>>>
>>>> tetaneutral.net (not-for-profit ISP, hosting OpenWRT and LEDE mirror in
>>>> FR) is going to install 40+ Archer C7 v4 running OpenWRT as CPE, each
>>>> with individual gigabit fiber uplink (TP-Link MC220L fiber converter),
>>>> and total 10G uplink (Dell/Force10 S4810 48x10G, yes some of our
>>>> members will get 10G on their PC at home :).
>>>>
>>>> We build our images from git source, generating imagebuilder and then a
>>>> custom python script. We have 5+ spare C7, fast build (20mn from
>>>> scratch) and testing environment, and of course we're interested in
>>>> suggestions on what to do.
>>>>
>>>> Thanks in advance for your help,
>>>>
>>>> Sincerely,
>>>>
>>>> Laurent
>>>> http://tetaneutral.net
>>>>
>>>>
>>>> _______________________________________________
>>>> Lede-dev mailing list
>>>> Lede-dev@lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/lede-dev
>
> --
> Florian
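
For anyone following along, here is a rough sketch of the flowtable fast path Pablo describes in the quoted text: lookup at ingress, and on a hit decrement the TTL, apply NAT and hand the packet to the output device. This is a toy Python model, not kernel code. The names Packet, FlowEntry, Flowtable, offload() and ingress() are made up for illustration, the addresses are documentation examples, and sending packets with TTL <= 1 back to the classic path is an assumption of this sketch.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple

# Hypothetical flow key: (src_ip, src_port, dst_ip, dst_port, proto)
FlowKey = Tuple[str, int, str, int, str]


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    ttl: int


@dataclass(frozen=True)
class FlowEntry:
    # Source NAT rewrite and egress device, learned once by the classic path.
    nat_src_ip: str
    nat_src_port: int
    out_dev: str


def flow_key(pkt: Packet) -> FlowKey:
    return (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.proto)


class Flowtable:
    """Toy software flowtable: one hash lookup instead of the full forwarding path."""

    def __init__(self) -> None:
        self.entries: Dict[FlowKey, FlowEntry] = {}

    def offload(self, pkt: Packet, entry: FlowEntry) -> None:
        # Stand-in for the classic path installing an established flow.
        self.entries[flow_key(pkt)] = entry

    def ingress(self, pkt: Packet) -> Optional[Tuple[Packet, str]]:
        """Return (rewritten packet, output device) on a hit, None on a miss."""
        entry = self.entries.get(flow_key(pkt))
        if entry is None or pkt.ttl <= 1:
            # Miss (or TTL about to expire, an assumption of this sketch):
            # leave the packet to the classic forwarding path.
            return None
        rewritten = replace(
            pkt,
            src_ip=entry.nat_src_ip,      # NAT is done
            src_port=entry.nat_src_port,
            ttl=pkt.ttl - 1,              # ttl gets decremented
        )
        return rewritten, entry.out_dev   # stand-in for neigh_xmit()


def demo() -> Optional[Tuple[Packet, str]]:
    table = Flowtable()
    pkt = Packet("192.0.2.10", 40000, "203.0.113.5", 5201, "tcp", 64)
    # First packet of a flow misses and would take the classic path.
    if table.ingress(pkt) is None:
        table.offload(pkt, FlowEntry("198.51.100.2", 40000, "eth0"))
    # Later packets of the same flow hit the fast path.
    return table.ingress(pkt)


if __name__ == "__main__":
    print(demo())
```

The point of the model is only that the per-packet work on a hit is a single dictionary lookup plus a header rewrite; connection tracking and the helpers for more complex protocols stay on the classic path.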