On 3/26/19 12:24 AM, Hauke Mehrtens wrote:
> Hi Petr,
>
> On 3/14/19 6:46 AM, Petr Cvek wrote:
>> Hello again,
>>
>> I've managed to enhance a few drivers for the lantiq platform. They
>> are still in an ugly, commented form (the ethernet part especially),
>> but I need some hints before the final version. The patches are based
>> on kernel 4.14.99. Copy them into target/linux/lantiq/patches-4.14
>> (cleaned of any of my previous patches).
>
> Thanks for working on this.
>
>> The eth+irq speedup is up to 360/260 Mbps (vanilla was 170/80 on my
>> setup). The iperf3 benchmarks (2 passes for both the vanilla and the
>> changed versions), together with the script, are in the attachment.
>>
>> 1) IRQ with SMP and balancing support:
>>
>> 0901-add-icu-smp-support.patch
>> 0902-enable-external-irqs-for-second-vpe.patch
>> 0903-add-icu1-node-for-smp.patch
>>
>> As requested I've changed the patch heavily. The original locking from
>> the k3b source code (probably from UGW) didn't work, and under heavy
>> load the system could freeze (an SMP affinity change during IRQ
>> handling). This version fixes that by using generic raw spinlocks with
>> IRQs disabled.
>>
>> SMP IRQ now works in such a way that before every irq_enable (which
>> serves as an unmask too) the VPE will be switched. This can be limited
>> by writing into /proc/irq/X/smp_affinity (it can possibly be balanced
>> from userspace too).
>>
>> I've rewritten the device tree reg fields so there are only 2 arrays
>> now, one per ICU controller. The original one-array-per-module layout
>> was redundant as the ranges were contiguous. The module offsets of a
>> single ICU are now explicitly computed in a macro:
>>
>>   ltq_w32((x), ltq_icu_membase[vpe] + m*0x28 + (y))
>>   ltq_r32(ltq_icu_membase[vpe] + m*0x28 + (x))
>>
>> Before, there was a pointer for every 0x28 block (there shouldn't be a
>> slowdown, only a multiplication and an addition for every register
>> access).
>>
>> Also I've simplified the register names from LTQ_ICU_IM0_ISR to
>> LTQ_ICU_ISR, as the "IM0" (module) part was confusing (the real module
>> number, 0-4, is part of the macro).
>>
>> The code is written so it should work fine on a uniprocessor
>> configuration (the for_each_present_cpu etc. macros will cycle over a
>> single VPE there). I haven't tested without CONFIG_SMP yet, but I did
>> check it with the "nosmp" kernel parameter. It works.
>>
>> Anyway, please test it if you have a board where the second VPE is
>> used for FXS.
>>
>> The new device tree structure is incompatible with the old version of
>> the driver (and an old device tree with the new driver, too). It seems
>> the ICU driver is also used on the Danube, AR9, AmazonSE and Falcon
>> chipsets. I don't know the hardware of these boards, so before the
>> final patch I would like to know whether they have a second ICU too
>> (at offset 0x80300).
>
> Normally the device tree should stay stable, but I already thought
> about the same change, and I am not aware of any device that ships a
> U-Boot with an embedded device tree, so this should be fine.
>
> The Amazon and Amazon SE only have one ICU block because they only have
> one CPU with one VPE.
> The Danube SoC has two ICU blocks, one for each CPU; each CPU has only
> one VPE. The CPUs are not cache coherent, so SMP is not possible.
>
> Falcon, AR9, VR9, AR10, ARX300, GRX300 and GRX330 have two ICU blocks,
> one for each VPE of the single CPU.
> The GRX350 uses a MIPS InterAptiv CPU with a MIPS GIC.
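If I read the reg rework right, the accessors boil down to something
like this (the 0x28 stride, the per-VPE membase array and the 0-4 module
range are taken from the description above; the helper names and the
array size are only my illustration, not the patch):

    /* One ICU register block per VPE, modules 0-4 spaced 0x28 bytes
     * apart inside a block.  Illustrative sketch only. */
    static void __iomem *ltq_icu_membase[2];

    static inline u32 ltq_icu_r32(int vpe, int m, u32 off)
    {
            return ltq_r32(ltq_icu_membase[vpe] + m * 0x28 + off);
    }

    static inline void ltq_icu_w32(int vpe, int m, u32 val, u32 off)
    {
            ltq_w32(val, ltq_icu_membase[vpe] + m * 0x28 + off);
    }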
>> More development could probably be done with the locking, as only
>> accesses within a single module (= 1 set of registers) can cause a
>> race condition. But as the most contended interrupts are in the same
>> module, there won't be much of a speed increase IMO. I can add it if
>> requested (just a spinlock array and some lookup code).
>
> I do not think that this would improve the performance significantly; I
> assume that the CPUs only have to wait there in rare conditions anyway.
>
>> 2) Reworked lantiq xrx200 ethernet driver:
>>
>> 0904-backport-vanilla-eth-driver.patch
>> 0905-increase-dma-descriptors.patch
>> 0906-increase-dma-burst-size.patch
>>
>> The code is still ugly, but stable now. There is fragmented skb
>> support and NAPI polling. The DMA ring buffer was increased so it can
>> handle higher speeds, and I've fixed some code weirdness. I can split
>> the changes into separate patches in the future.
>
> It would be nice if you could also make the same changes to the
> upstream driver in the mainline Linux kernel and send them for
> inclusion into mainline.
>
>> I didn't test the ICU and eth patches separately, but I've tested the
>> ethernet driver on a single VPE only (by setting the SMP affinity and
>> nosmp). This version of the ethernet driver was used for root over NFS
>> on the debug setup for about two weeks (without problems).
>>
>> Tell me if we should pursue a second DMA channel to the PPE so both
>> VPEs can send frames at the same time.
>
> I think it should be ok to use both DMA channels for the CPU traffic.
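For comparison, the RX side of the mainline driver already follows the
usual NAPI shape, roughly like this (simplified and written from memory,
with names as in drivers/net/ethernet/lantiq_xrx200.c, not taken from
your patch):

    static int xrx200_poll_rx(struct napi_struct *napi, int budget)
    {
            struct xrx200_chan *ch = container_of(napi,
                                                  struct xrx200_chan, napi);
            int rx = 0;

            while (rx < budget) {
                    struct ltq_dma_desc *desc =
                            &ch->dma.desc_base[ch->dma.desc];

                    /* stop when the descriptor is still owned by the DMA */
                    if ((desc->ctl & (LTQ_DMA_OWN | LTQ_DMA_C)) != LTQ_DMA_C)
                            break;

                    /* pass the skb up the stack and refill the slot */
                    xrx200_hw_receive(ch);
                    rx++;
            }

            /* re-enable the RX interrupt once the ring is drained */
            if (rx < budget && napi_complete_done(napi, rx))
                    ltq_dma_enable_irq(&ch->dma);

            return rx;
    }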
>> 3) WAVE300
>>
>> In the past two weeks I've tried to fix and mash together various
>> versions of the wave300 wifi driver (there are partial versions in the
>> GPL sources from router vendors), and I've managed to get the driver
>> into a "not immediately crashing" state. If you are interested in the
>> development, there is a thread on the OpenWrt forum. The source repos
>> are here:
>>
>> https://repo.or.cz/wave300.git
>> https://repo.or.cz/wave300_rflib.git
>>
>> (the second one must be copied into the first one)
>>
>> The driver will often crash when it meets an unknown packet, a request
>> for encryption (there is no encryption support), an unusual
>> combination of configuration options, or just on module unloading. The
>> code is _really_ ugly and it will serve only as a hardware
>> specification for the development of a better GPL driver. If you want
>> to help or have some tips, you can join the forum (there are links to
>> firmwares and extensive research of the available vendor source code).
>>
>> Links:
>> https://forum.openwrt.org/t/support-for-wave-300-wi-fi-chip/24690/129
>> https://forum.openwrt.org/t/how-can-we-make-the-lantiq-xrx200-devices-faster/9724/70
>> https://forum.openwrt.org/t/xrx200-irq-balancing-between-vpes/29732/25
>>
>> Petr
>
> Hauke

It would be nice if you could send your patches as individual mails and
inline, so I can easily comment on them.

The DMA handling in the OpenWrt Ethernet driver is only more flexible in
order to handle an arbitrary number of DMA channels, but I think this is
not needed.

The DMA memory is already 16-byte aligned (see the byte_offset variable
in xmit), so it should not be a problem to use the 4W DMA mode; I assume
that the hardware also takes care of this.

Why are the changes in arch/mips/kernel/smp-mt.c needed? This looks
strange to me.

Changing LTQ_DMA_CPOLL could affect the latency of the system, but I
think your increase should not hurt significantly.

Hauke
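P.S.: For reference, the byte_offset handling I mean looks roughly like
this in the mainline xmit path (quoted from memory, so double-check the
details against drivers/net/ethernet/lantiq_xrx200.c):

    /* DMA needs to start on a 16 byte aligned address */
    byte_offset = CPHYSADDR(skb->data) % 16;

    mapping = dma_map_single(priv->dev, skb->data, len, DMA_TO_DEVICE);
    if (unlikely(dma_mapping_error(priv->dev, mapping)))
            goto err_drop;

    desc->addr = mapping - byte_offset;
    /* make sure the address is written before it is handed to the HW */
    wmb();
    desc->ctl = LTQ_DMA_OWN | LTQ_DMA_SOP | LTQ_DMA_EOP |
                LTQ_DMA_TX_OFFSET(byte_offset) | (len & LTQ_DMA_SIZE_MASK);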