On Fri, Apr 7, 2017 at 8:31 AM, Koen Vandeputte
<koen.vandepu...@ncentric.com> wrote:
>
> On 2017-04-03 16:03, Tim Harvey wrote:
>
> Hi Tim,
>
> Also apologies for the late reply.
> It's also a very busy period here :)
>
> It also took some time to retest this issue list again using today's trunk
> version (4.9.20 kernel)
>
>> When the IMX6 PCIe host controller uses MSI, legacy interrupts stop
>> working and thus any card/driver using legacy will not have
>> functioning interrupts. I'm not sure what that list of card/drivers is
>> that require legacy interrupts but I know ath9k is one of them and
>> just verified it doesn't get any interrupts currently on LEDE master
>> with 4.9.
>>
>> You can do the following to hack out the requirement of MSI for the
>> IMX6 PCIe host controller, then disable CONFIG_PCI_MSI in kernel
>> config
>> diff --git a/drivers/pci/dwc/Kconfig b/drivers/pci/dwc/Kconfig
>> index dfb8a69..31cf8ad 100644
>> --- a/drivers/pci/dwc/Kconfig
>> +++ b/drivers/pci/dwc/Kconfig
>> @@ -6,7 +6,6 @@ config PCIE_DW
>>  config PCIE_DW_HOST
>>         bool
>>         depends on PCI
>> -       depends on PCI_MSI_IRQ_DOMAIN
>>         select PCIE_DW
>>
>>  config PCI_DRA7XX
>> @@ -45,7 +44,6 @@ config PCI_IMX6
>>         bool "Freescale i.MX6 PCIe controller"
>>         depends on PCI
>>         depends on SOC_IMX6Q
>> -       depends on PCI_MSI_IRQ_DOMAIN
>>         select PCIEPORTBUS
>>         select PCIE_DW_HOST
>>
> Works perfectly, thanks for investigating!
> ath9k is up & running again.
>
> Although the paths are different than the patch above mentions. -->
> (a/drivers/pci/host/Kconfig)
>
> Would you like me to make a patch specific for LEDE which integrates this?
> If so, please provide me your S-o-B line.
>
>
> Felix,
>
> Would you accept this workaround?
> Without it, using the current LEDE trunk on the imx6 platform will render
> most legacy irq devices useless .. and I think it will take some time for
> the gentlemen upstream to properly fix it.
>
Koen,

Yes, I also just realized that the paths were changed recently in mainline.
The IMX6 PCIe maintainer is working on a more appropriate patch which
will allow MSI to be used in cases where it can. Let's wait a couple
of days to see what he comes up with and then I can look at
backporting it. If not, I can rework the patch that 'hacks' MSI off.

>>
>>>> Other issues seen so far compared to kernel 4.4:
>>>> - A simple "reboot" doesn't work. UART output shows "Reboot failed" and
>>>> the board stalls. Powercycle is needed
>>
>> This can occur on older revision boards where the PMIC is not reset on
>> IMX6 watchdog reset and a watchdog reset (which is what is used on
>> soft reboot) occurs when the CPU is above 800MHz. Can you provide the
>> serial number of the board you are seeing this on and verify that if
>> you force the cpu to 800MHz (ie userspace cpufreq governor) prior to
>> reset the issue does not occur?
>>
>> The work-around for this is to use the Gateworks System Controller
>> watchdog to restart the board which does a full board power cycle, but
>> I haven't had time to get that driver mainlined yet (and thus have
>> also not submitted it to LEDE/OpenWrt).
>
> I don't think this one is caused by the explanation above for the following
> reasons:
> - Test conducted on a GW5200 which runs at 800MHz (never above)
> - Using identical LEDE builds (1 with 4.4 kernel & 1 with 4.9 kernel), the
> reboot 100% works on 4.4 and 100% fails on 4.9
>
> I did notice a small delta which aids a bit in this case.
> The watchdog eventually reboots the board (both when issuing the "reboot"
> command, and after using sysupgrade)
> It's not elegant at all.. but reduces priority slightly as the board doesn't
> get stuck.

I just reproduced it on a GW5200 with 800MHz IMX6DL as well so
something else is going on here. I'm investigating.
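For anyone else who wants to run the 800MHz check asked for in the quoted text, here is a minimal sketch in Python. It assumes the kernel was built with the userspace cpufreq governor and exposes the standard cpufreq sysfs files (scaling_governor, scaling_setspeed). The helper name pin_cpu_frequency and the 800000 kHz value are placeholders of mine, not something taken from the board documentation; pick a value listed in scaling_available_frequencies on the actual board.

```python
from pathlib import Path

# Standard cpufreq sysfs location on Linux; passed as a parameter so the
# helper can be pointed at a scratch directory for a dry run.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def pin_cpu_frequency(khz, root=CPUFREQ_ROOT):
    """Select the userspace governor on every CPU and request `khz`.

    Returns the names of the CPUs that were touched. Needs root when run
    against the real sysfs tree. `khz` is an assumption supplied by the
    caller; it should match an entry in scaling_available_frequencies.
    """
    pinned = []
    for policy in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        # The governor must be switched first, otherwise writes to
        # scaling_setspeed are not accepted.
        (policy / "scaling_governor").write_text("userspace\n")
        (policy / "scaling_setspeed").write_text(f"{khz}\n")
        pinned.append(policy.parent.name)
    return pinned
```

Calling pin_cpu_frequency(800000) as root right before issuing "reboot" should be enough to tell whether the CPU frequency plays any role in the failure.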
<snip>

>>>>
>>>> General issues in kernels 4.4 & 4.9
>>>> - Even using the latest UBI FS sources + using the Sync option in bootarg,
>>>> files can get corrupted on a power cut. If the corrupted file is a boot
>>>> file .. :)
>>
>> Can you point me to documentation on this bootarg? I'm not familiar with
>> it.
>
> Documented here:
> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_sync_semantics
>
> Enabled using this:
>
> --- a/target/linux/imx6/image/bootscript-ventana
> +++ b/target/linux/imx6/image/bootscript-ventana
> @@ -50,7 +50,7 @@ if itest.s "x${dtype}" == "xnand" ; then
>         mtdparts del rootfs && mtdparts add nand0 - ubi
>         echo "mtdparts:${mtdparts}"
>         setenv fsload ubifsload
> -       setenv root "ubi0:ubi ubi.mtd=2 rootfstype=squashfs,ubifs"
> +       setenv root "ubi0:ubi ubi.mtd=2 rootfstype=squashfs,ubifs rootflags=sync"
>  else
>         echo "Booting from block device ${bootdev}..."
>         setenv fsload "${fs}load ${dtype} ${disk}:1"
>
> You can check if it's running by looking for the following line in the bootlog:
> [ 2.682056] UBIFS: parse sync
>
> Testing this on dozens of boards for weeks reveals that it reduces the
> empty-file issues by ~70%.
> It will sacrifice some write speed as it basically disables write
> caching and flushes all changes to NAND asap.
>

There was a known IMX GPMI NAND issue that could cause corruption
which was fixed in Linux 4.7. I haven't tested 4.9 extensively yet for
this but I can set up a test. Send me as many details as you can about
how easy this is to reproduce and what your test setup is like.

>>>>
>>>> Other than this it runs pretty stable :)
>>>>
>>> Tim,
>>>
>>> I found 1 more issue on 4.4 & 4.9 kernels:
>>>
>>> https://lists.debian.org/debian-arm/2016/02/msg00000.html
>>>
>>> I'm also seeing this on 4.4 kernel.
>>> It can take up to a few days before it triggers normally, but I have a setup
>>> running which reproduces this within a few hours.
>>>
>>> I've made a patch which increases the timeout in the FEC driver just for
>>> testing .. but it still occurs, causing the port to be disabled suddenly.
>>>
>> I've seen reports of this as well but usually it takes days of
>> activity if/before it happens. The MDIO timeout in FEC is currently
>> 3ms - what did you increase it to and are you certain it makes these
>> issues go away? Perhaps we need to start a discussion about this on
>> linux-net. I'm not clear if an MDIO read timeout should cause an
>> interface to go down (or if some layer should retry). I'm also not
>> clear why an MDIO read would not complete in 3ms.
>
> I did not state that it was solved/better.
>
> For testing, I increased the timeout to 9ms but the issue still pops up.
>
> I fully agree with you that:
> - Each read should 99.99999999% finish within these 3 ms.
> - If for whatever reason it doesn't succeed, the iface should not halt
> operating.
>
> Let's go fishing upstream for this one :)

Agreed - already posted a question to linux-netdev this morning.

Tim