On 2017-04-03 16:03, Tim Harvey wrote:


Hi Tim,

Also apologies for the late reply.
It's also a very busy period here :)

It also took some time to retest this issue list using today's trunk version (4.9.20 kernel).

When the IMX6 PCIe host controller uses MSI, legacy interrupts stop
working and thus any card/driver using legacy will not have
functioning interrupts. I'm not sure what that list of card/drivers is
that require legacy interrupts but I know ath9k is one of them and
just verified it doesn't get any interrupts currently on LEDE master
with 4.9.


You can do the following to hack out the requirement of MSI for the
IMX6 PCIe host controller, then disable CONFIG_PCI_MSI in the kernel
config:
diff --git a/drivers/pci/dwc/Kconfig b/drivers/pci/dwc/Kconfig
index dfb8a69..31cf8ad 100644
--- a/drivers/pci/dwc/Kconfig
+++ b/drivers/pci/dwc/Kconfig
@@ -6,7 +6,6 @@ config PCIE_DW
  config PCIE_DW_HOST
          bool
         depends on PCI
-       depends on PCI_MSI_IRQ_DOMAIN
          select PCIE_DW

  config PCI_DRA7XX
@@ -45,7 +44,6 @@ config PCI_IMX6
         bool "Freescale i.MX6 PCIe controller"
         depends on PCI
         depends on SOC_IMX6Q
-       depends on PCI_MSI_IRQ_DOMAIN
         select PCIEPORTBUS
         select PCIE_DW_HOST

Works perfectly, thanks for investigating!
ath9k is up & running again.
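
For anyone who wants to repeat the check, here is a minimal sketch of one way to confirm that a driver is actually receiving interrupts. It assumes the usual /proc/interrupts layout (a header row with one column per online CPU, followed by one row per IRQ); the helper name irq_count is made up for this example.

```shell
# Sum the per-CPU interrupt counts of every /proc/interrupts row that
# mentions NAME. Prints the total; returns non-zero if no row matches.
# Usage: irq_count NAME [FILE]   (FILE defaults to /proc/interrupts)
irq_count() {
    awk -v name="$1" '
        NR == 1 { ncpu = NF; next }      # header row: one column per CPU
        index($0, name) {                # row mentions the driver name
            for (i = 2; i <= ncpu + 1; i++) total += $i
            found = 1
        }
        END { if (!found) exit 1; print total + 0 }
    ' "${2:-/proc/interrupts}"
}
```

Calling irq_count ath9k twice, a few seconds apart while the radio is active, should show the total increasing if interrupts are being delivered; a total that stays at 0 matches the broken behaviour described above.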

The paths differ from the ones in the patch above, though: here the file is a/drivers/pci/host/Kconfig.

Would you like me to make a patch specific for LEDE which integrates this?
If so, please provide me your S-o-B line.


Felix,

Would you accept this workaround?
Without it, the current LEDE trunk on the imx6 platform renders most legacy IRQ devices useless, and I think it will take some time for the gentlemen upstream to properly fix it.


Other issues seen so far compared to kernel 4.4:
- A simple "reboot" doesn't work. UART output shows "Reboot failed" and
the board stalls. A power cycle is needed.
This can occur on older revision boards where the PMIC is not reset on
IMX6 watchdog reset and a watchdog reset (which is what is used on
soft reboot) occurs when the CPU is above 800MHz. Can you provide the
serial number of the board you are seeing this on and verify that if
you force the CPU to 800MHz (i.e. userspace cpufreq governor) prior to
reset the issue does not occur?
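
A minimal sketch of how the CPU could be pinned before issuing the reboot, assuming the kernel was built with the userspace cpufreq governor and exposes the standard cpufreq sysfs files. The helper name pin_cpu_freq and the 792000 kHz value are assumptions; pick a real value from scaling_available_frequencies on the board.

```shell
# Switch a cpufreq policy to the userspace governor and request a fixed
# frequency in kHz.
# Usage: pin_cpu_freq KHZ [CPUFREQ_DIR]
pin_cpu_freq() {
    dir="${2:-/sys/devices/system/cpu/cpu0/cpufreq}"
    # The userspace governor lets scaling_setspeed choose the frequency.
    echo userspace > "$dir/scaling_governor" || return 1
    echo "$1" > "$dir/scaling_setspeed" || return 1
}

# Example (value is an assumption, check scaling_available_frequencies):
#   pin_cpu_freq 792000 && reboot
```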

The work-around for this is to use the Gateworks System Controller
watchdog to restart the board which does a full board power cycle, but
I haven't had time to get that driver mainlined yet (and thus have
also not submitted it to LEDE/OpenWrt).
I don't think this one is caused by the explanation above, for the following reasons:
- Test conducted on a GW5200 which runs at 800MHz (never above)
- Using identical LEDE builds (1 with 4.4 kernel & 1 with 4.9 kernel), the reboot 100% works on 4.4 and 100% fails on 4.9

I did notice a small delta which aids a bit in this case.
The watchdog eventually reboots the board (both when issuing the "reboot" command and after using sysupgrade). It's not elegant at all, but it reduces the priority slightly as the board doesn't get stuck.
- UART DMA disabled is required to avoid some boot errors (I've made a
custom backport from your upstream patch fixing this, but not submitted here
yet)
Which boot error specifically? I don't know that I've seen it, but I
can confirm that UART DMA needs to be disabled for RS485 to work
(which is a more obscure case) which is why I've done it on our
kernels. AFAIK there are still some issues upstream with IMX UART
flow-control and mctrl_gpio.
Retested on 4.9.20 and the prints are gone without any custom patch.
I consider this one solved.

I'm still in favor of backporting your UART DMA patch to fix the RS485 use case, though.
General issues in kernels 4.4 & 4.9
- Even using the latest UBI FS sources + using the Sync option in bootarg,
files can get corrupted on a power cut.  If the corrupted file is a boot
file .. :)
Can you point me to documentation on this bootarg? I'm not familiar with it.
Documented here: http://www.linux-mtd.infradead.org/doc/ubifs.html#L_sync_semantics



Enabled using this:

--- a/target/linux/imx6/image/bootscript-ventana
+++ b/target/linux/imx6/image/bootscript-ventana
@@ -50,7 +50,7 @@ if itest.s "x${dtype}" == "xnand" ; then
     mtdparts del rootfs && mtdparts add nand0 - ubi
     echo "mtdparts:${mtdparts}"
     setenv fsload ubifsload
-    setenv root "ubi0:ubi ubi.mtd=2 rootfstype=squashfs,ubifs"
+    setenv root "ubi0:ubi ubi.mtd=2 rootfstype=squashfs,ubifs rootflags=sync"
 else
     echo "Booting from block device ${bootdev}..."
     setenv fsload "${fs}load ${dtype} ${disk}:1"



You can check whether it's active by looking for the following line in the boot log:
[    2.682056] UBIFS: parse sync
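
As a small sketch, the same check can be scripted, assuming the message text matches the line above exactly. The helper name ubifs_sync_seen is made up for this example; with no argument it reads the kernel log via dmesg, which only works while the boot messages are still in the ring buffer.

```shell
# Succeed if the kernel log shows UBIFS parsing the "sync" mount option.
# Usage: ubifs_sync_seen [LOGFILE]   (reads dmesg when no file is given)
ubifs_sync_seen() {
    if [ -n "$1" ]; then
        grep -q 'UBIFS: parse sync' "$1"
    else
        dmesg | grep -q 'UBIFS: parse sync'
    fi
}

# Example:
#   ubifs_sync_seen && echo "rootflags=sync is active"
```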



Testing this on dozens of boards for weeks shows that it reduces the empty-file issues by ~70%. It sacrifices some write speed, as it basically disables write caching and flushes all changes to NAND as soon as possible.



Other than this it runs pretty stable :)

Tim,

I found 1 more issue on 4.4 & 4.9 kernels:

https://lists.debian.org/debian-arm/2016/02/msg00000.html

I'm also seeing this on 4.4 kernel.
It can take up to a few days before it triggers normally, but I have a setup
running which reproduces this within a few hours.

I've made a patch which increases the timeout in the FEC driver just for
testing .. but it still occurs causing the port to be disabled suddenly.

I've seen reports of this as well but usually it takes days of
activity if/before it happens. The MDIO timeout in FEC is currently
3ms - what did you increase it to and are you certain it makes these
issues go away? Perhaps we need to start a discussion about this on
linux-net. I'm not clear if an MDIO read timeout should cause an
interface to go down (or if some layer should retry). I'm also not
clear why an MDIO read would not complete in 3ms.
I did not state that it was solved/better.

For testing, I increased the timeout to 9ms, but the issue still pops up.

I fully agree with you that:
- Each read should finish within these 3 ms 99.99999999% of the time.
- If for whatever reason it doesn't succeed, the iface should not stop operating.

Let's go fishing upstream for this one :)

Tim

--
Koen Vandeputte - Software Developer
koen.vandepu...@ncentric.com | +32499736158


_______________________________________________
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev
