Hello Andrew and Florian. I concated both replies into this post.
On Monday, February 4, 2019 11:26:41 PM CET Andrew Lunn wrote: > On Mon, Feb 04, 2019 at 10:35:55PM +0100, Christian Lamparter wrote: > > The QCA8337 enumerates 5 PHYs on the MDC/MDIO access: PHY0-PHY4. > > Based on the System Block Diagram in Section 1.2 of the > > QCA8337's datasheet. These PHYs are internally connected > > to MACs of PORT 1 - PORT 5. > > Hi Christian > > Is the off-by-one the problem here? > Yes, this was triggered by a MII_PHYSID1 and MII_PHYSID2 read for a non-existent PHY at address >0x5< coming from dsa_register_switch() during boot. I added a WARN_ON to qca8k_phy_read() to see from where the reads are coming from: [ 6.218168] WARNING: CPU: 0 PID: 21 at qca8k_phy_read+0x3c/0x68 [ 6.224062] Modules linked in: [ 6.227107] CPU: 0 PID: 21 Comm: kworker/0:1 Tainted: G W 4.19.16 #0 [ 6.234732] Workqueue: events deferred_probe_work_func [ 6.239849] NIP: c0308528 LR: c0308528 CTR: c0257d30 [ 6.244878] REGS: cf485b80 TRAP: 0700 Tainted: G W (4.19.16) [ 6.252064] MSR: 00029000 <CE,EE,ME> CR: 28002222 XER: 00000000 [ 6.258224] [ 6.258224] GPR00: c0308528 cf485c30 cf47c000 00000005 00000000 00000215 c09d61e4 c09d3de0 [ 6.258224] GPR08: 00021000 c09bfb00 c09bfb00 00000007 24002444 00000000 c003f81c cf466060 [ 6.258224] GPR16: ffffffff cf633f60 cf633f6c 00000001 cf633f40 c0589f7c 006080c0 c09bede8 [ 6.258224] GPR24: cf633f40 00000000 00000000 fffff000 00000003 cdf21790 00000003 00000005 [ 6.292952] NIP [c0308528] qca8k_phy_read+0x3c/0x68 [ 6.297808] LR [c0308528] qca8k_phy_read+0x3c/0x68 [ 6.302574] Call Trace: [ 6.305013] [cf485c30] [c0308528] qca8k_phy_read+0x3c/0x68 (unreliable) [ 6.311602] [cf485c50] [c0300530] mdiobus_read+0x6c/0x9c [ 6.316894] [cf485c70] [c02ffdcc] get_phy_device+0x188/0x204 [ 6.322527] [cf485cd0] [c0300740] mdiobus_scan+0x20/0x160 [ 6.327901] [cf485d00] [c0300a3c] __mdiobus_register+0x1bc/0x2a8 [ 6.333884] [cf485d30] [c047e690] dsa_register_switch+0x6a0/0x904 [ 6.339954] [cf485db0] [c0300dd8] mdio_probe+0x40/0x88 [ 6.345070] [cf485dc0] [c026947c] really_probe+0x168/0x300 [ 6.350530] [cf485df0] [c0269b44] driver_probe_device+0x410/0x460 [ 6.356594] [cf485e10] [c02672cc] bus_for_each_drv+0x5c/0xc0 [ 6.362229] [cf485e40] [c0269270] __device_attach+0xa8/0x144 [ 6.367862] [cf485e70] [c0268430] bus_probe_device+0x3c/0xc0 [ 6.373495] [cf485e90] [c02689dc] deferred_probe_work_func+0x70/0xac [ 6.379821] [cf485eb0] [c003a2c4] process_one_work+0x25c/0x3b0 [ 6.385633] [cf485ed0] [c003a708] worker_thread+0x2f0/0x434 [ 6.391180] [cf485f10] [c003f950] kthread+0x134/0x138 [ 6.396209] [cf485f40] [c000d23c] ret_from_kernel_thread+0x14/0x1c [ 6.402357] Instruction dump: [ 6.405313] 93c10018 7cbe2b78 93e1001c 7c9f2378 93a10014 83a30018 4bfffe4d 3d20c058 [ 6.413027] 7c651b78 7fe4fb78 3869c970 4bd4e35d <0fe00000> 80010024 7fc5f378 807d0004 [ 6.420916] ---[ end trace 00aeb6767b21cd36 ]--- If I disable port@5 (see qca8k.txt example) <https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/net/dsa/qca8k.txt#L103> then problem went away (but of course, this makes the WAN port useless). When I looked more into it, I started to notice that the mdiobus_scan started from address >1< (not 0). It was skipping PHY0 (which does exist!) and then the extract from qca8k.txt suddenly made sense: |The integrated switch subnode should be specified according to the binding |described in dsa/dsa.txt. As the QCA8K switches do not have a N:N mapping of |port and PHY id, each subnode describing a port needs to have a valid phandle |referencing the internal PHY connected to it. The CPU port of this switch is |always port 0. >From what I can tell, the slave mdio port-numbers (i.e 0 - 5/6 on the qca8k) are being used directly as phy-addresses on the slave-bus. And since the port0 is a cpu port it gets skipped when the ds->phy_mii_mask is created in dsa_switch_setup(): | ds->phys_mii_mask |= dsa_user_ports(ds); // 0x3E <https://elixir.bootlin.com/linux/v5.0-rc5/source/net/dsa/dsa2.c#L350> However, ds->phys_mii_mask is used to calculate the slave's phy_mask in dsa_slave_mii_bus_init() |ds->slave_mii_bus->phy_mask = ~ds->phys_mii_mask; // ~0x3E <https://github.com/torvalds/linux/blob/master/net/dsa/slave.c#L61> which is then later used by __mdiobus_register() to scan for the supposedly existing PHY at 0x1 - 0x5. | for (i = 0; i < PHY_MAX_ADDR; i++) { | if ((bus->phy_mask & (1 << i)) == 0) { | struct phy_device *phydev; | | phydev = mdiobus_scan(bus, i); | if (IS_ERR(phydev) && (PTR_ERR(phydev) != -ENODEV)) { | err = PTR_ERR(phydev); | goto error; | } | } | } So, I'm looking for a way to get this "-1" somewhere and this version was the best justification I came up with. Because as Florian said, this is supposed to work for different drivers as well. --- On Monday, February 4, 2019 11:11:41 PM CET Florian Fainelli wrote: > On 2/4/19 1:35 PM, Christian Lamparter wrote: > > The QCA8337 enumerates 5 PHYs on the MDC/MDIO access: PHY0-PHY4. > > Based on the System Block Diagram in Section 1.2 of the > > QCA8337's datasheet. These PHYs are internally connected > > to MACs of PORT 1 - PORT 5. However, neither qca8k's slave > > mdio access functions qca8k_phy_read()/qca8k_phy_write() > > nor the dsa framework is set up for that. > > > > This version of the patch uses the existing phy-handle > > properties of each specified DSA Port in the DT to map > > each PORT/MAC to its exposed PHY on the MDIO bus. This > > is supported by the current binding document qca8k.txt > > as well. > > I don't think you should have to do any of this translation, because you > can do a couple of things with DSA/Device Tree: > > - you can not provide a phy-handle property at all, in which case, the > core DSA layer assumes that the PHY is part of the switch's internal > MDIO bus which is implictly created by dsa_slave_mii_bus_create() > > - you can specify a phy-handle property and then the PHY device tree > node can be placed pretty much anywhere in Device Tree, including on a > separate MDIO bus Device Tre node which is "external" to the switch > > In either case, the PHY device's MDIO bus parent and its address are > taken care of by drivers/of/of_mdio.c. You can look at mx88e6xxx for how > it deals with its internal vs. external MDIO bus controller and that > driver is used on a wide variety of cconfiguration. Hm, this sounds to me like the qca8k driver might be cheating a bit. Though, I can't really tell, I found "stub" translation routines in the mv88e6060 driver called mv88e6060_port_to_phy_addr(). But it just checks the range. I think the issue here is that qca8k_phy_read/write don't actually operate on the "internal" mdio-bus of the switch. Instead the mdiobus_read/write in qca8k_phy_read/write operates on the provided mdio-bus (by the ethernet-mac or gpio-mdio, etc...). But let's see what else can be done (maybe qca8k_phy_read/write() is just wrong and should be dropped?). Regards, Christian