On Wed, May 29, 2019 at 04:11:57PM +0000, Ioana Ciornei wrote: > > Subject: [net] 9dd6d07682: kernel_BUG_at_drivers/net/phy/mdio_bus.c > > > > FYI, we noticed the following commit (built with gcc-6): > > > > commit: 9dd6d07682b10a55d1f49d495b85f7b945ff75ca ("[PATCH 10/11] net: > > dsa: Use PHYLINK for the CPU/DSA ports") > > url: > > > > > > in testcase: boot > > > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m > > 2G > > > > caused below changes (please refer to attached dmesg/kmsg for entire > > log/backtrace): > > > > > > +------------------------------------------+------------+------------+ > > | | 3a2573f868 | 9dd6d07682 | > > +------------------------------------------+------------+------------+ > > | boot_successes | 4 | 0 | > > | boot_failures | 0 | 6 | > > | kernel_BUG_at_drivers/net/phy/mdio_bus.c | 0 | 6 | > > | invalid_opcode:#[##] | 0 | 6 | > > | EIP:mdiobus_free | 0 | 6 | > > | Kernel_panic-not_syncing:Fatal_exception | 0 | 6 | > > +------------------------------------------+------------+------------+ > > > > > > If you fix the issue, kindly add following tag > > Reported-by: kernel test robot <rong.a.c...@intel.com> > > > > > > [ 39.031781] kernel BUG at drivers/net/phy/mdio_bus.c:503! > > [ 39.049792] invalid opcode: 0000 [#1] PREEMPT > > [ 39.058345] CPU: 0 PID: 152 Comm: kworker/0:2 Tainted: G > > T 5.2.0- > > rc1-00321-g9dd6d07 #1 > > [ 39.076106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > 1.10.2-1 04/01/2014 > > [ 39.091893] Workqueue: events deferred_probe_work_func > > [ 39.101903] EIP: mdiobus_free+0x21/0x39 > > [ 39.109323] Code: 18 5f ed ff 5b 5e 5f 5d c3 55 89 e5 52 89 45 fc 8b 45 > > fc 8b > > 90 9c 00 00 00 83 fa 01 75 07 e8 9b fa 99 ff eb 1b 83 fa 03 74 02 <0f> 0b > > c7 80 9c > > 00 00 00 04 00 00 00 05 a0 00 00 00 e8 94 58 ed ff > > [ 39.144715] EAX: eff3e008 EBX: eff3a020 ECX: c23a9de9 EDX: 00000002 > > [ 39.156773] ESI: f0403560 EDI: efffbe24 EBP: efffbdf8 ESP: efffbdf4 > > [ 39.168825] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010297 > > [ 39.182048] CR0: 80050033 CR2: 00000000 CR3: 02781000 CR4: 000006b0 > > [ 39.194136] Call Trace: > > [ 39.198932] _devm_mdiobus_free+0xd/0x10 > > [ 39.206573] release_nodes+0x194/0x1ad > > [ 39.213804] devres_release_all+0x37/0x3d > > [ 39.221709] really_probe+0x2b4/0x3a3 > > [ 39.228808] driver_probe_device+0x110/0x14b > > [ 39.237206] __device_attach_driver+0x9d/0xa5 > > [ 39.245622] bus_for_each_drv+0x65/0x77 > > [ 39.253081] __device_attach+0x8f/0x104 > > [ 39.260507] ? driver_allows_async_probing+0x26/0x26 > > [ 39.270161] device_initial_probe+0x14/0x16 > > [ 39.278223] bus_probe_device+0x22/0x64 > > [ 39.285645] deferred_probe_work_func+0x7b/0xa1 > > [ 39.294428] process_one_work+0x1bc/0x2eb > > [ 39.302332] ? process_one_work+0x164/0x2eb > > [ 39.310539] process_scheduled_works+0x1e/0x24 > > [ 39.319218] worker_thread+0x1cb/0x268 > > [ 39.326604] kthread+0xeb/0xf0 > > [ 39.332607] ? process_scheduled_works+0x24/0x24 > > [ 39.341641] ? __kthread_create_on_node+0x128/0x128 > > [ 39.350985] ret_from_fork+0x1e/0x28 > > [ 39.369743] ---[ end trace 2d9c21baf7b99d11 ]--- > > > > Hi, > > Just to give more context onto what's the path that reaches the BUG_ON, here > is the last part of the dmesg before the crash: > > [ 38.772573] dsa-loop fixed-0:1f: DSA mockup driver: 0x1f > [ 38.790366] libphy: dsa slave smi: probed > [ 38.799285] dsa-loop fixed-0:1f lan1 (uninitialized): PHY [dsa-0.0:00] > driver [Generic PHY] > [ 38.815725] dsa-loop fixed-0:1f lan1 (uninitialized): phy: setting > supported 00,00000000,000062c8 advertising 00,00000000,000062c8 > [ 38.856337] dsa-loop fixed-0:1f lan2 (uninitialized): PHY [dsa-0.0:01] > driver [Generic PHY] > [ 38.872670] dsa-loop fixed-0:1f lan2 (uninitialized): phy: setting > supported 00,00000000,000062c8 advertising 00,00000000,000062c8 > [ 38.903821] dsa-loop fixed-0:1f lan3 (uninitialized): PHY [dsa-0.0:02] > driver [Generic PHY] > [ 38.920337] dsa-loop fixed-0:1f lan3 (uninitialized): phy: setting > supported 00,00000000,000062c8 advertising 00,00000000,000062c8 > [ 38.952873] dsa-loop fixed-0:1f lan4 (uninitialized): PHY [dsa-0.0:03] > driver [Generic PHY] > [ 38.969462] dsa-loop fixed-0:1f lan4 (uninitialized): phy: setting > supported 00,00000000,000062c8 advertising 00,00000000,000062c8 > [ 39.002349] could not attach to PHY: -19 > [ 39.010359] dsa-loop fixed-0:1f: failed to setup link for port 0.5 > > The dsa-loop mockup driver is trying to register a switch using information > based on the dsa_chip_data structure rather than on DTS. > This, of course, leads to PHYLINK being unable to successfully call > phylink_of_phy_connect() on a NULL device_node.
Right, phylink_of_phy_connect() is not supposed to be called with a NULL device node, but if it is, it fails gracefully. > Rewinding the stack, we see that dsa_tree_setup() fails and that calling > dsa_tree_remove_switch() should take care of removing/unregistering any > resources previously allocated. > This does not happen. > > Of special interest here is unregistering the MDIO bus which was registered > in dsa_switch_setup(). > The mdiobus_unregister() is never called because it is conditioned by > dts->setup being true, which is set only after _all_ setup steps were > performed successfully. > This leads the code to the part where it tries to free an MDIO bus which was > not unregistered properly, which is exactly this BUG_ON. > > The immediate fix for this is to use PHYLINK on the CPU port only when the > device_node associated is not NULL. > As a separate patch from this series, I will also try to fix the initial bug. Yep, it sounds like a short-coming of the error handling, which should be the first priority to fix; avoiding PHYLINK on the CPU port when the device_node is NULL seems to me like papering over a bug. However, should mention that I have been carrying a patch for some time for ZII boards to disable the error path for "no PHY" in dsa_slave_phy_setup() - iow, when phylink_of_phy_connect() returns -ENODEV and dsa_slave_phy_connect() also errors out. That was the only way I could get the SFF modules to work there. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up