On 29.09.2020 22:35, Heiner Kallweit wrote: > On 29.09.2020 22:08, Hans de Goede wrote: >> Hi, >> >> On 9/29/20 9:07 PM, Petr Tesarik wrote: >>> Hi Heiner (and now also Hans)! >>> >>> @Hans: I'm adding you to this conversation, because you're the author >>> of commit b1e3454d39f99, which seems to break the r8169 driver on a >>> laptop of mine. >> >> Erm, no, as you bi-sected yourself already commit 9f0b54cd167219 >> ("r8169: move switching optional clock on/off to pll power functions") >> >> Broke your laptop, commit b1e3454d39f99 ("clk: x86: add "ether_clk" alias >> for Bay Trail / Cherry Trail") is about 18 months older. >> >>> On Fri, 25 Sep 2020 16:47:54 +0200 >>> Heiner Kallweit <hkallwe...@gmail.com> wrote: >>> >>>> On 25.09.2020 14:56, Petr Tesarik wrote: >>>>> On Fri, 25 Sep 2020 11:52:41 +0200 >>>>> Petr Tesarik <ptesa...@suse.cz> wrote: >>>>> >>>>>> On Fri, 25 Sep 2020 11:44:09 +0200 >>>>>> Heiner Kallweit <hkallwe...@gmail.com> wrote: >>>>>> >>>>>>> On 25.09.2020 10:54, Petr Tesarik wrote: >>>>>> [...] >>>>>>>> Does it make sense to bisect the change that broke the driver for me, >>>>>>>> or should I rather dispose of this waste^Wlaptop in an environmentally >>>>>>>> friendly manner? I mean, would you eventually accept a workaround for >>>>>>>> a few machines with a broken BIOS? >>>>>>>> >>>>>>> If the workaround is small and there's little chance to break other >>>>>>> stuff: then usually yes. >>>>>>> If you can spend the effort to bisect the issue, this would be >>>>>>> appreciated. >>>>>> >>>>>> OK, then I'm going to give it a try. >>>>> >>>>> Done. The system freezes when this commit is applied: >>>>> >>>>> commit 9f0b54cd167219266bd3864570ae8f4987b57520 >>>>> Author: Heiner Kallweit <hkallwe...@gmail.com> >>>>> Date: Wed Jun 17 22:55:40 2020 +0200 >>>>> >>>>> r8169: move switching optional clock on/off to pll power functions >>>>> >>>> This sounds weird. On your system tp->clk should be NULL, >> >> Heiner, why do you say that tp->clk should be NULL on Petr's >> system? Because it is an x86 based system? >> > This was a misunderstanding on my side, I was under the assumption > that ether_clk applies to DT-configured systems only. > >> Some X86 SoCs, specifically, the more tablet oriented Bay and Cherry >> Trail SoCs, which are much more ARM SoC like then other X86 SoCs do >> also use the clock framework and the SoC has a number of external clk >> pins which are typically used by audio codecs and by ethernet chips. >> >>>> making >>>> clk_prepare_enable() et al no-ops. Please check whether tp->clk >>>> is NULL after the call to rtl_get_ether_clk(). >>> >>> This might be part of the issue. On my system tp->clk is definitely not >>> NULL: >>> >>> crash> *rtl8169_private.clk 0xffff9277aca58940 >>> clk = 0xffff9277ac2c82a0 >>> >>> crash> *clk 0xffff9277ac2c82a0 >>> struct clk { >>> core = 0xffff9277aef65d00, >>> dev = 0xffff9277aed000b0, >>> dev_id = 0xffff9277aec60c00 "0000:03:00.2", >>> con_id = 0xffff9277ad04b080 "ether_clk", >>> min_rate = 0, >>> max_rate = 18446744073709551615, >>> exclusive_count = 0, >>> clks_node = { >>> next = 0xffff9277ad2428d8, >>> pprev = 0xffff9277aef65dc8 >>> } >>> } >>> >>> The dev_id corresponds to the Ethernet controller: >>> >>> 03:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI >>> Express Fast Ethernet controller (rev 06) >>> >>> Looking at clk_find(), it matches this entry in clocks: >>> >>> struct clk_lookup { >>> node = { >>> next = 0xffffffffbc702f40, >>> prev = 0xffff9277bf7190c0 >>> }, >>> dev_id = 0x0, >>> con_id = 0xffff9277bf719524 "ether_clk", >>> clk = 0x0, >>> clk_hw = 0xffff9277ad2427f8 >>> } >>> >>> That's because this kernel is built with CONFIG_PMC_ATOM=y, and looking >>> at the platform initialization code, the "ether_clk" alias is created >>> unconditionally. Hans already added. >> >> Petr, unconditionally is not really correct here, just as claiming >> above that my commit broke things was not really correct either. >> >> I guess this is mostly semantics, but I don't appreciate >> the accusatory tone here. >> >> The code in question binds to a clk-pmc-atom platform_device which >> gets instantiated by drivers/platform/x86/pmc_atom.c. Which in turn >> binds to a PCI device which is only present on Bay Trail and Cherry >> Trail SoCs. >> >> IOW the commit operates as advertised in its Subject: >> "clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail" >> >> So with that all clarified lets try to see if we can figure out >> *why* this is actually happening. >> >> Petr, can you describe your hardware in some more detail, >> in the bits quoted when you first Cc-ed me there is not that >> much detail. What is the vendor and model of your laptop? >> >> Looking closer at commit 9f0b54cd167219 >> ("r8169: move switching optional clock on/off to pll power functions") >> I notice that the functions which now enable/disable the clock: >> rtl_pll_power_up() and rtl_pll_power_down() >> >> Only run when the interface is up during suspend/resume. >> Petr, I guess the laptop is not connected to ethernet when you >> hibernate it? >> >> That means that on resume the clock will not be re-enabled. >> >> This is a subtle but important change and I believe that >> this is what is breaking things. I guess that the PLL which >> rtl_pll_power_up() / rtl_pll_power_down() controls is only >> used for ethernet-timing. But the external clock controlled >> through pt->clk is a replacement for using an external >> crystal with the r8169 chip. So with it disabled, the entire >> chip does not have a clock and is essentially dead. >> It can then e.g. not respond to any pci-e reads/writes done >> to it. >> >> So I believe that the proper fix for this is to revert >> commit 9f0b54cd167219 >> ("r8169: move switching optional clock on/off to pll power functions") >> >> As that caused the whole chip's clock to be left off after >> a suspend/resume while the interface is down. >> >> Also some remarks about this while I'm being a bit grumpy about >> all this anyways (sorry): >> >> 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off >> to pll power functions") commit's message does not seem to really >> explain why this change was made... >> >> 2. If a git blame would have been done to find the commit adding >> the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional >> ether_clk clock") >> then you could have known that the clk in question is an external >> clock for the entire chip, the commit message pretty clearly states >> this (although "the entire" part is implied only) : >> >> "On some boards a platform clock is used as clock for the r8169 chip, >> this commit adds support for getting and enabling this clock (assuming >> it has an "ether_clk" alias set on it). >> > Even if the missing clock would stop the network chip completely, > this shouldn't freeze the system as described by Petr. > In some old RTL8169S spec an external 25MHz clock is mentioned, > what matches the MII bus frequency. Therefore I'm not 100% convinced > that the clock is needed for basic chip operation, but due to > Realtek not releasing datasheets I can't verify this. > > But yes, if reverting this change avoids the issue on Petr's system, > then we should do it. A simple mechanical revert wouldn't work because > source file structure has changed since then, so I'll prepare a patch > that effectively reverts the change. > Uups, the last part was written keeping the original introduction of handling ether_clk in mind. My later change can be simply reverted.
>> This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks >> enabled by the firmware") which is a previous attempt to fix this for some >> x86 boards, but this causes all Cherry Trail SoC using boards to not reach >> there lowest power states when suspending. >> >> This commit (together with an atom-pmc-clk driver commit adding the alias) >> fixes things properly by making the r8169 get the clock and enable it when >> it needs it." >> >> Regards, >> >> Hans >> > > Heiner >