On 09.10.2018 22:36, Heiner Kallweit wrote:
> On 09.10.2018 16:40, Chris Clayton wrote:
>> Thanks to Maciej and Heiner for their replies.
>>
>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>> Hi again,
>>>>
>>>> I didn't think there was anything in 4.19-rc7 to fix this
>>>> regression, but tried it anyway. I can confirm that the
>>>> regression is still present and my network still fails when,
>>>> after a resume from suspend (to ram or disk), I open my browser
>>>> or my mail client. In both those cases the failure is almost
>>>> immediate - e.g. my home page doesn't get displayed in the
>>>> browser. Pinging one of my ISPs name servers doesn't fail quite
>>>> so quickly but the reported time increases from 14-15ms to more
>>>> than 1000ms.
>>>
>>> You can try comparing chip registers (ethtool -d eth0) in the
>>> working state (before a suspend) and in the broken state (after a
>>> resume). Maybe there will be some obvious in the difference.
>>>
>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>
>> Maciej suggested comparing the output from lspci -vv for the
>> ethernet device. They are identical.
>>
>> Both Maciej and Heiner suggested comparing the output from
>> "ethtool -d" pre and post suspend. Again, they are identical.
>> Heiner specifically suggested looking at the RxConfig. The value of
>> that is 0x0002870e both pre and post suspend.
>>
> Hmm, this is very weird, especially taking into account that in your
> original report you state that removing the call to rtl_init_rxcfg()
> from rtl_hw_start() fixes the issue. rtl_init_rxcfg() deals with the
> RxConfig register only and register values seem to be the same before
> and after resume. So how can the chip behave differently?
> So far my best guess is that some chip quirk causes it to accept
> writes to register RxConfig, but to misinterpret or ignore the
> written value.
> So far your report is the only one (affecting RTL8411), but we don't
> know whether other chip versions are affected too.

Also, it is interesting that even if one removes the call to
rtl_init_rxcfg() from rtl_hw_start(), the RxConfig register will still
get written to moments later by rtl_set_rx_mode().
The only chip accesses in the meantime seem to be a write to TxConfig
by rtl_set_tx_config_registers() and then a read of RxConfig plus two
writes to MAR0 earlier in rtl_set_rx_mode().

My proposals are:
1) Try swapping "rtl_init_rxcfg(tp);" and
"rtl_set_tx_config_registers(tp);" in rtl_hw_start().
Maybe the chip sometimes does not like RxConfig being written before
TxConfig.

2) Check the original value of RxConfig (after a resume) before
rtl_init_rxcfg() overwrites it (compile tested only):
--- r8169.c.ori
+++ r8169.c
@@ -5155,6 +5155,9 @@
 	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
 	RTL_R8(tp, IntrMask);
 	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
+
+	pr_notice("RxConfig before init was %.8x\n",
+		  (unsigned int)RTL_R32(tp, RxConfig));
 	rtl_init_rxcfg(tp);
 	rtl_set_tx_config_registers(tp);

This should be the value that you got when you removed the call to
rtl_init_rxcfg() for testing.

Now, knowing the "right" value, you can experiment with what
rtl_init_rxcfg() writes (under the "default:" label for your NIC
model).

Hope this helps,
Maciej