On Wed, Dec 19, 2018 at 2:22 AM Heiner Kallweit <hkallwe...@gmail.com> wrote:
>
> On 18.12.2018 14:25, Chris Chiu wrote:
> > On Tue, Dec 18, 2018 at 3:08 AM Heiner Kallweit <hkallwe...@gmail.com> wrote:
> >>
> >> On 17.12.2018 14:25, Chris Chiu wrote:
> >>> On Fri, Dec 14, 2018 at 3:37 PM Heiner Kallweit <hkallwe...@gmail.com> wrote:
> >>>>
> >>>> On 14.12.2018 04:33, Chris Chiu wrote:
> >>>>> On Thu, Dec 13, 2018 at 10:20 AM Chris Chiu <c...@endlessm.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>> We have an Acer laptop which has a problem with Ethernet networking after
> >>>>>> resuming from S3. The Ethernet controller is the popular Realtek r8168;
> >>>>>> lspci shows it as follows:
> >>>>>> 02:00.1 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 12)
> >>>>>>
> >>>> A "dmesg | grep r8169" would be helpful, especially the chip name + XID.
> >>>>
> >>> [ 22.362774] r8169 0000:02:00.1 (unnamed net_device) (uninitialized): mac_version = 0x2b
> >>> [ 22.365580] libphy: r8169: probed
> >>> [ 22.365958] r8169 0000:02:00.1 eth0: RTL8411, 00:e0:b8:1f:cb:83, XID 5c800800, IRQ 38
> >>> [ 22.365961] r8169 0000:02:00.1 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> >>>
> >> Thanks for the info.
> >>
> >>>>>> The problem is that Ethernet is not accessible after resume. Pinging via
> >>>>>> Ethernet always returns `Destination Host Unreachable`. The interesting
> >>>>>> part is that when I run tcpdump to monitor the problematic Ethernet
> >>>>>> interface, networking comes back to life, but it dies again after
> >>>>>> I stop tcpdump.
> >>>>>> One more thing: if I ping the problematic machine from another machine, it
> >>>>>> has the same effect as running tcpdump. Maybe it's about the register
> >>>>>> settings for the RX path?
> >>>>>>
> >>>> You could compare the register dumps (ethtool -d) before and after S3 sleep
> >>>> to find out whether there's a difference.
> >>>>
> >>>
> >>> Actually, I just found that I led us in the wrong direction. S3 suspend does
> >>> help to reproduce the problem, but it's not necessary. All I need to do is ping
> >>> for around 5 minutes and the network connection fails. I also found something
> >>> interesting: disabling the MSI-X interrupt, as commit
> >>> [d49c88d7677ba737e9d2759a87db0402d5ab2607] did, fixes this problem,
> >>> although I don't understand the root cause. Is there anything I can do to help?
> >>>
> >> This is indeed very, very weird. You say switching from MSI-X to MSI fixes
> >> the issue, but pinging the machine from outside also brings back the network.
> >> Both actions affect totally different corners.
> >>
> >> The commit and related issue you mention was a workaround in the driver;
> >> the root cause was an MSI-X-related issue with certain Intel chipsets deep
> >> in the PCI core. After this was fixed we removed the workaround again.
> >> This shouldn't be related to your issue.
> >>
> >> What's hard to say for now is whether the issue is:
> >> - a driver issue
> >> - a hardware issue in the RTL8411
> >> - an issue with the chipset on your mainboard
> >>
> >> According to your description it doesn't take a special scenario to trigger
> >> the issue, so other users of Acer notebooks with the RTL8411 are most likely
> >> affected too (after a brief check, this should include at least the Aspire
> >> F15, V15 and V7). Therefore I wonder why there aren't more reports.
> >>
> >> This commit added MSI-X support: 6c6aa15fdea5 ("r8169: improve interrupt handling").
> >> So you could test this revision and the one before it.
> >>
> >> If the issue really is caused by a side effect of using MSI-X, then the
> >> question is whether we need to disable MSI-X for the RTL8411 in general or
> >> just for the RTL8411 with a certain subsystem ID.
> >>
> >
> > I tried the kernel with HEAD at 6c6aa15fdea5 ("r8169: improve interrupt handling"),
> > and the problem is still there. When I revert to the previous revision, the
> > problem goes away. So I think it's pretty much a side effect of MSI-X. However,
> > since you mentioned that you didn't hit this problem, I'll ask the vendor to
> > verify whether it also happens on other machines with the same chip. Then we
> > can decide whether to disable MSI-X for a specific mac version or just for a
> > certain subsystem ID.
> >
> Thanks a lot for testing. OK, I have one more idea.
> AFAICS the RTL8411 also has an integrated card reader controller, which is driven
> by the module rtsx_pci. Maybe if the two components (card reader controller + Ethernet)
> use different interrupt types, the RTL8411 can't handle this properly.
> If the module rtsx_pci is loaded on your system, can you check whether not
> loading it (e.g. by blacklisting it) or removing it makes a difference?
>
I booted my kernel with rtsx_pci_ms/rtsx_pci blacklisted, but it doesn't change anything.

> Can you provide the "lspci -v" output for the card reader part of RTL8411?

02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader (rev 01)
	Subsystem: Acer Incorporated [ALI] RTL8411B PCI Express Card Reader
	Flags: bus master, fast devsel, latency 0, IRQ 34
	Memory at f0b05000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at f0b10000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [170] Latency Tolerance Reporting
	Capabilities: [178] L1 PM Substates
	Kernel driver in use: rtsx_pci
	Kernel modules: rtsx_pci

>
> >>>>>> I tried the latest 4.20 rc version but the problem is still there. I
> >>>>>> also tried some hw_reset or init code in the resume path, with no effect.
> >>>>>> Any suggestions?
> >>>>>> Thanks
> >>>>>>
> >>>> Did previous kernel versions work? If it's a regression, a bisect would be
> >>>> appreciated, because with the chip versions I've got I can't reproduce
> >>>> the issue.
> >>>>
> >>>>>> Chris
> >>>>>
> >>>>> Gentle ping. Any additional information required?
> >>>>>
> >>>>> Chris
> >>>>>
> >>>> Heiner
> >>>
> >>
> >
>
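The two options weighed in the thread, disabling MSI-X for the RTL8411 in general versus only for an RTL8411 with a certain subsystem ID, differ only in how narrowly the affected devices are matched. The standalone C sketch below illustrates that difference; it is not r8169 code. All names (`struct nic_id`, `struct subsys_quirk`, `msix_blocked_by_chip()`, `msix_blocked_by_subsys()`) are hypothetical. The value 0x2b is simply the mac_version r8169 printed for the RTL8411 in this thread, and 0x1025 is Acer's PCI vendor ID. The subsystem device ID of the affected laptop does not appear in the thread, so any concrete entry would have to be read from the machine (for example with `lspci -vnn`).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* mac_version that r8169 printed for the RTL8411 in this thread. */
#define CHIP_RTL8411_MAC_VERSION 0x2b

/* Acer's PCI vendor ID, as it appears in the subsystem vendor field. */
#define SUBSYS_VENDOR_ACER 0x1025

/* Hypothetical view of the identity fields a driver can read at probe time. */
struct nic_id {
	uint8_t mac_version;    /* chip version detected by the driver */
	uint16_t subsys_vendor; /* PCI subsystem vendor ID */
	uint16_t subsys_device; /* PCI subsystem device ID */
};

/* One entry of a hypothetical list of affected systems. */
struct subsys_quirk {
	uint16_t vendor;
	uint16_t device;
};

/* Option 1: avoid MSI-X on every RTL8411, whatever system it is built into. */
bool msix_blocked_by_chip(const struct nic_id *id)
{
	return id->mac_version == CHIP_RTL8411_MAC_VERSION;
}

/* Option 2: avoid MSI-X only for an RTL8411 whose subsystem ID is listed. */
bool msix_blocked_by_subsys(const struct nic_id *id,
			    const struct subsys_quirk *quirks, size_t n_quirks)
{
	if (id->mac_version != CHIP_RTL8411_MAC_VERSION)
		return false;
	for (size_t i = 0; i < n_quirks; i++) {
		if (quirks[i].vendor == id->subsys_vendor &&
		    quirks[i].device == id->subsys_device)
			return true;
	}
	return false;
}
```

The chip-wide rule is simpler and would also cover the other Acer models mentioned (Aspire F15, V15, V7), but it drops MSI-X on RTL8411 systems that may work fine. The subsystem list is narrower but needs an entry for every affected model, which is why checking other machines with the same chip matters before choosing. In the kernel, such a predicate would presumably decide whether PCI_IRQ_MSIX is included in the flags passed to pci_alloc_irq_vectors().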