And one more question: so far I have read about the issue only in combination with NFS. Does the issue also occur with iperf or some other type of high network load?
Heiner

On 31.01.2019 07:35, Heiner Kallweit wrote:
> Hi David, two more things:
>
> 1. Could you please test a recent linux-next kernel?
> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
>    and compare them.
>
> Heiner
>
>
> On 31.01.2019 07:21, Heiner Kallweit wrote:
>> David, thanks for the link to the bug ticket.
>> I think only a proper bisect can help to find the offending commit.
>>
>> Heiner
>>
>>
>> On 31.01.2019 03:32, David Chang wrote:
>>> Hi,
>>>
>>> We had a similar case here.
>>> - Realtek r8169 receive performance regression in kernel 4.19
>>> https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>>
>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>>> The major symptom is there are many rx_missed count.
>>>
>>>
>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>>> Hi Peter,
>>>>
>>>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>>>> do the trick, can you also check with pcie_aspm.policy=performance.
>>>
>>> We will give it a try later.
>>>
>>>> And please check with "ethtool -S <if>" whether the chip statistics
>>>> show a significant number of errors.
>>>>
>>>> If this doesn't help you may have to bisect to find the offending commit.
>>>
>>> We had tried fallback driver to a few previous commits as following,
>>> but with no luck.
>>>
>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>>
>>> Thanks,
>>> David Chang
>>>
>>>>
>>>> Heiner
>>>>
>>>>
>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>>> and this made no difference.
>>>>>
>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>>> confirm that this immediately resolved the issue and access to the NFS
>>>>> shares operated as expected.
>>>>>
>>>>> I presume this means it is an issue with the r8169 driver included in
>>>>> 4.19 onwards?
>>>>>
>>>>> To answer your last questions:
>>>>>
>>>>> Base Board Information
>>>>> Manufacturer: Alienware
>>>>> Product Name: 0PGRP5
>>>>> Version: A02
>>>>>
>>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Peter.
>>>>>
>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallwe...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> I think the vendor driver doesn't enable ASPM per default.
>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>>> Few older systems seem to have issues with ASPM, what kind of
>>>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>>>> network chip?
>>>>>>
>>>>>> Rgds, Heiner
>>>>>>
>>>>>>
>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>>> a good idea.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallwe...@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>>>> What you could do:
>>>>>>>>
>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>>
>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>>
>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>>
>>>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>>>> elsewhere in the network subsystem?
>>>>>>>>
>>>>>>>> Heiner
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>>> Hi Heiner,
>>>>>>>>>
>>>>>>>>> Thanks for getting back to me.
>>>>>>>>>
>>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>>
>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>>
>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>>
>>>>>>>>> netstat -s |grep retransmitted
>>>>>>>>>
>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>>> 4.19.18:
>>>>>>>>>
>>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>>> the following:
>>>>>>>>> real 0m19.867s
>>>>>>>>> user 0m0.012s
>>>>>>>>> sys 0m0.036s
>>>>>>>>>
>>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>>> real 0m0.300s
>>>>>>>>> user 0m0.004s
>>>>>>>>> sys 0m0.007s
>>>>>>>>>
>>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either
>>>>>>>>> case.
>>>>>>>>>
>>>>>>>>> dmesg XID:
>>>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>>
>>>>>>>>> # lspci -vv
>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet
>>>>>>>>>     Controller
>>>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>>>     ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>>>     <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>>     Capabilities: [40] Power Management version 3
>>>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>>>         PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>>         Address: 0000000000000000 Data: 0000
>>>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>>         DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>>>         <512ns, L1 <64us
>>>>>>>>>         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>>>         SlotPowerLimit 10.000W
>>>>>>>>>         DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>>         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>>         MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>>         DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+
>>>>>>>>>         TransPend-
>>>>>>>>>         LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>>>         Latency L0s unlimited, L1 <64us
>>>>>>>>>         ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>>         LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>>         ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>>         LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>>         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>>>         OBFF Via message/WAKE#
>>>>>>>>>         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>>>         OBFF Disabled
>>>>>>>>>         AtomicOpsCtl: ReqEn-
>>>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance-
>>>>>>>>>         SpeedDis-
>>>>>>>>>         Transmit Margin: Normal Operating Range,
>>>>>>>>>         EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>>         Compliance De-emphasis: -6dB
>>>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>>>         EqualizationComplete-, EqualizationPhase1-
>>>>>>>>>         EqualizationPhase2-, EqualizationPhase3-,
>>>>>>>>>         LinkEqualizationRequest-
>>>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>>         Vector table: BAR=4 offset=00000000
>>>>>>>>>         PBA: BAR=4 offset=00000800
>>>>>>>>>     Capabilities: [d0] Vital Product Data
>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>>         Not readable
>>>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>>         UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>         RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>         UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>         RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>         UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>         RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>>         CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+
>>>>>>>>>         AdvNonFatalErr-
>>>>>>>>>         CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
>>>>>>>>>         AdvNonFatalErr+
>>>>>>>>>         AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>>>         ECRCChkCap+ ECRCChkEn-
>>>>>>>>>         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>>>>>         Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>>         Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>>>>>         Ctrl: ArbSelect=Fixed
>>>>>>>>>         Status: InProgress-
>>>>>>>>>         VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>>         Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>>         Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>>         Status: NegoPending- InProgress-
>>>>>>>>>     Capabilities: [160 v1] Device Serial Number
>>>>>>>>>     01-00-00-00-68-4c-e0-00
>>>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>>         Max snoop latency: 71680ns
>>>>>>>>>         Max no snoop latency: 71680ns
>>>>>>>>>     Kernel driver in use: r8169
>>>>>>>>>     Kernel modules: r8169
>>>>>>>>>
>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallwe...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>>
>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am
>>>>>>>>>>> experiencing this issue in kernels 4.19 and 4.20 (currently
>>>>>>>>>>> running/testing with 4.20.4 & 4.19.18).
>>>>>>>>>>>
>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>>> differ in that I still have a network connection. I have attempted
>>>>>>>>>>> to reload the driver on a running system, but this does not improve
>>>>>>>>>>> the situation.
>>>>>>>>>>>
>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper
>>>>>>>>>>> working order.
>>>>>>>>>>>
>>>>>>>>>>> lshw shows:
>>>>>>>>>>> description: Ethernet interface
>>>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet
>>>>>>>>>>> Controller
>>>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>> physical id: 0
>>>>>>>>>>> bus info: pci@0000:03:00.0
>>>>>>>>>>> logical name: enp3s0
>>>>>>>>>>> version: 0c
>>>>>>>>>>> serial:
>>>>>>>>>>> size: 1Gbit/s
>>>>>>>>>>> capacity: 1Gbit/s
>>>>>>>>>>> width: 64 bits
>>>>>>>>>>> clock: 33MHz
>>>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>>> resources: irq:19 ioport:d000(size=256)
>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>
>>>>>>>>>>> Peter.
>>>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>>
>>>>>>>>>> - Can you provide any measurements?
>>>>>>>>>>   - iperf results before and after
>>>>>>>>>>   - statistics about dropped packets (rx and/or tx)
>>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>>
>>>>>>>>>> Also helpful would be a "lspci -vv" output for the network card and
>>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>>
>>>>>>>>>> Heiner
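
For anyone wanting to reproduce the retransmission measurement from the thread (the "netstat -s |grep retransmitted" check around an 'ls' on the NFS share), here is a rough Python sketch. It assumes the figure can be read from the Tcp RetransSegs field in /proc/net/snmp, which as far as I know is where netstat -s takes its "segments retransmitted" value from. The function names are made up for illustration, and the counter is system-wide, so unrelated traffic on the machine will inflate the result.

```python
import subprocess


def parse_retrans_segs(snmp_text):
    """Return the Tcp RetransSegs counter from /proc/net/snmp-style text.

    The file holds a header line and a value line per protocol, both
    starting with the protocol name ("Tcp:").
    """
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp header/value pair found")
    header, values = tcp_lines[0], tcp_lines[1]
    return int(values[header.index("RetransSegs")])


def read_retrans_segs(path="/proc/net/snmp"):
    """Read the current system-wide retransmitted-segment count."""
    with open(path) as f:
        return parse_retrans_segs(f.read())


def retrans_delta(command):
    """Run command (a list of arguments) and return how many TCP segments
    were retransmitted on the whole system while it ran."""
    before = read_retrans_segs()
    # Output is discarded; only the counter difference is of interest.
    subprocess.run(command, stdout=subprocess.DEVNULL, check=False)
    return read_retrans_segs() - before
```

Calling retrans_delta(["ls", "/mnt/nfs/media"]) (the path is hypothetical) once under 4.18 and once under 4.19 should give the two numbers to compare.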