Begin forwarded message:

Date: Sat, 15 Sep 2018 08:43:09 +0000
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 201137] New: using traffic control with sfq cause kernel crash

https://bugzilla.kernel.org/show_bug.cgi?id=201137

            Bug ID: 201137
           Summary: using traffic control with sfq cause kernel crash
           Product: Networking
           Version: 2.5
    Kernel Version: 4.18.5
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: step...@networkplumber.org
          Reporter: grafgrim...@gmx.de
        Regression: No

Created attachment 278555
  --> https://bugzilla.kernel.org/attachment.cgi?id=278555&action=edit
kernel config

Copying from this machine to another server (the protocol does not matter) causes a kernel crash when a tc setup with SFQ is in use.

The machine has a Qualcomm Killer NIC:

    lspci |grep Killer
    03:00.0 Ethernet controller: Qualcomm Atheros Killer E220x Gigabit Ethernet Controller (rev 13)

I use traffic control with SFQ:

    tc qdisc add dev enp3s0 root handle 1: sfq
    tc qdisc show dev enp3s0

Now I try to copy a big file (124GB, an image of a partition) to an NFS share on another Linux server (same kernel version). It does not matter whether the share is NFS, Samba or anything else. It also does not matter whether I use the cp or the rsync command.

The target share is, for example:

    grep base /proc/mounts
    jaguar.grafnetz:/base /mnt/base nfs4 rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.9,local_lock=none,addr=192.168.0.7 0 0

df shows this NFS share, called base, when mounted:

    jaguar.grafnetz:/base 11718572032 6012592128 5705979904 52% /mnt/base

Now I use a simple cp command:

    cp big-fime.dd.image /mnt/base/test_01

The machine crashes after 7833735168 bytes have reached the target server, about 7.8 GB (with G = 1000^3). I can reproduce this crash.
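
For reference, the byte count at which the crash occurs can be converted to decimal and binary units with a small shell sketch. The helper names bytes_to_gb and bytes_to_gib are hypothetical and not part of the original report; only POSIX awk is assumed.

```shell
#!/bin/sh
# Hypothetical helpers (not from the bug report): convert a byte count
# to decimal gigabytes (1000^3) and binary gibibytes (1024^3).
bytes_to_gb() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1000 ^ 3) }'
}

bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 ^ 3) }'
}

# Byte count reported at the moment of the crash.
bytes_to_gb 7833735168
bytes_to_gib 7833735168
```

Run against the reported value, the first call gives the figure in GB with G = 1000^3 and the second the same amount in GiB, which is the unit ifconfig uses in its counters.
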
The good thing is: I figured out that no kernel crash happens when I do not use:

    tc qdisc add dev enp3s0 root handle 1: sfq
    tc qdisc show dev enp3s0

(So I commented it out of my local start script and rebooted the system.)

Result: no crash any more. Copying the big file (124GB) completed without a kernel crash.

Additional information...

The NIC is configured with IPv4:

    haswell ~ # ifconfig
    enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
            inet 192.168.0.9  netmask 255.255.255.0  broadcast 192.168.0.255
            ether d4:3d:7e:bd:89:44  txqueuelen 1000  (Ethernet)
            RX packets 7399483  bytes 511559908 (487.8 MiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 91781850  bytes 47176316774 (43.9 GiB)
            TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
            device interrupt 19

    ethtool enp3s0
    Settings for enp3s0:
            Supported ports: [ TP ]
            Supported link modes:   10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
                                    1000baseT/Full
            Supported pause frame use: Symmetric Receive-only
            Supports auto-negotiation: Yes
            Supported FEC modes: Not reported
            Advertised link modes:  10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
                                    1000baseT/Full
            Advertised pause frame use: Symmetric
            Advertised auto-negotiation: Yes
            Advertised FEC modes: Not reported
            Speed: 1000Mb/s
            Duplex: Full
            Port: Twisted Pair
            PHYAD: 0
            Transceiver: internal
            Auto-negotiation: on
            MDI-X: Unknown
            Current message level: 0x000060e4 (24804)
                                   link ifup rx_err tx_err hw wol
            Link detected: yes

While copying over the Gigabit network, the speed is near the maximum:

    ifstat enp2s0
     KB/s in  KB/s out
        0.06      0.18
     8348.65     31.60
    117536.2    435.11
    118049.0    435.04
    119100.9    434.84
    118889.7    435.19
    119004.1    444.53
    119061.4    440.47
    119102.8    444.04
    119077.4    444.39
    119084.1    432.32
    119089.6    439.71
    [...]

So perhaps the sfq kernel module has a bug. I use the vanilla kernel from kernel.org and sfq is compiled as a module.
    /usr/src/linux # grep SFQ .config
    CONFIG_NET_SCH_SFQ=m

Perhaps important: the server hosting the target share also uses sfq with the same settings, without any problem. It runs stably.

-- 
You are receiving this mail because:
You are the assignee for the bug.