> On 18. Oct 2019, at 14:57, Paul <de...@ukr.net> wrote:
>
> Our current version is:
>
> FreeBSD 11.2-STABLE #0 r340725
>
> New version that we have problems with:
>
> FreeBSD 12.1-STABLE #5 r352893
>
>
> After updating to the new version we started to observe an incredible number
> of errors in HTTP requests between various services in our system. This
> problem appeared on all the servers that were upgraded, and does not seem to
> be specific to a particular network card: we use different models, and all
> are affected.
>
> During various tests, we observed a lot of spontaneous TCP stream abortions,
> including at the establishment stage (SYN) in cases that were 100% issue free
> on 11.2-STABLE. Concrete test cases will be shown below.
>
> We also want to highlight that, on numerous occasions, we have observed
> random, huge ACK indices in the first response to a SYN packet, instead of 1,
> as expected. This forces the client to abort the connection via RST.
>
> At first glance it looks like races in the kernel, because the problem
> disappears when:
> * we use `dev.ixl.0.iflib.override_nrxqs=1` and
>   `dev.ixl.0.iflib.override_ntxqs=1`
> * we use `dev.ixl.0.iflib.override_nrxqs=0` and
>   `dev.ixl.0.iflib.override_ntxqs=0`, but don't issue concurrent TCP streams
>
> These are some debug log messages, emitted by 12.1-STABLE:
>
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16304 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16326 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16402 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:01 test kernel: TCP:
> [10.10.10.39]:16652 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16686 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:18562 to [10.10.10.92]:80
> tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, no action
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:18918 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19331 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19340 to [10.10.10.92]:80
> tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, no action
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19340 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19340 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19489 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19580 to [10.10.10.92]:80
> tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, no action
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19580 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie
> only), segment ignored
> Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19580 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache
> entry (possibly syncookie only), segment ignored
> Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:17705 to [10.10.10.92]:80;
> syncache_timer: Response timeout, retransmitting (1) SYN|ACK
> Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:18066 to [10.10.10.92]:80;
> syncache_timer: Response timeout, retransmitting (1) SYN|ACK
> Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:18066 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Our SYN|ACK was rejected, connection
> attempt aborted by remote endpoint
> Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:17705 to [10.10.10.92]:80
> tcpflags 0x4<RST>; syncache_chkrst: Our SYN|ACK was rejected, connection
> attempt aborted by remote endpoint
>
> Here, 10.10.10.92 runs 12.1-STABLE, while 10.10.10.39 is a client that runs
> 11.2-STABLE.
>
>
> In our test case we use nginx and wrk, with a minimal config, where nginx
> always returns error page 404. nginx is on 12.1-STABLE, while wrk is on
> 11.2-STABLE.
>
> We run wrk like so:
>
>     wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing
>
> and often see errors like these:
>
>     Socket errors: connect 12, read 4, write 4, timeout 0
>
> If we reverse the test by switching the two servers' places, i.e. 12.1-STABLE
> becomes the client and issues requests via wrk, we see no problems at all.
> The same is true between two 11.2-STABLE machines.
>
>
> It seems like the issue appears only when the same local port is used for
> multiple connections on 12.1-STABLE. Currently this is possible only when
> 12.1-STABLE is a server and accepts connections on a port, say 80, as in our
> case. To confirm this, we made another test.
> We've configured nginx to listen on 10 different ports, 80 through 89, and
> then launched 10 different wrk processes, each using only one concurrent
> connection, meaning that we will have only 10 TCP streams, each having its
> own unique port on the 12.1-STABLE's side:
>
>     for I in {0..9}; do wrk -c 1 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:8${I}/missing & done
>
> Socket errors stopped appearing. We ran this test many, many times; errors
> just don't appear.
>
> Though, whenever we repeat the previous test, using a single port:
>
>     wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing
>
> errors start appearing again and again:
>
>     Socket errors: connect 8, read 14, write 9, timeout 0
>
>
> We've tested different drivers with the same outcome:
>
> em driver:
>     em0@pci0:10:0:0: class=0x020000 card=0x000015d9 chip=0x10d38086 rev=0x00 hdr=0x00
>         vendor = 'Intel Corporation'
>         device = '82574L Gigabit Network Connection'
>
> ixl driver:
>     ixl0@pci0:4:0:0: class=0x020000 card=0x00078086 chip=0x15728086 rev=0x01 hdr=0x00
>         vendor = 'Intel Corporation'
>         device = 'Ethernet Controller X710 for 10GbE SFP+'
>
> Even the driver from ports (/usr/ports/net/intel-ixl-kmod): ixl-1.11.9
>
>
> Help with this matter would be really appreciated.

I would like to reproduce this locally.
Could you send me (privately) the nginx config so that I can set up two
machines?

Are your client/server physical machines or virtual machines?
Are there any middleboxes (NAT/Firewall/whatever) involved?

One thing (no idea if it is relevant or not): Could you set

    sudo sysctl -w net.inet.tcp.ts_offset_per_conn=0

on the 12.1 machine, then test and report if it helps?

Best regards
Michael

>
> Best regards,
> -Paul
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
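
When comparing a run before and after the sysctl change, it may be easier to count the kernel's TCP debug messages per reporting function (syncache_chkrst, tcp_do_segment, syncache_timer) than to read them by eye. The following is only a rough sketch: the helper name tally_tcp_log is made up for illustration, and it assumes each message sits on one unwrapped line in the form quoted above, with the function name following the semicolon.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): tally TCP debug log lines by the
# kernel function that emitted them.
# Reads log text on stdin and prints "<count> <function>", highest count first.
tally_tcp_log() {
    # Assumption: the function name is the word between the ";" and the
    # following ": ", e.g. "...<RST>; syncache_chkrst: Spurious RST ...".
    # Lines that do not match this shape are silently skipped.
    sed -n 's/.*; *\([A-Za-z0-9_]*\): .*/\1/p' |
        sort |
        uniq -c |
        sort -rn |
        awk '{ print $1, $2 }'
}
```

Usage might look like `grep 'kernel: TCP:' /var/log/messages | tally_tcp_log`, run once per test; the log file path is an assumption, since the thread does not say where these messages are written.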