[Bug 254965] re0 ethernet connection fails with thousands of "phy read failed" errors in system message
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254965

Mark Linimon changed:

What     | Removed          | Added
---------+------------------+------------------
Assignee | b...@freebsd.org | n...@freebsd.org

--
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
Re: NFS Mount Hangs
> On 10. Apr 2021, at 23:59, Rick Macklem wrote:
>
> tue...@freebsd.org wrote:
>> Rick wrote:
> [stuff snipped]

With r367492 you don't get the upcall with the same error state? Or you
don't get an error on a write() call, when there should be one?

>> If Send-Q is 0 when the network is partitioned, after healing, the krpc
>> sees no activity on the socket (until it acquires/processes an RPC it
>> will not do a sosend()).
>> Without the 6-minute timeout, the RST battle goes on "forever" (I've
>> never actually waited more than 30 minutes, which is close enough to
>> "forever" for me).
>> --> With the 6-minute timeout, the "battle" stops after 6 minutes, when
>>     the timeout causes a soshutdown(..SHUT_WR) on the socket.
>>     (Since the soshutdown() patch is not yet in "main" (I got comments,
>>     but no "reviewed" on it), the 6-minute timer won't help if enabled
>>     in main. The soclose() won't happen for TCP connections with the
>>     back channel enabled, such as Linux 4.1/4.2 ones.)
>> I'm confused. So you are saying that if the Send-Q is empty when you
>> partition the network, and the peer starts to send SYNs after the
>> healing, FreeBSD responds with a challenge ACK which triggers the
>> sending of a RST by Linux. This RST is ignored multiple times.
>> Is that true? Even with my patch for the bug I introduced?
> Yes and yes.
> Go take another look at linuxtofreenfs.pcap
> ("fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap"; if you
> don't already have it.)
> Look at packets #1949->2069. I use wireshark, but you'll have your
> favourite.
> You'll see the "RST battle" that ends after 6 minutes at packet #2069.
> If there is no 6-minute timeout enabled in the server side krpc, then
> the battle just continues (I once let it run for about 30 minutes before
> giving up). The 6-minute timeout is not currently enabled in main, etc.

Hmm. I don't understand why r367492 can impact the processing of the RST,
which basically destroys the TCP connection.

Richard: Can you explain that?

Best regards
Michael

>
>> What version of the kernel are you using?
> "main" dated Dec. 23, 2020 + your bugfix + assorted NFS patches that
> are not relevant + 2 small krpc related patches.
> --> The two small krpc related patches enable the 6-minute timeout and
>     add a soshutdown(..SHUT_WR) call when the 6-minute timeout is
>     triggered. These have no effect until the 6 minutes are up and,
>     without them, the "RST battle" goes on forever.
>
> Add to the above a revert of r367492 and the RST battle goes away and
> things behave as expected. The recovery happens quickly after the
> network is unpartitioned, with either 0 or 1 RSTs.
>
> rick
> ps: Once the irrelevant NFS patches make it into "main", I will upgrade
>     to main bits du jour for testing.
>
> Best regards
> Michael
>>
>> If Send-Q is non-empty when the network is partitioned, the battle will
>> not happen.
>>
>>>
>>> My understanding is that he needs this error indication when calling
>>> shutdown().
>> There are several ways the krpc notices that a TCP connection is no
>> longer functional.
>> - An error return like EPIPE from either sosend() or soreceive().
>> - A return of 0 from soreceive() with no data (normal EOF from other
>>   end).
>> - A 6-minute timeout on the server end, when no activity has occurred
>>   on the connection. This timer is currently disabled for NFSv4.1/4.2
>>   mounts in "main", but I enabled it for this testing, to stop the
>>   "RST battle goes on forever" during testing. I am thinking of
>>   enabling it on "main", but this crude bandaid shouldn't be thought of
>>   as a "fix for the RST battle".
>> From what you describe, this is on writes, isn't it? (I'm asking, as
>> the original problem that was fixed with r367492 occurs in the read
>> path (draining of the so_rcv buffer in the upcall right away, which
>> subsequently influences the ACK sent by the stack)).
>> I only added the so_snd buffer after some discussion about whether the
>> WAKESOR shouldn't have a symmetric equivalent in WAKESOW.
>> Thus a partial backout (leaving the WAKESOR part inside, but reverting
>> the WAKESOW part) would still fix my initial problem with erroneous
>> DSACKs (which can also lead to extremely poor performance with Linux
>> clients), but possibly address this issue...
>> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690
>> for the revert only on the so_snd upcall?
>> Since the krpc only uses receive upcalls, I don't see how reverting the
>> send side would have any effect?
>>
>>> Since the release of 13.0 is almost done, can we try to fix the issue
>>> instead of reverting the commit?
>> I think it has already shipped broken.
>> I don't know if an errata is possible, or if it will be broken until
>> 13.1.
>>
>> --> I am much more concerned with the otis@ stuck client problem than
>>     this RST battle that only
>>
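As an illustration of the three ways listed in the message for noticing that a TCP connection is no longer functional (an error from sosend()/soreceive(), a return of 0 from soreceive(), or the 6-minute inactivity timer followed by soshutdown(..SHUT_WR)), here is a minimal userland sketch in Python. It is not the krpc code: ordinary sockets stand in for the kernel socket API, and `check_connection`, its return values and the 4096-byte read size are hypothetical names and choices made only for this example.

```python
import socket
import time

IDLE_TIMEOUT = 6 * 60  # seconds; mirrors the 6-minute timer from the thread


def check_connection(sock, last_activity, now=None, idle_timeout=IDLE_TIMEOUT):
    """Poll a non-blocking socket once and classify the connection.

    Returns (state, data) where state is "data", "idle", "eof", "error"
    or "timeout". Hypothetical helper, for illustration only.
    """
    now = time.monotonic() if now is None else now
    try:
        data = sock.recv(4096)
    except BlockingIOError:
        # Nothing to read: only the inactivity timer can end the connection.
        if now - last_activity >= idle_timeout:
            try:
                sock.shutdown(socket.SHUT_WR)  # analogous to soshutdown(..SHUT_WR)
            except OSError:
                pass  # already disconnected; nothing more to do
            return "timeout", b""
        return "idle", b""
    except OSError:
        # The stack reported a failure such as ECONNRESET or EPIPE.
        return "error", b""
    if data == b"":
        return "eof", b""  # orderly close by the other end
    return "data", data
```

The case debated in the thread corresponds to the "idle" branch: with an empty send queue and no incoming data, nothing but the timer ever moves the connection out of that state.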
Re: NFS Mount Hangs
From what I understand of Rick's description of the socket state changing
before the upcall, I can only speculate that the RST fight is over the new
sessions the client attempts with the same 5-tuple, while on the server
side the old, original session persists, because the NFS server never
closes or shuts down the session.

But a debug-logged version of the socket upcall used by the NFS server
should reveal any differences in socket state at the time of the upcall.

I would very much like to know whether D29690 addresses that problem (if
it was due to releasing the lock before the upcall), or whether there are
still differences between the behaviour prior to my central upcall change,
after that change, and with D29690 ...

From: tue...@freebsd.org
Sent: Sunday, April 11, 2021 2:30:09 PM
To: Rick Macklem
Cc: Scheffenegger, Richard; Youssef GHORBAL; freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs
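For reference, the exchange described in this thread (a SYN arriving on an established connection answered with a challenge ACK, which in turn elicits a RST) follows the general rules of RFC 5961. The sketch below shows only those generic rules in Python; it is not FreeBSD's tcp_input code and does not explain why the RST was ignored in the capture. The function names and the string return values are hypothetical.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def in_window(seq, rcv_nxt, rcv_wnd):
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def handle_segment(flags, seq, rcv_nxt, rcv_wnd):
    """Decide how an ESTABLISHED endpoint treats an incoming SYN or RST.

    Returns "reset", "challenge_ack", "drop" or "process".
    """
    if "RST" in flags:
        if seq == rcv_nxt:
            return "reset"          # exact match: tear the connection down
        if in_window(seq, rcv_nxt, rcv_wnd):
            return "challenge_ack"  # plausible but not exact: ask the peer to confirm
        return "drop"               # outside the window: silently discard
    if "SYN" in flags:
        # A SYN on a synchronized connection is never accepted directly.
        return "challenge_ack"
    return "process"
```

Under these rules, a peer that answers the challenge ACK with a RST carrying the acknowledged sequence number should hit the "reset" branch, which is why a RST that is repeatedly ignored is the surprising part of the trace.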
How to support QUIC with ipfw
Hi, all.

I noticed my firewall was dropping what seemed to be unsolicited UDP
connections from Google and Facebook, but this turned out to be QUIC
traffic. The traffic can be initiated by the browser (or other supporting
software) or the server. The problem is that dynamic rules generally don't
cut it – UDP traffic here is predominantly NTP and DNS, and the dynamic
rule lifetime for UDP is very short (3-6 s). And of course they don't work
at all for traffic initiated by the server side.

My kludgy solution at present is to troll the dynamic rules, locate the
TCP connections in them with 443 and 5228 as the target port, and add
those addresses to a table that permits UDP traffic from those ports. I
only see QUIC on IPv6, by the way. The cron job runs once per minute, adds
the addresses seen, and deletes those older than N seconds. I use time_t
seconds since epoch as the table arg, so I know when it was added or
refreshed.

Any suggestions on a better solution?

Thanks.

– M

--
"Well," Brahmā said, "even after ten thousand explanations, a fool is no
wiser, but an intelligent person requires only two thousand five hundred."

- The Mahābhārata
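The bookkeeping behind the cron job described in this message can be sketched as follows. This is only an illustration in Python under stated assumptions: the flows are taken to be already parsed into `(proto, remote_addr, remote_port)` tuples (how they are extracted from the dynamic rule listing is not shown), the table is modelled as a plain dict from address to the epoch second it was last seen, and `refresh_table` is a hypothetical name. Mirroring the result into the real ipfw table is left to the caller.

```python
TRACKED_TCP_PORTS = {443, 5228}  # target ports named in the message


def refresh_table(table, flows, now, max_age):
    """Update {address: last_seen_epoch} in place from observed flows.

    Returns (refreshed, expired): addresses to add or refresh in the
    firewall table, and addresses to delete because they are older than
    max_age seconds.
    """
    seen = {
        addr
        for proto, addr, port in flows
        if proto == "tcp" and port in TRACKED_TCP_PORTS
    }
    for addr in seen:
        table[addr] = now  # the timestamp plays the role of the table arg
    expired = sorted(a for a, ts in table.items() if now - ts > max_age)
    for addr in expired:
        del table[addr]
    return sorted(seen), expired
```

Storing the timestamp as the entry's value, as the message does with time_t, is what makes expiry a simple comparison on the next run.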
Problem reports for n...@freebsd.org that need special attention
To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      | Bug Id | Description
------------+--------+---------------------------------------------------
In Progress | 221146 | [ixgbe] Problem with second laggport
New         | 204438 | setsockopt() handling of kern.ipc.maxsockbuf limi
New         | 213410 | [carp] service netif restart causes hang only whe
Open        |   7556 | ppp: sl_compress_init() will fail if called anyth
Open        | 166724 | if_re(4): watchdog timeout
Open        | 193452 | Dell PowerEdge 210 II -- Kernel panic bce (broadc
Open        | 194453 | dummynet(4): pipe config bw parameter limited to
Open        | 200319 | Bridge+CARP crashes/freezes
Open        | 202510 | [CARP] advertisements sourced from CARP IP cause
Open        | 207261 | netmap: Doesn't do TX sync with kqueue
Open        | 217978 | dhclient: Support supersede statement for option
Open        |     73 | igb(4): Kernel panic (fatal trap 12) due to netwo
Open        | 225438 | panic in6_unlink_ifa() due to race
Open        | 227720 | Kernel panic in ppp server
Open        | 230807 | if_alc(4): Driver not working for Killer Networki
Open        | 236888 | ppp daemon: Allow MTU to be overridden for PPPoE
Open        | 236983 | bnxt(4) VLAN not operational unless explicit "ifc
Open        | 237072 | netgraph(4): performance issue [on HardenedBSD]?
Open        | 237840 | Removed dummynet dependency on ipfw
Open        | 238324 | Add XG-C100C/AQtion AQC107 10GbE NIC driver
Open        | 238707 | Lock order reversal: rtentry vs "nd6 list"
Open        | 241106 | tun/ppp: panic: vm_fault: fault on nofault entry
Open        | 241162 | Panic in closefp() triggered by nginx (uwsgi with
Open        | 243463 | ix0: Watchdog timeout
Open        | 244066 | divert: Add sysctls for divert socket send and re
Open        | 118111 | rc: network.subr Add MAC address based interface

26 problems total for which you should take action.
Re: How to support QUIC with ipfw
Hi Michael,

On Sun, Apr 11, 2021, 1:25 PM Michael Sierchio wrote:

> Hi, all. I noticed my firewall was dropping what seemed to be unsolicited
> UDP connections from Google and Facebook, but this turned out to be QUIC
> traffic. The traffic can be initiated by the browser (or other supporting
> software) or the server. The problem is that dynamic rules generally don't
> cut it – udp traffic here is predominantly NTP and DNS, and the dynamic
> rule lifetime for UDP is very short (3-6 s). And of course they don't work
> at all for traffic initiated by the server side.

QUIC connections aren't initiated by the server. The browser is initiating
these connections. I'm not an ipfw user, but the best generic firewall
strategy would be to have some sort of flow tracking for ~30s for UDP
flows associated with tuples originating on the client for remote port
443. 443 will cover the vast majority of Internet cases, as QUIC is only
being used at scale for HTTP/3.

> My kludgy solution at present is to troll the dynamic rules, locate the TCP
> connections in them with 443 and 5228 as the target port, and add those
> addresses to a table that permits UDP traffic from those ports. I only see
> QUIC on IPv6, by the way. The cron job runs once per minute, adds the
> addresses seen, and deletes those older than N seconds. I use time_t
> seconds since epoch as the table arg, so I know when it was added or
> refreshed.
>
> Any suggestions on a better solution?
>
> Thanks.
>
> – M

Matt Joras
Re: How to support QUIC with ipfw
On Sun, Apr 11, 2021 at 2:20 PM Matt Joras wrote:

> QUIC connections aren't initiated by the server. The browser is initiating
> these connections. I'm not an ipfw user, the best generic firewall strategy
> would be to have some sort of flow tracking for ~30s for UDP flows
> associated with tuples originating on the client for remote port 443. 443
> will cover the vast majority of Internet cases, as QUIC is only being used
> at scale for HTTP/3.

Hej, Matt. Thanks. That's a solution that occurred to me, but it means a
ton of dynamic rules will get instantiated for ephemeral DNS lookups – 3
seconds is a very long time for a conversation with a DNS server, because
it has probably recursed from the root zone all the way to the A record in
a fraction of that time. 30 seconds is forever – and since UDP doesn't
have an analogue to a FIN or RST, the rule doesn't go away when the
conversation does.

I'll get some metrics on it. Thanks again.
Re: How to support QUIC with ipfw
Sadly, no. That would be a great feature. The sysctl setting for dynamic
rule lifetime applies to all UDP. But since the firewall itself is
responsible for most of the DNS and NTP traffic, I can write non-stateful
rules for that. The recursive resolver on that port won't respond to
outside queries for DNS, and NTP ignores commands from strangers.

On Sun, Apr 11, 2021 at 2:32 PM Matt Joras wrote:

> Hi Michael,
>
> On Sun, Apr 11, 2021 at 2:27 PM Michael Sierchio wrote:
> >
> > Hej, Matt. Thanks. That's a solution that occurred to me, but it means a
> > ton of dynamic rules will get instantiated for ephemeral DNS lookups – 3
> > seconds is a very long time for a conversation with a DNS server, because
> > it has probably recursed from the root zone all the way to the A record
> > in a fraction of that time. 30 seconds is forever – well, since UDP
> > doesn't have an analogue to a FIN or RST, the rule doesn't go away when
> > the conversation does.
>
> Is it not possible to do the dynamic rule instantiation for select UDP
> ports, i.e. 443? That may cause issues if DNS-over-HTTP/3 becomes a
> thing, but at least for now it would exclude DNS.
>
> > I'll get some metrics on it. Thanks again.
>
> Matt Joras
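To make the idea discussed in this sub-thread concrete (a longer idle lifetime for UDP flows to remote port 443, a short one for everything else), here is a small Python sketch of a flow table with per-port lifetimes. As noted in the reply, ipfw's dynamic rule lifetime is a single setting for all UDP, so this is not an ipfw feature; `UdpFlowTable`, the 3-second default and the 30-second value for port 443 are illustrative assumptions taken from the figures mentioned in the thread.

```python
DEFAULT_UDP_LIFETIME = 3        # seconds, the short lifetime quoted for UDP
PORT_LIFETIMES = {443: 30}      # longer lifetime suggested for QUIC


class UdpFlowTable:
    """Tracks client-initiated UDP flows with a per-remote-port idle timeout.

    A flow is (local_addr, local_port, remote_addr, remote_port).
    """

    def __init__(self):
        self._expires = {}

    @staticmethod
    def _lifetime(flow):
        return PORT_LIFETIMES.get(flow[3], DEFAULT_UDP_LIFETIME)

    def note_outbound(self, flow, now):
        """Record (or refresh) state when the client sends a datagram."""
        self._expires[flow] = now + self._lifetime(flow)

    def allow_inbound(self, flow, now):
        """Accept a reply only while the flow's state is still alive."""
        expiry = self._expires.get(flow)
        if expiry is None or now >= expiry:
            self._expires.pop(flow, None)  # UDP has no FIN/RST; expire lazily
            return False
        self._expires[flow] = now + self._lifetime(flow)  # traffic refreshes state
        return True
```

Because only port 443 receives the long lifetime, short DNS exchanges still age out after a few seconds and do not accumulate as long-lived state.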
Re: NFS Mount Hangs
I should be able to test D29690 in about a week.
Note that I will not be able to tell if it fixes otis@'s hung Linux client
problem.

rick

From: Scheffenegger, Richard
Sent: Sunday, April 11, 2021 12:54 PM
To: tue...@freebsd.org; Rick Macklem
Cc: Youssef GHORBAL; freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs
[Bug 220468] libfetch: Does not handle 407 (proxy auth) when connecting to HTTPS using connect tunnel
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220468

--- Comment #18 from Kubilay Kocak ---
(In reply to Renato Botelho from comment #17)

^Triage: Thanks, please include PR: references in those merges :)

--
You are receiving this mail because:
You are on the CC list for the bug.