[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-10-29 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 Mark Johnston changed: What|Removed |Added Resolution|--- |FIXED Status|Open

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-10-15 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #26 from shail...@google.com --- (In reply to Mark Johnston from comment #25) It does, I found this bug only on top of the two prior changes: https://reviews.freebsd.org/D46690 https://reviews.freebsd.org/D46691 and I figured

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-10-15 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #25 from Mark Johnston --- (In reply to shailend from comment #24) Does that diff depend on the other two gve diffs which had been posted previously? That is, in what order should they be reviewed? -- You are receiving this m

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-10-15 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #24 from shail...@google.com --- Fixed in https://reviews.freebsd.org/D47138 Thanks a lot for all the help @markj, @kib, and @gallatin! -- You are receiving this mail because: You are the assignee for the bug.

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-10-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #23 from shail...@google.com --- (In reply to Mark Johnston from comment #22) Yup gve_xmit_br enqueueing itself is the problem. Since the cleanup task gve_tx_cleanup_tq already runs off of interrupts, I am thinking of fixing thi

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-10-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #22 from Mark Johnston --- To be clear, the problem is that gve_xmit_br() requeues itself when gve_xmit() is full (i.e., returns ENOBUFS)? Shouldn't it be queuing a cleanup task?
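
The difference between the two strategies raised in comment #22 can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not the driver code and not the change in D47138: the ring size, the tick counts, and the `simulate` function are all hypothetical. It counts how often a transmit task runs when it re-enqueues itself on a full ring, compared with when only the cleanup path reschedules it.

```python
from collections import deque

RING_SLOTS = 4         # hypothetical TX ring size
CLEANUP_PERIOD = 1000  # hypothetical ticks between completion cleanups


def simulate(requeue_self: bool) -> dict:
    """Toy model: count how often the transmit task runs until all packets are sent.

    requeue_self=True  models a task that re-enqueues itself when the ring is full.
    requeue_self=False models a task that is only rescheduled by the cleanup path.
    """
    ring_free = 0              # the ring starts out full
    pending = deque(range(8))  # packets waiting to be transmitted
    xmit_scheduled = True
    xmit_runs = 0
    tick = 0

    while pending:
        tick += 1
        if tick % CLEANUP_PERIOD == 0:
            # Cleanup frees every ring slot and reschedules the transmit task.
            ring_free = RING_SLOTS
            xmit_scheduled = True
        if not xmit_scheduled:
            continue  # the transmit task is idle and consumes no CPU
        xmit_runs += 1
        xmit_scheduled = False
        while pending and ring_free > 0:
            pending.popleft()
            ring_free -= 1
        if pending and requeue_self:
            # Ring full (the ENOBUFS case): the task puts itself back on the queue.
            xmit_scheduled = True

    return {"ticks": tick, "xmit_runs": xmit_runs}


if __name__ == "__main__":
    print("self-requeue:  ", simulate(True))
    print("cleanup-driven:", simulate(False))
```

In this model both strategies finish after the same 2000 ticks, but the self-requeueing task runs 2000 times while the cleanup-driven one runs 3 times. The model has no scheduler or locks, so it only shows the wasted retries, not the starvation of other threads discussed in the later comments.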

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-10-02 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #21 from Konstantin Belousov --- (In reply to shailend from comment #20) Then this especially looks like a live-lock. A user thread should not have priority 4; it is in the range of priorities of the interrupt threads. S

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-10-02 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #20 from shail...@google.com --- (In reply to Konstantin Belousov from comment #19) Thanks for the explanation. The iperf thread owning the lock and the driver thread looping on the cpu both have priority 4. The driver thread wa

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-10-02 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #19 from Konstantin Belousov --- (In reply to shailend from comment #18) Locks (except spinlocks) do not have any magic properties WRT disabling scheduling. So it is absolutely fine for a thread owning a lock to be put off CPU

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-10-02 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #18 from shail...@google.com --- (In reply to Konstantin Belousov from comment #14) Although I do not have access to the VMs to do `show pcpu`, I checked my notes to find this `ps` entry: 100438 Run CPU 11

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-27 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #17 from Konstantin Belousov --- (In reply to Mark Johnston from comment #16) I doubt that the system would stay silent about a CPU with disabled interrupts; our IPI code does not tolerate such a condition. In fact, I asked about pcp

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-27 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #16 from Mark Johnston --- This smells a bit like a thread disabled interrupts and then went off-CPU somehow. The iperf thread is stuck in the runqueue of a CPU and nothing gets scheduled there, so it doesn't run. If this is n

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #15 from shail...@google.com --- (In reply to Konstantin Belousov from comment #14) Unfortunately I have lost access to the VMs in this repro and will need to make a fresh repro. I'll post the "show pcpu" for the new repro, hopef

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #14 from Konstantin Belousov --- (In reply to shailend from comment #13) What does 'show pcpu 11' show?

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #13 from shail...@google.com --- (In reply to Andrew Gallatin from comment #12) Hmmm interesting. In this case though, I'm sure nothing is traversing the networking stack, and no cpu is being consumed. The offending thread seems

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #12 from Andrew Gallatin --- Comment on attachment 253834 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=253834 procstat_kka Are we absolutely certain that this is a deadlock and not a livelock? If you look at netwo

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #11 from shail...@google.com --- Just a more succinct view of iperf thread 100719's central role in this deadlock: ``` db> show lockchain 100413 thread 100413 (pid 0, gve0 rxq 0) is blocked on lock 0xfe00df57a3d0 (sleep mute

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #10 from shail...@google.com --- Superficially it looks like the iperf thread 100719 was interrupted by an IPI while it held the UMA zone lock. It is the only iperf thread in the "run" state; the rest are in "stop".

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #9 from shail...@google.com --- (In reply to Mark Johnston from comment #7) Also the trace for the uma zone lock holding iperf thread: ``` db> trace 100719 Tracing pid 857 tid 100719 td 0xf800b87ca000 sched_switch() at sche

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #8 from shail...@google.com --- Created attachment 253834 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=253834&action=edit procstat_kka This is the output of procstat -kka, after the onset of a deadlock, with a singl

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #7 from Mark Johnston --- (In reply to shailend from comment #6) The memory utilization is low, so this is not a low memory deadlock. We have an iperf thread which is holding a UMA zone lock and an inpcb lock, and it looks like

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-25 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #6 from shail...@google.com --- I reproduced this with invariants+witness+ddb; it takes much longer to hit the deadlock because invariants and witness lower the throughput. Backtraces of locked driver threads: ``` [root@Fr

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-20 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 Andrew Gallatin changed: What|Removed |Added CC||galla...@freebsd.org --- Comment

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-20 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 Mark Johnston changed: What|Removed |Added CC||ma...@freebsd.org Stat

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #3 from Konstantin Belousov --- If you have WITNESS configured, then you can get an overview of lock ownership on the system using the 'show alllocks' ddb command. This should allow you to see lock owners, including the s

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 --- Comment #2 from shail...@google.com --- Actually I did run it with INVARIANTS and WITNESS and other options listed on https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks and the deadlock reproduces wi

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 Konstantin Belousov changed: What|Removed |Added CC||k...@freebsd.org --- Comment

[Bug 281560] gve (4) uma deadlock during high tcp throughput

2024-09-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560 Mark Linimon changed: What|Removed |Added Keywords||vendor Assignee|b...@free