On Thu, 6 Mar 2025, Adrian Chadd wrote:

On Thu, 6 Mar 2025 at 01:44, Bjoern A. Zeeb <b...@freebsd.org> wrote:

On Thu, 6 Mar 2025, Baptiste Daroussin wrote:

On Wed 05 Mar 23:39, Bjoern A. Zeeb wrote:
Hi,

the fix from
https://cgit.FreeBSD.org/src/commit/?id=27bf5c405bf2eb69392e45c06605defc78882612
makes wireless panic for me in my development branch;  that said I am
working on TKIP currently so I have code in progress.

In case you hit the panic (on boot) because you have updated, please
disable the HW_CRYPTO tunable from loader and live without HT/VHT for a
few hours again.

I am looking where the problem comes from now and for a fix then.

/bz


Thanks for the heads up!

The weekly snapshots for pkgbase are generated the weekend, if you
haven't been
able to figure out a fix by then, if would be nice for pkgbase consumers
if you
could revert the change before the next snapshot building.

Many people took the habbit to run pkgbase from weekly snapshots and
reboot
every monday on a fresh new current machine.

I have a very crude workaround I am testing.  I still hope to be able to
fix it more properly if net80211 locking allows us.

The problem isn't so much of reverting.
11db70b6057e41b259dc2245cd893d5b19179fcc
taked about the possible panic on key delete I could not reproduce
anymore.  As I had guessed the mentioned commit was the reason.  The
problem was that that commit had logic inverted and we corrected that
yesterday so now net80211 and ath AP mode and others are fine again but
the panic on key delete is back depending on timing.

The problem is that net80211 already has incosistent locking there and
we have to conditionally unlock for the downcall to driver/firmware
which can sleep.
Both together opens a possible race which is there in theory with or
without LinuxKPI as it is two threads (wpa and net80211) racing.
I am just able to trigger it again (note that some drivers simply omitted
these bits; that's why you don't see ic_cryptocaps set in iwm or iwn)
but that no longer works as firmware needs to be able to handle crypto
for doing A-MPDU offloading.


Ah crap, can you email me the panic and stack traces?

It seems fine on rtwn which does HW crypto, but its also very possible that
rtwn
already has the workarounds.

Or multiple:
- unlocked register writes
- rtwn_cmd_sleepable() which defers it to a task which doesn't fly but
  simply opens more problems, especially for the "SET" operation;
  otherwise you'll have to hold all packets until you get a callback
  (I presume) [and make sure crypto state doesn't change anymore].
  Even more fun, you have to make sure a del key is done before doing
  another set; which either will have to error or sleep again if the
  del is sitll pending so simply "reduces chances", or use the taskq
  for serialization as well (going back to holding packets while any
  crypto operation is in progress).

Seems either way it's not a clean win.

Given LinuxKPI drivers return flags (still an open TODO in my TKIP land,
silly TKIPMIC 'cipher') on setting the key we definitively need to wait
before any further packets or touching the crypto state.

Also you do not normally have sleepable locks under you in native
drivers like LinuxKPI drivers do.
mwl seems to work that way.

I hadn't looked what wpi, ath, rsu, rum, run, and mtw but this made me:
Okay rsu uses a task.
rum uses rum_cmd_sleepable() (assume also task)
run uses ieee80211_runtask
mtw uses ieee80211_runtask
wpi seems to use native locking and deal with the async somehow?
ath seems to use native locking around the keycache sync (you probably
know how that simply just works)?

So the problem has long been known and everyone ported around it rather
than fixing it (that they may need to sleep);  and no one noticed the
theoricaly possible race in net80211 working around the issue (or
ignored it).

That said my solution is also to simply swap locks for now and use
iv_key_update_begin/end as a start, as a proper solution and testing
STA+AP + various drivers will take days and given bapt asked for
something working before the weekend I decided to go that way for now.
Still better than detecting the second thread and spinning there...

/bz

--
Bjoern A. Zeeb                                                     r15:7

Reply via email to