On Wed, Nov 17, 2021 at 03:25:36PM +0000, Mikolaj Kucharski wrote:
> On Mon, Nov 15, 2021 at 12:33:09PM +0000, Mikolaj Kucharski wrote:
> > I have more panics, with couple of minutes gap between
> > ieee80211_setkeysdone() from ieee80211_proto.c and panic
> > in ieee80211_encrypt() from ieee80211_crypto.c.
> >
> > Sorry for wrapped lines, but I added Time::HiRes in a rush,
> > the console grabbing script.
> >
>
> Improved console grabbing script, and also added address where the key
> is stored via printf()'s %p. This is full boot up with more debugging.
> I've added printf()'s all but one for memset()'s in net80211/. Not sure
> is full diff interesting here.
>
> What is strange, I would imagine, that address of they key is one
> between those two:
I still cannot make conclusive sense of this.
Your debug log lacks some information:
The node state everywhere a ieee80211_node *ni is involved.
Add this to printf calls to show this: "%d", ni->ni_state
The MAC address everywhere a ieee80211_node *ni is involved.
Add this to printf calls to show the MAC: "%s", ether_sprintf(ni->ni_macaddr)
(Note that you can only use one ether_sprintf() call per printf() call
because ether_sprintf() uses a static buffer.)
Everywhere you see a 'wh', please print whether
IEEE80211_IS_MULTICAST(wh->i_addr1) is true or false.
> 20211117.123448.498681Z: MMM: ieee80211_clear_htcaps()
> [ieee80211_node.c|2328] memset() v25
This probably implies a node has left the network.
But it is unclear which node, since we don't see its MAC address.
This might be related to the crash, or it might not.
If this is happening in the call path I believe it is, then we are clearing
ni_pairwise_key at this point, which could explain why you see a key with all
zeros in your debug log.
However, this does not really make sense for another reason, because the
crash happens in a code path which should only be processing multicast
frames, not unicast (more below).
> 20211117.123448.503936Z: MMM: ar5008_tx() [ar5008.c|1527]
> 20211117.123448.506663Z: MMM: ar5008_tx() [ar5008.c|1530]: key unset for sw
> crypto: 0
> 20211117.123448.513071Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|262]:
> k: 0xffff8000225a6708
> 20211117.123448.518301Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|263]:
> k_id: 0x0
> 20211117.123448.523580Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|264]:
> k_flags: 0x0
> 20211117.123448.530034Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|265]:
> k_len: 0x0
> 20211117.123448.535279Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|266]:
> k_cipher: 0x0
> 20211117.123448.540526Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|267]:
> k_key: 0x0000000000000000000000000000000000000000000000000000000000000000
> 20211117.123448.552314Z: panic: ieee80211_encrypt: key unset for sw crypto:
> id=0 cipher=0 flags=0x0
> 20211117.123448.561466Z: Stopped at db_enter+0x10: popq %rbp
>
> What I did not expect is k=0xffff80000009cfc8 and k=0xffff80000009cf00
> between kid=2 and kid=1 and during panic k=0xffff8000225a6708. I don't
> know from where 0xffff8000225a6708 came from.
This probably means the key being used is not the group key. Does the
unexpected address correspond to the memory address of any ni->ni_parwise_key?
And what is the MAC address of this node?
And what is ni->ni_state of this node?
Now taking look at the back trace:
> 20211117.123448.728657Z: *cpu0: ieee80211_encrypt: key unset for sw crypto:
> id=0 cipher=0 flags=0x0
> 20211117.123448.735068Z: ddb{0}>
> 20211117.123448.735652Z:
> 20211117.123448.735864Z: trace
> 20211117.123448.742778Z: db_enter() at db_enter+0x10
> 20211117.123448.747944Z: panic(ffffffff81e99dc5) at panic+0xbf
> 20211117.123448.753069Z:
> ieee80211_encrypt(ffff80000009c048,fffffd80ca4f4d00,ffff800000d2d1c0) at
> ieee80211_encrypt+0x21d
> 20211117.123448.763317Z:
> ar5008_tx(ffff80000009c000,fffffd80ca4f4d00,ffff800000d2d000,0) at
> ar5008_tx+0x1e7
> 20211117.123448.769802Z: athn_start(ffff80000009c048) at athn_start+0x108
> 20211117.123448.777576Z: ar5008_intr(ffff80000009c000) at ar5008_intr+0x216
Here we are in the interrupt handler of athn(4).
The queue of power-saved frames being processed here should only contain
multicast cast frames, not unicast.
If these are multicast frames then a group key should be used, and you
would see one of the expected key two addresses.
If that is not true then we have a bug where a unicast frame is enqueued on
ic->ic_bss->ni_saveqd (the AP) instead of an ni->ni_savedq which represents
an associated station.
I don't see where and how this could happen, but this seems to be where
this bug is hiding. Multicast frames are also never encrypted, so they
would never even trigger any attempt to use a key. Only unicast frames
could trigger encryption, but they should not appear when the interrupt
handler sends frames buffered in ic->ic_bss->ni_savedq because this
queue is supposed to contain multicast frames only!!!
Another possibility is that the mbuf on the ni_savedq contains garbage.
But that seems unlikely.