Module Name:    src
Committed By:   martin
Date:           Wed Oct  9 11:15:39 UTC 2024

Modified Files:
        src/sys/net [netbsd-10]: if_wg.c
        src/tests/net/if_wg [netbsd-10]: t_basic.sh t_misc.sh

Log Message:
Pull up following revision(s) (requested by riastradh in ticket #934):

        sys/net/if_wg.c: revision 1.117
        sys/net/if_wg.c: revision 1.118
        sys/net/if_wg.c: revision 1.119
        sys/net/if_wg.c: revision 1.80
        sys/net/if_wg.c: revision 1.81
        tests/net/if_wg/t_misc.sh: revision 1.13
        sys/net/if_wg.c: revision 1.82
        sys/net/if_wg.c: revision 1.130
        tests/net/if_wg/t_misc.sh: revision 1.14
        sys/net/if_wg.c: revision 1.83
        sys/net/if_wg.c: revision 1.131
        tests/net/if_wg/t_misc.sh: revision 1.15
        sys/net/if_wg.c: revision 1.84
        sys/net/if_wg.c: revision 1.132
        tests/net/if_wg/t_misc.sh: revision 1.16
        sys/net/if_wg.c: revision 1.85
        sys/net/if_wg.c: revision 1.86
        tests/net/if_wg/t_basic.sh: revision 1.5
        sys/net/if_wg.c: revision 1.87
        tests/net/if_wg/t_basic.sh: revision 1.6
        sys/net/if_wg.c: revision 1.88
        sys/net/if_wg.c: revision 1.89
        sys/net/if_wg.c: revision 1.100
        sys/net/if_wg.c: revision 1.101
        sys/net/if_wg.c: revision 1.102
        sys/net/if_wg.c: revision 1.103
        sys/net/if_wg.c: revision 1.104
        sys/net/if_wg.c: revision 1.105
        sys/net/if_wg.c: revision 1.106
        sys/net/if_wg.c: revision 1.107
        sys/net/if_wg.c: revision 1.108
        sys/net/if_wg.c: revision 1.109
        sys/net/if_wg.c: revision 1.120
        sys/net/if_wg.c: revision 1.121
        sys/net/if_wg.c: revision 1.122
        sys/net/if_wg.c: revision 1.123
        sys/net/if_wg.c: revision 1.124
        sys/net/if_wg.c: revision 1.75
        sys/net/if_wg.c: revision 1.77
        sys/net/if_wg.c: revision 1.125
        sys/net/if_wg.c: revision 1.126
        sys/net/if_wg.c: revision 1.79
        sys/net/if_wg.c: revision 1.127
        sys/net/if_wg.c: revision 1.128
        sys/net/if_wg.c: revision 1.129
        sys/net/if_wg.c: revision 1.90
        sys/net/if_wg.c: revision 1.91
        sys/net/if_wg.c: revision 1.92
        sys/net/if_wg.c: revision 1.93
        sys/net/if_wg.c: revision 1.94
        sys/net/if_wg.c: revision 1.95
        sys/net/if_wg.c: revision 1.96
        sys/net/if_wg.c: revision 1.97
        sys/net/if_wg.c: revision 1.98
        sys/net/if_wg.c: revision 1.99
        sys/net/if_wg.c: revision 1.110
        sys/net/if_wg.c: revision 1.111
        sys/net/if_wg.c: revision 1.112
        sys/net/if_wg.c: revision 1.113
        sys/net/if_wg.c: revision 1.114
        sys/net/if_wg.c: revision 1.115
        sys/net/if_wg.c: revision 1.116

fix simple mis-matched function prototype and definitions.
most of these are like, eg
   void foo(int[2]);
with either of these
   void foo(int*) { ... }
   void foo(int[]) { ... }
in some cases (such as stat or utimes* calls found in our header files),
we now match standard definition from opengroup.
found by GCC 12.

sys: Drop redundant NULL check before m_freem(9)
m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c
Compile-tested on amd64/ALL.
Suggested by knakahara@

Add a wg_debug variable to split between debug/trace/dump messages

Add more debugging in packet validation

If any of the WG_DEBUG_XXX symbols happens to be defined (say, from a
stray rump Makefile...) then we now must have WG_DEBUG also defined, so
if it wasn't, make it so.

While the previous change fixed the broken build, it wasn't the best
way, as defining any of the WG_DEBUG_XXX symbols then effectively
defined all of them - making them as seperate entities, pointless.

So, rearrange the way things are done a little to avoid doing that.

Add packet dump debugging
fix size limit calculation in dump and NULL checks
use hexdump...

Fix 32 bit (32 bit size_t) WG_DEBUG builds - use %zu rather than %lu
to print size_t values.

There's a new WG_DEBUG_XXX ( XXX==PACKET ) to deal with now.  That needs
WG_DEBUG defined as well, if set.

Make the debug (WG_DEBUG) func gethexdump() always return a valid
pointer, never NULL, so it doesn't need to be tested before being
printed, which was being done sometimes, but not always.

Add more debugging from Taylor

wg(4): Allow modunload before any interface creation.

The workqueue and pktq are both lazily created, for annoying module
initialization order reasons, so they may not have been created by
the time of modunload.
PR kern/58470

Limit the size of the packet, and print ... if it is bigger. (from kre@)
wg(4): Rework some details of internal session state machine.

This way:
- There is a clear transition between when a session is being set up,
  and when it is exposed to the data rx path (wg_handle_msg_data):
  atomic_store_release to set wgs->wgs_state to INIT_PASSIVE or
  ESTABLISHED.
  (The transition INIT_PASSIVE -> ESTABLISHED is immaterial to the
  data rx path, so that's just atomic_store_relaxed.  Similarly the
  transition to DESTROYING.)
- There is a clear transition between when a session is being set up,
  and when it is exposed to the data tx path (wg_output):
  atomic_store_release to set wgp->wgp_session_stable to it.
- Every path that reinitializes a session must go through
  wg_destroy_session via wg_put_index_session first.  This avoids
  races between session reuse and the data rx/tx paths.
- Add a log message at the time of every state transition.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix logic to ensure session initiation is underway.

Previously, wg_task_send_init_message would call
wg_send_handshake_msg_init if either:
(a) the stable session is UNKNOWN, meaning a session has not yet been
    established, either by us or by the peer (but it could be in
    progress); or
(b) the stable session is not UNKNOWN but the unstable session is
    _not_ INIT_ACTIVE, meaning there is an established session and we
    are not currently initiating a new session.

If wg_output (or wgintr) found no established session while there was
already a session being initiated, we may only enter
wg_task_send_init_message after the session is already established,
and trigger spurious reinitiation.

Instead, create a separate flag to indicate whether it is mandatory
to rekey because limits have passed.  Then create a session only if:
(a) the stable session is not ESTABLISHED, or
(b) the mandatory rekey flag is not set,
and clear the mandatory rekey flag.

While here, arrange to do rekey-after-time on tx, not on callout.  If
there's no data to tx, we shouldn't reinitiate a session -- we should
stay quiet on the network.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

PR kern/56252: wg(4) state machine has race conditions

PR kern/58463: if_wg does not work when idle.

wg(4): Use callout_halt, not callout_stop.
It's possible that callout_stop might work here, but let's simplify
reasoning about it -- the timers in question only take the peer intr
lock, so it's safe to wait for them while holding the peer lock in
the handshake worker thread.

We may have to undo the task bit but that will take a bit more
analysis to determine.
Prompted by (but probably won't fix anything in):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Omit needless pserialize_perform on transition to DESTROYING.

A session can still be used when it is in the DESTROYING state, so
there's no need to wait for users to drain here -- that's the whole
point of a separate DESTROYING state.

It is only the transition from DESTROYING back to UNKNOWN, after the
session has been unpublished so no new users can begin, that requires
waiting for all users to drain, and we already do that in
wg_destroy_session.

Prompted by (but won't fix anything in, because this is just a
performance optimization):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Expand cookie secret to 32 bytes.
This is only relevant for denial of service mitigation, so it's not
that big a deal, and the spec doesn't say anything about the size,
but let's make it the standard key size.

PR kern/58479: experimental wg(4) uses 32-bit cookie secret, not
32-byte cookie secret

wg(4): Mark wgp_pending volatile to reflect its usage.
Prompted by (but won't fix any part of):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix session destruction.
Schedule destruction as soon as the session is created, to ensure key
erasure within 2*reject-after-time seconds.  Previously, we would
schedule destruction of the previous session 1 second after the next
one has been established.  Combined with a failure to update the
state machine on keepalive packets, this led to temporary deadlock
scenarios.

To keep it simple, there's just one callout which runs every
reject-after-time seconds and erases keys in sessions older than
reject-after-time, so if a session is established the moment after it
runs, the keys might not be erased until (2-eps)*reject-after-time
seconds.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Reject rx on sessions older than reject-after-time sec.
Prompted by (but won't fix anything in):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): On rx of valid ciphertext, make sure to update state machine.

Previously, we also required the plaintext to be a plausible-looking
IP packet before updating the state machine.

But keepalive packets are empty -- and if the peer initiated the
session to rekey after last tx but had no more data to tx, it will
send a keepalive to finish session initiation.
If we didn't update the state machine in that case, we would stay in
INIT_PASSIVE state unable to tx on the session, which would make
things hang.

So make sure to always update the state machine once we have accepted
a packet as genuine, even if it's genuine garbage on the inside.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make sure to update endpoint on keepalive packets too.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

tests/net/if_wg/t_misc: Tweak timeouts in wg_handshake_timeout.

Most of the timers in wg(4) have only 1sec resolution, which might be
rounded in either direction, so make sure there's a 2sec buffer on
either side of the event we care about (the point at which wg(4)
decides to stop retrying handshake).

Won't fix any bugs, but might make the tests slightly less flaky.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions

tests/net/if_wg/t_misc: Elaborate in wg_rekey debug messages.

Helpful for following the test log when things go wrong.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
wg(4): Tests should pass now.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Use 32-bit for times handled in rx/tx paths.

The rx and tx paths require unlocked access to wgs_time_established
(to decide whether it's time to rekey) and wgs_time_last_data_sent
(to decide whether we need to reply to incoming data with a keepalive
packet), so do it with atomic_load/store_*.

On 32-bit platforms, we may not be able to do that on time_t.

However, since sessions only last for a few minutes before
reject-after-time kicks in and they are erased, 32 bits is plenty to
record the durations that we need to record here, so this shouldn't
introduce any new bugs even on hosts that exceed 136 years of uptime.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make time_uptime32 work in netbsd<=10.

This is the low 32 bits of time_uptime.
Will simplify pullups to 10 for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix quotation in comment.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Process all altq'd packets when deleting peer.

Can't just drop them because we can only go through all packets on an
interface at a time, for all peers -- so we'd either have to drop all
peers' packets, or requeue the packets for other peers.  Probably not
worth the trouble, so let's just wait for all the packets currently
queued up to go through first.

This requires reordering teardown so that we wg_destroy_all_peers,
and thus wg_purge_pending_packets, _before_ we wg_if_detach, because
wg_if_detach -> if_detach destroys the lock that IFQ_DEQUEUE uses.

PR kern/58477: experimental wg(4) ALTQ support is probably buggy

wg(4): Tidy up error branches.
No functional change intended, except to add some log messages in
failure cases.
Cleanup after:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Be more consistent about #ifdef INET/INET6.
PR kern/58478: experimental wg(4) probably doesn't build with
INET6-only

wg(4): Parenthesize macro expansions properly.

PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Delete temporary hacks to dump keys and packets.
No longer useful for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Explain why gethexdump/puthexdump is there, and tidy.
This way I will not be tempted to replace it by in-line calls to
libkern hexdump.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Put force_rekey state in the session, not the peer.
That way, there is a time when one thread has exclusive access to the
state, in wg_destroy_session under the peer lock, when we can clear
the state without racing against the data tx path.
This will work more reliably than the atomic_swap_uint I used before.
Noted by kre@.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle static on fixed-size array parameters.

Let's make the static size declarations useful.
No functional change intended.

wg(4): Queue pending packet in FIFO order, not LIFO order.

Sometimes the session takes a seconds to establish, for whatever
reason.  It is better if the pending packet, which we queue up to
send as soon as we get the responder's handshake response, is the
most recent packet, rather than the first packet.

That way, we don't wind up with a weird multi-second-delayed ping,
followed by a bunch of dropped, followed by normal ping timings, or
wind up sending the first TCP SYN instead of the most recent, or what
have you.  Senders need to be prepared to retransmit anyway if
packets are dropped.

PR kern/58508: experimental wg(4) queues LIFO, not FIFO, pending
first handshake
wg(4): Sprinkle comments into wg_swap_sessions.
No functional change intended.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): No need for atomic access to wgs_time_established in tx/rx.

This is stable while the session is visible to the tx/rx paths -- it
is initialized before the session is exposed to tx/rx, and doesn't
change until the session is no longer used by any tx/rx path and has
been recycled.

When I sprinkled atomic access to wgs_time_established in if_wg.c
rev. 1.104, it was a vestige of an uncommitted draft that did the
transition from INIT_PASSIVE to ESTABLISHED in the tx path itself, in
an attempt to enable prompter tx on the new session as soon as it is
established.  This turned out to be unnecessary, so I reverted most
of it, but forgot that wgs_time_established no longer needed atomic
treatment.

We could go back to using time_t and time_uptime, now that there's no
need to do atomic loads and stores on these quantities. But there's
no point in 64-bit arithmetic when the time differences are all
guaranteed bounded by a few minutes, so keeping it 32-bit is probably
a slight performance improvement on 32-bit systems.
(In contrast, wgs_time_last_data_sent is both written and read in the
tx path, which may run in parallel on multiple CPUs, so it still
requires the atomic treatment.)
Tidying up for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix memory ordering in detach.
PR kern/58510: experimental wg(4) lacks memory ordering between
wg_count_dec and module unload

wg(4): Fix typo in comment recently added.
Comment added in the service of:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Omit needless atomic_load.
wgs_local_index is only ever written to while only one thread has
access to it and it is not in the thmap -- before it is published in
wg_get_session_index, and after it is unpublished in
wg_destroy_session.  So no need for atomic_load -- it is stable if we
observe it in thmap_get result.
(Of course this is only for an assertion, which if tripped obviously
indicates a violation of our assumptions.  But if that happens, well,
in the worst case we'll see a weird assertion message claiming that
the index is not equal to itself, which from which we can conclude
there must have been a concurrent update, which is good enough to
help diagnose that problem without any atomic_load.)

Tidying some of the changes for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle comments on internal sliding window API.
Post-fix tidying for:
PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Deduplicate session establishment actions.
The actions to
(a) record the last handshake time,
(b) clear some handshake state,
(c) transmit first data if queued, or (if initiator) keepalive, and
(d) begin destroying the old session,
were formerly duplicated between wg_handle_msg_resp (for when we're
the initiator) and wg_task_establish_session (for when we're the
responder).

Instead, let's factor this out into wg_swap_session so there's only
one copy of the logic.
This requires moving wg_update_endpoint_if_necessary a little earlier
in wg_handle_msg_resp -- which should be done anyway so that the
endpoint is updated _before_ the session is published for the data tx
path to use.

Other than moving wg_update_endpoint_if_necessary a little earlier,
no functional change intended.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Read wgs_state atomically in wg_get_stable_session.
As noted in the comment above, it may concurrently transition from
ESTABLISHED to DESTROYING.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Force rekey on tx if session is older than reject-after-time.
One more corner case for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Add missing barriers around wgp_pending access.
PR kern/58520: experimental wg(4) lacks barriers around access to
packet pending initiation
wg(4): Trigger session initiation in wgintr, not in wg_output.

We have to look up the session in wgintr anyway, for
wg_send_data_msg.  By triggering session initiation in wgintr instead
of wg_output, we can skip the stable session lookup and reference in
wg_output -- simpler that way.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Queue packet for post-handshake retransmit if limits are hit.
PR kern/58521: experimental wg(4) may drop packet after minutes of quiet
wg(4): When a session is established, send first packet directly.

Like we would do with the keepalive packet, if we had to send that
instead -- no need to defer it to the pktq.  Keep it simple.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle volatile on variables requiring atomic access.
No functional change intended, since the relevant access is always
done with atomic_* when it might race with concurrent access -- and
really this should be _Atomic or something.  But for now our
atomic_ops(9) API is still spelled with volatile, so we'll use that.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make a rule for who wins when both peers send INIT at once.
The rule is that the peer with the numerically smaller public key
hash, in little-endian, takes priority iff the low order bit of
H(peer A pubkey) ^ H(peer B pubkey) ^ H(posix minutes as le64)
is 0, and the peer with the lexicographically larger public key takes
priority iff the low-order bit is 1.

Another case of:
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

This one is, as far as I can tell, simply a deadlock in the protocol
of the whitepaper -- until both sides give up on the handshake and
one of them (but not both) later decides to try sending data again.
(But not related to our t_misc:wg_rekey test, as far as I can tell,
and I haven't put enough thought into how to reliably trigger this
race to write a new automatic test for it.)
wg(4): Add Internet Archive links for the versions cited.
No functional change.

tests/net/if_wg/t_misc: Add some diagnostics.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

wg(4): Test truncated UDP input from the network.
This triggers double-free in the IPv6 udp6_input path -- but,
confusingly, not the IPv4 udp_input path, even though the overudp_cb
interface ought to be the same:
/* udp_input -- no further use of m if return is -1 */
        if ((n = udp4_realinput(&src, &dst, &m, iphlen)) == -1) {
                UDP_STATINC(UDP_STAT_HDROPS);
                return;
        }
/* udp6_input -- m_freem if return is not 0 */
        if (udp6_realinput(AF_INET6, &src, &dst, &m, off) == 0) {
...
        }
bad:
        m_freem(m);
        return IPPROTO_DONE;

The subroutines udp4_realinput and udp6_realinput pass through the
return value of overudp_cb in essentially the same way:
/* udp4_realinput */
                if (inp->inp_overudp_cb != NULL) {
                        int ret;
                        ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
                            sintosa(src), inp->inp_overudp_arg);
                        switch (ret) {
                        case -1: /* Error, m was freed */
                                rcvcnt = -1;
                                goto bad;
...
bad:
        return rcvcnt;
/* udp6_realinput */
                if (inp->inp_overudp_cb != NULL) {
                        int ret;
                        ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
                            sin6tosa(src), inp->inp_overudp_arg);
                        switch (ret) {
                        case -1: /* Error, m was freed */
                                rcvcnt = -1;
                                goto bad;
...
bad:
        return rcvcnt;

PR kern/58688: userland panic of kernel via wg(4)

wg(4): Fix wg_overudp_cb drop paths to null out *mp as caller needs.
PR kern/58688: userland panic of kernel via wg(4)


To generate a diff of this commit:
cvs rdiff -u -r1.71.2.3 -r1.71.2.4 src/sys/net/if_wg.c
cvs rdiff -u -r1.4 -r1.4.6.1 src/tests/net/if_wg/t_basic.sh
cvs rdiff -u -r1.12 -r1.12.2.1 src/tests/net/if_wg/t_misc.sh

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/sys/net/if_wg.c
diff -u src/sys/net/if_wg.c:1.71.2.3 src/sys/net/if_wg.c:1.71.2.4
--- src/sys/net/if_wg.c:1.71.2.3	Mon Mar 11 19:34:00 2024
+++ src/sys/net/if_wg.c	Wed Oct  9 11:15:39 2024
@@ -1,4 +1,4 @@
-/*	$NetBSD: if_wg.c,v 1.71.2.3 2024/03/11 19:34:00 martin Exp $	*/
+/*	$NetBSD: if_wg.c,v 1.71.2.4 2024/10/09 11:15:39 martin Exp $	*/
 
 /*
  * Copyright (C) Ryota Ozaki <ozaki.ry...@gmail.com>
@@ -37,11 +37,13 @@
  * 2018-07-11 [2] is referred with label [N].
  *
  * [1] https://www.wireguard.com/papers/wireguard.pdf
+ *     https://web.archive.org/web/20180805103233/https://www.wireguard.com/papers/wireguard.pdf
  * [2] http://noiseprotocol.org/noise.pdf
+ *     https://web.archive.org/web/20180727193154/https://noiseprotocol.org/noise.pdf
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: if_wg.c,v 1.71.2.3 2024/03/11 19:34:00 martin Exp $");
+__KERNEL_RCSID(0, "$NetBSD: if_wg.c,v 1.71.2.4 2024/10/09 11:15:39 martin Exp $");
 
 #ifdef _KERNEL_OPT
 #include "opt_altq_enabled.h"
@@ -83,6 +85,8 @@ __KERNEL_RCSID(0, "$NetBSD: if_wg.c,v 1.
 #include <sys/timespec.h>
 #include <sys/workqueue.h>
 
+#include <lib/libkern/libkern.h>
+
 #include <net/bpf.h>
 #include <net/if.h>
 #include <net/if_types.h>
@@ -90,6 +94,7 @@ __KERNEL_RCSID(0, "$NetBSD: if_wg.c,v 1.
 #include <net/pktqueue.h>
 #include <net/route.h>
 
+#ifdef INET
 #include <netinet/in.h>
 #include <netinet/in_pcb.h>
 #include <netinet/in_var.h>
@@ -97,6 +102,7 @@ __KERNEL_RCSID(0, "$NetBSD: if_wg.c,v 1.
 #include <netinet/ip_var.h>
 #include <netinet/udp.h>
 #include <netinet/udp_var.h>
+#endif	/* INET */
 
 #ifdef INET6
 #include <netinet/ip6.h>
@@ -104,7 +110,7 @@ __KERNEL_RCSID(0, "$NetBSD: if_wg.c,v 1.
 #include <netinet6/in6_var.h>
 #include <netinet6/ip6_var.h>
 #include <netinet6/udp6_var.h>
-#endif /* INET6 */
+#endif	/* INET6 */
 
 #include <prop/proplib.h>
 
@@ -119,6 +125,10 @@ __KERNEL_RCSID(0, "$NetBSD: if_wg.c,v 1.
 #include "wg_user.h"
 #endif
 
+#ifndef time_uptime32
+#define	time_uptime32	((uint32_t)time_uptime)
+#endif
+
 /*
  * Data structures
  * - struct wg_softc is an instance of wg interfaces
@@ -167,6 +177,8 @@ __KERNEL_RCSID(0, "$NetBSD: if_wg.c,v 1.
 #define WGLOG(level, fmt, args...)					      \
 	log(level, "%s: " fmt, __func__, ##args)
 
+#define WG_DEBUG
+
 /* Debug options */
 #ifdef WG_DEBUG
 /* Output debug logs */
@@ -185,17 +197,36 @@ __KERNEL_RCSID(0, "$NetBSD: if_wg.c,v 1.
 #ifndef WG_DEBUG_PARAMS
 #define WG_DEBUG_PARAMS
 #endif
+#endif /* WG_DEBUG */
+
+#ifndef WG_DEBUG
+# if defined(WG_DEBUG_LOG) || defined(WG_DEBUG_TRACE) ||		    \
+	defined(WG_DEBUG_DUMP) || defined(WG_DEBUG_PARAMS)
+#   define WG_DEBUG
+# endif
+#endif
+
+#ifdef WG_DEBUG
+int wg_debug;
+#define WG_DEBUG_FLAGS_LOG	1
+#define WG_DEBUG_FLAGS_TRACE	2
+#define WG_DEBUG_FLAGS_DUMP	4
 #endif
 
 #ifdef WG_DEBUG_TRACE
-#define WG_TRACE(msg)							      \
-	log(LOG_DEBUG, "%s:%d: %s\n", __func__, __LINE__, (msg))
+#define WG_TRACE(msg)	 do {						\
+	if (wg_debug & WG_DEBUG_FLAGS_TRACE)				\
+	    log(LOG_DEBUG, "%s:%d: %s\n", __func__, __LINE__, (msg));	\
+} while (0)
 #else
 #define WG_TRACE(msg)	__nothing
 #endif
 
 #ifdef WG_DEBUG_LOG
-#define WG_DLOG(fmt, args...)	log(LOG_DEBUG, "%s: " fmt, __func__, ##args)
+#define WG_DLOG(fmt, args...)	 do {					\
+	if (wg_debug & WG_DEBUG_FLAGS_LOG)				\
+	    log(LOG_DEBUG, "%s: " fmt, __func__, ##args);		\
+} while (0)
 #else
 #define WG_DLOG(fmt, args...)	__nothing
 #endif
@@ -213,19 +244,37 @@ static bool wg_force_underload = false;
 
 #ifdef WG_DEBUG_DUMP
 
+static char enomem[10] = "[enomem]";
+
+#define	MAX_HDUMP_LEN	10000	/* large enough */
+
+/*
+ * gethexdump(p, n)
+ *
+ *	Allocate a string returning a hexdump of bytes p[0..n),
+ *	truncated to MAX_HDUMP_LEN.  Must be freed with puthexdump.
+ *
+ *	We use this instead of libkern hexdump() because the result is
+ *	logged with log(LOG_DEBUG, ...), which puts a priority tag on
+ *	every message, so it can't be done incrementally.
+ */
 static char *
-gethexdump(const char *p, size_t n)
+gethexdump(const void *vp, size_t n)
 {
 	char *buf;
-	size_t i;
+	const uint8_t *p = vp;
+	size_t i, alloc;
 
-	if (n > SIZE_MAX/3 - 1)
-		return NULL;
-	buf = kmem_alloc(3*n + 1, KM_NOSLEEP);
+	alloc = n;
+	if (n > MAX_HDUMP_LEN)
+		alloc = MAX_HDUMP_LEN;
+	buf = kmem_alloc(3*alloc + 5, KM_NOSLEEP);
 	if (buf == NULL)
-		return NULL;
-	for (i = 0; i < n; i++)
+		return enomem;
+	for (i = 0; i < alloc; i++)
 		snprintf(buf + 3*i, 3 + 1, " %02hhx", p[i]);
+	if (alloc != n)
+		snprintf(buf + 3*i, 4 + 1, " ...");
 	return buf;
 }
 
@@ -233,18 +282,23 @@ static void
 puthexdump(char *buf, const void *p, size_t n)
 {
 
-	if (buf == NULL)
+	if (buf == NULL || buf == enomem)
 		return;
-	kmem_free(buf, 3*n + 1);
+	if (n > MAX_HDUMP_LEN)
+		n = MAX_HDUMP_LEN;
+	kmem_free(buf, 3*n + 5);
 }
 
 #ifdef WG_RUMPKERNEL
 static void
 wg_dump_buf(const char *func, const char *buf, const size_t size)
 {
+	if ((wg_debug & WG_DEBUG_FLAGS_DUMP) == 0)
+		return;
+
 	char *hex = gethexdump(buf, size);
 
-	log(LOG_DEBUG, "%s: %s\n", func, hex ? hex : "(enomem)");
+	log(LOG_DEBUG, "%s: %s\n", func, hex);
 	puthexdump(hex, buf, size);
 }
 #endif
@@ -253,9 +307,12 @@ static void
 wg_dump_hash(const uint8_t *func, const uint8_t *name, const uint8_t *hash,
     const size_t size)
 {
+	if ((wg_debug & WG_DEBUG_FLAGS_DUMP) == 0)
+		return;
+
 	char *hex = gethexdump(hash, size);
 
-	log(LOG_DEBUG, "%s: %s: %s\n", func, name, hex ? hex : "(enomem)");
+	log(LOG_DEBUG, "%s: %s: %s\n", func, name, hex);
 	puthexdump(hex, hash, size);
 }
 
@@ -271,14 +328,14 @@ wg_dump_hash(const uint8_t *func, const 
 #define WG_DUMP_BUF(buf, size)	__nothing
 #endif /* WG_DEBUG_DUMP */
 
-/* chosen somewhat arbitrarily -- fits in signed 16 bits NUL-termintaed */
+/* chosen somewhat arbitrarily -- fits in signed 16 bits NUL-terminated */
 #define	WG_MAX_PROPLEN		32766
 
 #define WG_MTU			1420
 #define WG_ALLOWEDIPS		16
 
 #define CURVE25519_KEY_LEN	32
-#define TAI64N_LEN		sizeof(uint32_t) * 3
+#define TAI64N_LEN		(sizeof(uint32_t) * 3)
 #define POLY1305_AUTHTAG_LEN	16
 #define HMAC_BLOCK_LEN		64
 
@@ -304,7 +361,7 @@ wg_dump_hash(const uint8_t *func, const 
 
 #define WG_COOKIE_LEN		16
 #define WG_MAC_LEN		16
-#define WG_RANDVAL_LEN		24
+#define WG_COOKIESECRET_LEN	32
 
 #define WG_EPHEMERAL_KEY_LEN	CURVE25519_KEY_LEN
 /* [N] 5.2: "ck: A chaining key of HASHLEN bytes" */
@@ -352,7 +409,7 @@ struct wg_msg_data {
 	uint32_t	wgmd_type;
 	uint32_t	wgmd_receiver;
 	uint64_t	wgmd_counter;
-	uint32_t	wgmd_packet[0];
+	uint32_t	wgmd_packet[];
 } __packed;
 
 /* [W] 5.4.7 Under Load: Cookie Reply Message */
@@ -373,7 +430,7 @@ struct wg_msg_cookie {
 
 #define	SLIWIN_BITS	2048u
 #define	SLIWIN_TYPE	uint32_t
-#define	SLIWIN_BPW	NBBY*sizeof(SLIWIN_TYPE)
+#define	SLIWIN_BPW	(NBBY*sizeof(SLIWIN_TYPE))
 #define	SLIWIN_WORDS	howmany(SLIWIN_BITS, SLIWIN_BPW)
 #define	SLIWIN_NPKT	(SLIWIN_BITS - NBBY*sizeof(SLIWIN_TYPE))
 
@@ -382,6 +439,14 @@ struct sliwin {
 	uint64_t	T;
 };
 
+/*
+ * sliwin_reset(W)
+ *
+ *	Reset sliding window state to a blank history with no observed
+ *	sequence numbers.
+ *
+ *	Caller must have exclusive access to W.
+ */
 static void
 sliwin_reset(struct sliwin *W)
 {
@@ -389,6 +454,16 @@ sliwin_reset(struct sliwin *W)
 	memset(W, 0, sizeof(*W));
 }
 
+/*
+ * sliwin_check_fast(W, S)
+ *
+ *	Do a fast check of the sliding window W to validate sequence
+ *	number S.  No state is recorded.  Return 0 on accept, nonzero
+ *	error code on reject.
+ *
+ *	May be called concurrently with other calls to
+ *	sliwin_check_fast and sliwin_update.
+ */
 static int
 sliwin_check_fast(const volatile struct sliwin *W, uint64_t S)
 {
@@ -410,6 +485,17 @@ sliwin_check_fast(const volatile struct 
 	return 0;
 }
 
+/*
+ * sliwin_update(W, S)
+ *
+ *	Check the sliding window W to validate sequence number S, and
+ *	if accepted, update it to reflect having observed S.  Return 0
+ *	on accept, nonzero error code on reject.
+ *
+ *	May be called concurrently with other calls to
+ *	sliwin_check_fast, but caller must exclude other calls to
+ *	sliwin_update.
+ */
 static int
 sliwin_update(struct sliwin *W, uint64_t S)
 {
@@ -456,15 +542,17 @@ struct wg_session {
 	struct psref_target
 			wgs_psref;
 
-	int		wgs_state;
+	volatile int	wgs_state;
 #define WGS_STATE_UNKNOWN	0
 #define WGS_STATE_INIT_ACTIVE	1
 #define WGS_STATE_INIT_PASSIVE	2
 #define WGS_STATE_ESTABLISHED	3
 #define WGS_STATE_DESTROYING	4
 
-	time_t		wgs_time_established;
-	time_t		wgs_time_last_data_sent;
+	uint32_t	wgs_time_established;
+	volatile uint32_t
+			wgs_time_last_data_sent;
+	volatile bool	wgs_force_rekey;
 	bool		wgs_is_initiator;
 
 	uint32_t	wgs_local_index;
@@ -546,26 +634,25 @@ struct wg_peer {
 	kmutex_t		*wgp_intr_lock;
 
 	uint8_t	wgp_pubkey[WG_STATIC_KEY_LEN];
-	struct wg_sockaddr	*wgp_endpoint;
+	struct wg_sockaddr	*volatile wgp_endpoint;
 	struct wg_sockaddr	*wgp_endpoint0;
 	volatile unsigned	wgp_endpoint_changing;
-	bool			wgp_endpoint_available;
+	volatile bool		wgp_endpoint_available;
 
 			/* The preshared key (optional) */
 	uint8_t		wgp_psk[WG_PRESHARED_KEY_LEN];
 
-	struct wg_session	*wgp_session_stable;
+	struct wg_session	*volatile wgp_session_stable;
 	struct wg_session	*wgp_session_unstable;
 
 	/* first outgoing packet awaiting session initiation */
-	struct mbuf		*wgp_pending;
+	struct mbuf		*volatile wgp_pending;
 
 	/* timestamp in big-endian */
 	wg_timestamp_t	wgp_timestamp_latest_init;
 
 	struct timespec		wgp_last_handshake_time;
 
-	callout_t		wgp_rekey_timer;
 	callout_t		wgp_handshake_timeout_timer;
 	callout_t		wgp_session_dtor_timer;
 
@@ -583,8 +670,8 @@ struct wg_peer {
 
 	time_t			wgp_last_msg_received_time[WG_MSG_TYPE_MAX];
 
-	time_t			wgp_last_genrandval_time;
-	uint32_t		wgp_randval;
+	time_t			wgp_last_cookiesecret_time;
+	uint8_t			wgp_cookiesecret[WG_COOKIESECRET_LEN];
 
 	struct wg_ppsratecheck	wgp_ppsratecheck;
 
@@ -652,7 +739,7 @@ struct wg_softc {
 #define WG_KEEPALIVE_TIMEOUT		 10
 
 #define WG_COOKIE_TIME			120
-#define WG_RANDVAL_TIME			(2 * 60)
+#define WG_COOKIESECRET_TIME		(2 * 60)
 
 static uint64_t wg_rekey_after_messages = WG_REKEY_AFTER_MESSAGES;
 static uint64_t wg_reject_after_messages = WG_REJECT_AFTER_MESSAGES;
@@ -665,11 +752,12 @@ static unsigned wg_keepalive_timeout = W
 static struct mbuf *
 		wg_get_mbuf(size_t, size_t);
 
-static int	wg_send_data_msg(struct wg_peer *, struct wg_session *,
+static void	wg_send_data_msg(struct wg_peer *, struct wg_session *,
 		    struct mbuf *);
-static int	wg_send_cookie_msg(struct wg_softc *, struct wg_peer *,
-		    const uint32_t, const uint8_t [], const struct sockaddr *);
-static int	wg_send_handshake_msg_resp(struct wg_softc *, struct wg_peer *,
+static void	wg_send_cookie_msg(struct wg_softc *, struct wg_peer *,
+		    const uint32_t, const uint8_t[static WG_MAC_LEN],
+		    const struct sockaddr *);
+static void	wg_send_handshake_msg_resp(struct wg_softc *, struct wg_peer *,
 		    struct wg_session *, const struct wg_msg_init *);
 static void	wg_send_keepalive_msg(struct wg_peer *, struct wg_session *);
 
@@ -678,7 +766,7 @@ static struct wg_peer *
 		    struct psref *);
 static struct wg_peer *
 		wg_lookup_peer_by_pubkey(struct wg_softc *,
-		    const uint8_t [], struct psref *);
+		    const uint8_t[static WG_STATIC_KEY_LEN], struct psref *);
 
 static struct wg_session *
 		wg_lookup_session_by_index(struct wg_softc *,
@@ -687,7 +775,6 @@ static struct wg_session *
 static void	wg_update_endpoint_if_necessary(struct wg_peer *,
 		    const struct sockaddr *);
 
-static void	wg_schedule_rekey_timer(struct wg_peer *);
 static void	wg_schedule_session_dtor_timer(struct wg_peer *);
 
 static bool	wg_is_underload(struct wg_softc *, struct wg_peer *, int);
@@ -770,8 +857,10 @@ wg_rnh(struct wg_softc *wg, const int fa
 {
 
 	switch (family) {
+#ifdef INET
 		case AF_INET:
 			return wg->wg_rtable_ipv4;
+#endif
 #ifdef INET6
 		case AF_INET6:
 			return wg->wg_rtable_ipv6;
@@ -830,7 +919,7 @@ wginitqueues(void)
 
 	error = workqueue_create(&wg_wq, "wgpeer", wg_peer_work, NULL,
 	    PRI_NONE, IPL_SOFTNET, WQ_MPSAFE|WQ_PERCPU);
-	KASSERT(error == 0);
+	KASSERTMSG(error == 0, "error=%d", error);
 
 	return 0;
 }
@@ -842,7 +931,7 @@ wg_guarantee_initialized(void)
 	int error __diagused;
 
 	error = RUN_ONCE(&init, wginitqueues);
-	KASSERT(error == 0);
+	KASSERTMSG(error == 0, "error=%d", error);
 }
 
 static int
@@ -865,6 +954,7 @@ wg_count_dec(void)
 {
 	unsigned c __diagused;
 
+	membar_release();	/* match atomic_load_acquire in wgdetach */
 	c = atomic_dec_uint_nv(&wg_count);
 	KASSERT(c != UINT_MAX);
 }
@@ -876,24 +966,29 @@ wgdetach(void)
 	/* Prevent new interface creation.  */
 	if_clone_detach(&wg_cloner);
 
-	/* Check whether there are any existing interfaces.  */
-	if (atomic_load_relaxed(&wg_count)) {
+	/*
+	 * Check whether there are any existing interfaces.  Matches
+	 * membar_release and atomic_dec_uint_nv in wg_count_dec.
+	 */
+	if (atomic_load_acquire(&wg_count)) {
 		/* Back out -- reattach the cloner.  */
 		if_clone_attach(&wg_cloner);
 		return EBUSY;
 	}
 
 	/* No interfaces left.  Nuke it.  */
-	workqueue_destroy(wg_wq);
-	pktq_destroy(wg_pktq);
+	if (wg_wq)
+		workqueue_destroy(wg_wq);
+	if (wg_pktq)
+		pktq_destroy(wg_pktq);
 	psref_class_destroy(wg_psref_class);
 
 	return 0;
 }
 
 static void
-wg_init_key_and_hash(uint8_t ckey[WG_CHAINING_KEY_LEN],
-    uint8_t hash[WG_HASH_LEN])
+wg_init_key_and_hash(uint8_t ckey[static WG_CHAINING_KEY_LEN],
+    uint8_t hash[static WG_HASH_LEN])
 {
 	/* [W] 5.4: CONSTRUCTION */
 	const char *signature = "Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s";
@@ -917,7 +1012,7 @@ wg_init_key_and_hash(uint8_t ckey[WG_CHA
 }
 
 static void
-wg_algo_hash(uint8_t hash[WG_HASH_LEN], const uint8_t input[],
+wg_algo_hash(uint8_t hash[static WG_HASH_LEN], const uint8_t input[],
     const size_t inputsize)
 {
 	struct blake2s state;
@@ -980,8 +1075,8 @@ wg_algo_mac_cookie(uint8_t out[], const 
 }
 
 static void
-wg_algo_generate_keypair(uint8_t pubkey[WG_EPHEMERAL_KEY_LEN],
-    uint8_t privkey[WG_EPHEMERAL_KEY_LEN])
+wg_algo_generate_keypair(uint8_t pubkey[static WG_EPHEMERAL_KEY_LEN],
+    uint8_t privkey[static WG_EPHEMERAL_KEY_LEN])
 {
 
 	CTASSERT(WG_EPHEMERAL_KEY_LEN == crypto_scalarmult_curve25519_BYTES);
@@ -991,9 +1086,9 @@ wg_algo_generate_keypair(uint8_t pubkey[
 }
 
 static void
-wg_algo_dh(uint8_t out[WG_DH_OUTPUT_LEN],
-    const uint8_t privkey[WG_STATIC_KEY_LEN],
-    const uint8_t pubkey[WG_STATIC_KEY_LEN])
+wg_algo_dh(uint8_t out[static WG_DH_OUTPUT_LEN],
+    const uint8_t privkey[static WG_STATIC_KEY_LEN],
+    const uint8_t pubkey[static WG_STATIC_KEY_LEN])
 {
 
 	CTASSERT(WG_STATIC_KEY_LEN == crypto_scalarmult_curve25519_BYTES);
@@ -1039,8 +1134,10 @@ wg_algo_hmac(uint8_t out[], const size_t
 }
 
 static void
-wg_algo_kdf(uint8_t out1[WG_KDF_OUTPUT_LEN], uint8_t out2[WG_KDF_OUTPUT_LEN],
-    uint8_t out3[WG_KDF_OUTPUT_LEN], const uint8_t ckey[WG_CHAINING_KEY_LEN],
+wg_algo_kdf(uint8_t out1[static WG_KDF_OUTPUT_LEN],
+    uint8_t out2[WG_KDF_OUTPUT_LEN],
+    uint8_t out3[WG_KDF_OUTPUT_LEN],
+    const uint8_t ckey[static WG_CHAINING_KEY_LEN],
     const uint8_t input[], const size_t inputlen)
 {
 	uint8_t tmp1[WG_KDF_OUTPUT_LEN], tmp2[WG_KDF_OUTPUT_LEN + 1];
@@ -1079,10 +1176,10 @@ wg_algo_kdf(uint8_t out1[WG_KDF_OUTPUT_L
 }
 
 static void __noinline
-wg_algo_dh_kdf(uint8_t ckey[WG_CHAINING_KEY_LEN],
+wg_algo_dh_kdf(uint8_t ckey[static WG_CHAINING_KEY_LEN],
     uint8_t cipher_key[WG_CIPHER_KEY_LEN],
-    const uint8_t local_key[WG_STATIC_KEY_LEN],
-    const uint8_t remote_key[WG_STATIC_KEY_LEN])
+    const uint8_t local_key[static WG_STATIC_KEY_LEN],
+    const uint8_t remote_key[static WG_STATIC_KEY_LEN])
 {
 	uint8_t dhout[WG_DH_OUTPUT_LEN];
 
@@ -1096,8 +1193,10 @@ wg_algo_dh_kdf(uint8_t ckey[WG_CHAINING_
 }
 
 static void
-wg_algo_aead_enc(uint8_t out[], size_t expected_outsize, const uint8_t key[],
-    const uint64_t counter, const uint8_t plain[], const size_t plainsize,
+wg_algo_aead_enc(uint8_t out[], size_t expected_outsize,
+    const uint8_t key[static crypto_aead_chacha20poly1305_ietf_KEYBYTES],
+    const uint64_t counter,
+    const uint8_t plain[], const size_t plainsize,
     const uint8_t auth[], size_t authlen)
 {
 	uint8_t nonce[(32 + 64) / 8] = {0};
@@ -1113,9 +1212,11 @@ wg_algo_aead_enc(uint8_t out[], size_t e
 }
 
 static int
-wg_algo_aead_dec(uint8_t out[], size_t expected_outsize, const uint8_t key[],
-    const uint64_t counter, const uint8_t encrypted[],
-    const size_t encryptedsize, const uint8_t auth[], size_t authlen)
+wg_algo_aead_dec(uint8_t out[], size_t expected_outsize,
+    const uint8_t key[static crypto_aead_chacha20poly1305_ietf_KEYBYTES],
+    const uint64_t counter,
+    const uint8_t encrypted[], const size_t encryptedsize,
+    const uint8_t auth[], size_t authlen)
 {
 	uint8_t nonce[(32 + 64) / 8] = {0};
 	long long unsigned int outsize;
@@ -1132,9 +1233,10 @@ wg_algo_aead_dec(uint8_t out[], size_t e
 
 static void
 wg_algo_xaead_enc(uint8_t out[], const size_t expected_outsize,
-    const uint8_t key[], const uint8_t plain[], const size_t plainsize,
+    const uint8_t key[static crypto_aead_xchacha20poly1305_ietf_KEYBYTES],
+    const uint8_t plain[], const size_t plainsize,
     const uint8_t auth[], size_t authlen,
-    const uint8_t nonce[WG_SALT_LEN])
+    const uint8_t nonce[static WG_SALT_LEN])
 {
 	long long unsigned int outsize;
 	int error __diagused;
@@ -1148,9 +1250,10 @@ wg_algo_xaead_enc(uint8_t out[], const s
 
 static int
 wg_algo_xaead_dec(uint8_t out[], const size_t expected_outsize,
-    const uint8_t key[], const uint8_t encrypted[], const size_t encryptedsize,
+    const uint8_t key[static crypto_aead_xchacha20poly1305_ietf_KEYBYTES],
+    const uint8_t encrypted[], const size_t encryptedsize,
     const uint8_t auth[], size_t authlen,
-    const uint8_t nonce[WG_SALT_LEN])
+    const uint8_t nonce[static WG_SALT_LEN])
 {
 	long long unsigned int outsize;
 	int error;
@@ -1196,7 +1299,8 @@ wg_get_stable_session(struct wg_peer *wg
 
 	s = pserialize_read_enter();
 	wgs = atomic_load_consume(&wgp->wgp_session_stable);
-	if (__predict_false(wgs->wgs_state != WGS_STATE_ESTABLISHED))
+	if (__predict_false(atomic_load_relaxed(&wgs->wgs_state) !=
+		WGS_STATE_ESTABLISHED))
 		wgs = NULL;
 	else
 		psref_acquire(psref, &wgs->wgs_psref, wg_psref_class);
@@ -1232,10 +1336,19 @@ wg_destroy_session(struct wg_softc *wg, 
 	pserialize_perform(wgp->wgp_psz);
 	psref_target_destroy(&wgs->wgs_psref, wg_psref_class);
 
-	/* Free memory, zero state, and transition to UNKNOWN.  */
+	/*
+	 * Free memory, zero state, and transition to UNKNOWN.  We have
+	 * exclusive access to the session now, so there is no need for
+	 * an atomic store.
+	 */
 	thmap_gc(wg->wg_sessions_byindex, garbage);
+	WG_DLOG("session[L=%"PRIx32" R=%"PRIx32"] -> WGS_STATE_UNKNOWN\n",
+	    wgs->wgs_local_index, wgs->wgs_remote_index);
+	wgs->wgs_local_index = 0;
+	wgs->wgs_remote_index = 0;
 	wg_clear_states(wgs);
 	wgs->wgs_state = WGS_STATE_UNKNOWN;
+	wgs->wgs_force_rekey = false;
 }
 
 /*
@@ -1256,7 +1369,8 @@ wg_get_session_index(struct wg_softc *wg
 
 	KASSERT(mutex_owned(wgp->wgp_lock));
 	KASSERT(wgs == wgp->wgp_session_unstable);
-	KASSERT(wgs->wgs_state == WGS_STATE_UNKNOWN);
+	KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+	    wgs->wgs_state);
 
 	do {
 		/* Pick a uniform random index.  */
@@ -1287,7 +1401,6 @@ wg_put_session_index(struct wg_softc *wg
 	struct wg_peer *wgp __diagused = wgs->wgs_peer;
 
 	KASSERT(mutex_owned(wgp->wgp_lock));
-	KASSERT(wgs == wgp->wgp_session_unstable);
 	KASSERT(wgs->wgs_state != WGS_STATE_UNKNOWN);
 	KASSERT(wgs->wgs_state != WGS_STATE_ESTABLISHED);
 
@@ -1329,7 +1442,8 @@ wg_fill_msg_init(struct wg_softc *wg, st
 
 	KASSERT(mutex_owned(wgp->wgp_lock));
 	KASSERT(wgs == wgp->wgp_session_unstable);
-	KASSERT(wgs->wgs_state == WGS_STATE_INIT_ACTIVE);
+	KASSERTMSG(wgs->wgs_state == WGS_STATE_INIT_ACTIVE, "state=%d",
+	    wgs->wgs_state);
 
 	wgmi->wgmi_type = htole32(WG_MSG_TYPE_INIT);
 	wgmi->wgmi_sender = wgs->wgs_local_index;
@@ -1410,6 +1524,70 @@ wg_fill_msg_init(struct wg_softc *wg, st
 	WG_DLOG("%s: sender=%x\n", __func__, wgs->wgs_local_index);
 }
 
+/*
+ * wg_initiator_priority(wg, wgp)
+ *
+ *	Return true if we claim priority over peer wgp as initiator at
+ *	the moment, false if not.  That is, if we and our peer are
+ *	trying to initiate a session, do we ignore the peer's attempt
+ *	and barge ahead with ours, or discard our attempt and accept
+ *	the peer's?
+ *
+ *	We jointly flip a coin by computing
+ *
+ *		H(pubkey A) ^ H(pubkey B) ^ H(posix minutes as le64),
+ *
+ *	and taking the low-order bit.  If our public key hash, as a
+ *	256-bit integer in little-endian, is less than the peer's
+ *	public key hash, also as a 256-bit integer in little-endian, we
+ *	claim priority iff the bit is 0; otherwise we claim priority
+ *	iff the bit is 1.
+ *
+ *	This way, it is essentially arbitrary who claims priority, and
+ *	it may change (by a coin toss) minute to minute, but both
+ *	parties agree at any given moment -- except possibly at the
+ *	boundary of a minute -- who will take priority.
+ *
+ *	This is an extension to the WireGuard protocol -- as far as I
+ *	can tell, the protocol whitepaper has no resolution to this
+ *	deadlock scenario.  According to the author, `the deadlock
+ *	doesn't happen because of some additional state machine logic,
+ *	and on very small chances that it does, it quickly undoes
+ *	itself.', but this additional state machine logic does not
+ *	appear to be anywhere in the whitepaper, and I don't see how it
+ *	can undo itself until both sides have given up and one side is
+ *	quicker to initiate the next time around.
+ *
+ *	XXX It might be prudent to put a prefix in the hash input, so
+ *	we avoid accidentally colliding with any other uses of the same
+ *	hash on the same input.  But it's best if any changes are
+ *	coordinated, so that peers generally agree on what coin is
+ *	being tossed, instead of tossing their own independent coins
+ *	(which will also converge to working but more slowly over more
+ *	handshake retries).
+ */
+static bool
+wg_initiator_priority(struct wg_softc *wg, struct wg_peer *wgp)
+{
+	const uint64_t now = time_second/60, now_le = htole64(now);
+	uint8_t h_min;
+	uint8_t h_local[BLAKE2S_MAX_DIGEST];
+	uint8_t h_peer[BLAKE2S_MAX_DIGEST];
+	int borrow;
+	unsigned i;
+
+	blake2s(&h_min, 1, NULL, 0, &now_le, sizeof(now_le));
+	blake2s(h_local, sizeof(h_local), NULL, 0,
+	    wg->wg_pubkey, sizeof(wg->wg_pubkey));
+	blake2s(h_peer, sizeof(h_peer), NULL, 0,
+	    wgp->wgp_pubkey, sizeof(wgp->wgp_pubkey));
+
+	for (borrow = 0, i = 0; i < BLAKE2S_MAX_DIGEST; i++)
+		borrow = (h_local[i] - h_peer[i] + borrow) >> 8;
+
+	return 1 & (h_local[0] ^ h_peer[0] ^ h_min ^ borrow);
+}
+
 static void __noinline
 wg_handle_msg_init(struct wg_softc *wg, const struct wg_msg_init *wgmi,
     const struct sockaddr *src)
@@ -1511,13 +1689,13 @@ wg_handle_msg_init(struct wg_softc *wg, 
 		uint8_t zero[WG_MAC_LEN] = {0};
 		if (consttime_memequal(wgmi->wgmi_mac2, zero, sizeof(zero))) {
 			WG_TRACE("sending a cookie message: no cookie included");
-			(void)wg_send_cookie_msg(wg, wgp, wgmi->wgmi_sender,
+			wg_send_cookie_msg(wg, wgp, wgmi->wgmi_sender,
 			    wgmi->wgmi_mac1, src);
 			goto out;
 		}
 		if (!wgp->wgp_last_sent_cookie_valid) {
 			WG_TRACE("sending a cookie message: no cookie sent ever");
-			(void)wg_send_cookie_msg(wg, wgp, wgmi->wgmi_sender,
+			wg_send_cookie_msg(wg, wgp, wgmi->wgmi_sender,
 			    wgmi->wgmi_mac1, src);
 			goto out;
 		}
@@ -1572,42 +1750,102 @@ wg_handle_msg_init(struct wg_softc *wg, 
 	wgs = wgp->wgp_session_unstable;
 	switch (wgs->wgs_state) {
 	case WGS_STATE_UNKNOWN:		/* new session initiated by peer */
-		wg_get_session_index(wg, wgs);
 		break;
-	case WGS_STATE_INIT_ACTIVE:	/* we're already initiating, drop */
-		WG_TRACE("Session already initializing, ignoring the message");
-		goto out;
+	case WGS_STATE_INIT_ACTIVE:	/* we're already initiating */
+		if (wg_initiator_priority(wg, wgp)) {
+			WG_TRACE("Session already initializing,"
+			    " ignoring the message");
+			goto out;
+		}
+		WG_TRACE("Yielding session initiation to peer");
+		wg_put_session_index(wg, wgs);
+		KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+		    wgs->wgs_state);
+		break;
 	case WGS_STATE_INIT_PASSIVE:	/* peer is retrying, start over */
 		WG_TRACE("Session already initializing, destroying old states");
-		wg_clear_states(wgs);
-		/* keep session index */
+		/*
+		 * XXX Avoid this -- just resend our response -- if the
+		 * INIT message is identical to the previous one.
+		 */
+		wg_put_session_index(wg, wgs);
+		KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+		    wgs->wgs_state);
 		break;
 	case WGS_STATE_ESTABLISHED:	/* can't happen */
 		panic("unstable session can't be established");
-		break;
 	case WGS_STATE_DESTROYING:	/* rekey initiated by peer */
 		WG_TRACE("Session destroying, but force to clear");
-		callout_stop(&wgp->wgp_session_dtor_timer);
-		wg_clear_states(wgs);
-		/* keep session index */
+		wg_put_session_index(wg, wgs);
+		KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+		    wgs->wgs_state);
 		break;
 	default:
 		panic("invalid session state: %d", wgs->wgs_state);
 	}
-	wgs->wgs_state = WGS_STATE_INIT_PASSIVE;
+
+	/*
+	 * Assign a fresh session index.
+	 */
+	KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+	    wgs->wgs_state);
+	wg_get_session_index(wg, wgs);
 
 	memcpy(wgs->wgs_handshake_hash, hash, sizeof(hash));
 	memcpy(wgs->wgs_chaining_key, ckey, sizeof(ckey));
 	memcpy(wgs->wgs_ephemeral_key_peer, wgmi->wgmi_ephemeral,
 	    sizeof(wgmi->wgmi_ephemeral));
 
+	/*
+	 * The packet is genuine.  Update the peer's endpoint if the
+	 * source address changed.
+	 *
+	 * XXX How to prevent DoS by replaying genuine packets from the
+	 * wrong source address?
+	 */
 	wg_update_endpoint_if_necessary(wgp, src);
 
-	(void)wg_send_handshake_msg_resp(wg, wgp, wgs, wgmi);
+	/*
+	 * Even though we don't transition from INIT_PASSIVE to
+	 * ESTABLISHED until we receive the first data packet from the
+	 * initiator, we count the time of the INIT message as the time
+	 * of establishment -- this is used to decide when to erase
+	 * keys, and we want to start counting as soon as we have
+	 * generated keys.
+	 */
+	wgs->wgs_time_established = time_uptime32;
+	wg_schedule_session_dtor_timer(wgp);
 
+	/*
+	 * Respond to the initiator with our ephemeral public key.
+	 */
+	wg_send_handshake_msg_resp(wg, wgp, wgs, wgmi);
+
+	WG_DLOG("session[L=%"PRIx32" R=%"PRIx32"]:"
+	    " calculate keys as responder\n",
+	    wgs->wgs_local_index, wgs->wgs_remote_index);
 	wg_calculate_keys(wgs, false);
 	wg_clear_states(wgs);
 
+	/*
+	 * Session is ready to receive data now that we have received
+	 * the peer initiator's ephemeral key pair, generated our
+	 * responder's ephemeral key pair, and derived a session key.
+	 *
+	 * Transition from UNKNOWN to INIT_PASSIVE to publish it to the
+	 * data rx path, wg_handle_msg_data, where the
+	 * atomic_load_acquire matching this atomic_store_release
+	 * happens.
+	 *
+	 * (Session is not, however, ready to send data until the peer
+	 * has acknowledged our response by sending its first data
+	 * packet.  So don't swap the sessions yet.)
+	 */
+	WG_DLOG("session[L=%"PRIx32" R=%"PRIx32"] -> WGS_STATE_INIT_PASSIVE\n",
+	    wgs->wgs_local_index, wgs->wgs_remote_index);
+	atomic_store_release(&wgs->wgs_state, WGS_STATE_INIT_PASSIVE);
+	WG_TRACE("WGS_STATE_INIT_PASSIVE");
+
 out:
 	mutex_exit(wgp->wgp_lock);
 	wg_put_peer(wgp, &psref_peer);
@@ -1675,7 +1913,7 @@ wg_send_so(struct wg_peer *wgp, struct m
 	return error;
 }
 
-static int
+static void
 wg_send_handshake_msg_init(struct wg_softc *wg, struct wg_peer *wgp)
 {
 	int error;
@@ -1689,25 +1927,41 @@ wg_send_handshake_msg_init(struct wg_sof
 	/* XXX pull dispatch out into wg_task_send_init_message */
 	switch (wgs->wgs_state) {
 	case WGS_STATE_UNKNOWN:		/* new session initiated by us */
-		wg_get_session_index(wg, wgs);
 		break;
 	case WGS_STATE_INIT_ACTIVE:	/* we're already initiating, stop */
 		WG_TRACE("Session already initializing, skip starting new one");
-		return EBUSY;
+		return;
 	case WGS_STATE_INIT_PASSIVE:	/* peer was trying -- XXX what now? */
-		WG_TRACE("Session already initializing, destroying old states");
-		wg_clear_states(wgs);
-		/* keep session index */
-		break;
+		WG_TRACE("Session already initializing, waiting for peer");
+		return;
 	case WGS_STATE_ESTABLISHED:	/* can't happen */
 		panic("unstable session can't be established");
-		break;
 	case WGS_STATE_DESTROYING:	/* rekey initiated by us too early */
 		WG_TRACE("Session destroying");
-		/* XXX should wait? */
-		return EBUSY;
+		wg_put_session_index(wg, wgs);
+		KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+		    wgs->wgs_state);
+		break;
 	}
-	wgs->wgs_state = WGS_STATE_INIT_ACTIVE;
+
+	/*
+	 * Assign a fresh session index.
+	 */
+	KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+	    wgs->wgs_state);
+	wg_get_session_index(wg, wgs);
+
+	/*
+	 * We have initiated a session.  Transition to INIT_ACTIVE.
+	 * This doesn't publish it for use in the data rx path,
+	 * wg_handle_msg_data, or in the data tx path, wg_output -- we
+	 * have to wait for the peer to respond with their ephemeral
+	 * public key before we can derive a session key for tx/rx.
+	 * Hence only atomic_store_relaxed.
+	 */
+	WG_DLOG("session[L=%"PRIx32" R=(unknown)] -> WGS_STATE_INIT_ACTIVE\n",
+	    wgs->wgs_local_index);
+	atomic_store_relaxed(&wgs->wgs_state, WGS_STATE_INIT_ACTIVE);
 
 	m = m_gethdr(M_WAIT, MT_DATA);
 	if (sizeof(*wgmi) > MHLEN) {
@@ -1718,22 +1972,28 @@ wg_send_handshake_msg_init(struct wg_sof
 	wgmi = mtod(m, struct wg_msg_init *);
 	wg_fill_msg_init(wg, wgp, wgs, wgmi);
 
-	error = wg->wg_ops->send_hs_msg(wgp, m);
-	if (error == 0) {
-		WG_TRACE("init msg sent");
-
-		if (wgp->wgp_handshake_start_time == 0)
-			wgp->wgp_handshake_start_time = time_uptime;
-		callout_schedule(&wgp->wgp_handshake_timeout_timer,
-		    MIN(wg_rekey_timeout, (unsigned)(INT_MAX / hz)) * hz);
-	} else {
+	error = wg->wg_ops->send_hs_msg(wgp, m); /* consumes m */
+	if (error) {
+		/*
+		 * Sending out an initiation packet failed; give up on
+		 * this session and toss packet waiting for it if any.
+		 *
+		 * XXX Why don't we just let the periodic handshake
+		 * retry logic work in this case?
+		 */
+		WG_DLOG("send_hs_msg failed, error=%d\n", error);
 		wg_put_session_index(wg, wgs);
-		/* Initiation failed; toss packet waiting for it if any.  */
-		if ((m = atomic_swap_ptr(&wgp->wgp_pending, NULL)) != NULL)
-			m_freem(m);
+		m = atomic_swap_ptr(&wgp->wgp_pending, NULL);
+		membar_acquire(); /* matches membar_release in wgintr */
+		m_freem(m);
+		return;
 	}
 
-	return error;
+	WG_TRACE("init msg sent");
+	if (wgp->wgp_handshake_start_time == 0)
+		wgp->wgp_handshake_start_time = time_uptime;
+	callout_schedule(&wgp->wgp_handshake_timeout_timer,
+	    MIN(wg_rekey_timeout, (unsigned)(INT_MAX / hz)) * hz);
 }
 
 static void
@@ -1749,7 +2009,8 @@ wg_fill_msg_resp(struct wg_softc *wg, st
 
 	KASSERT(mutex_owned(wgp->wgp_lock));
 	KASSERT(wgs == wgp->wgp_session_unstable);
-	KASSERT(wgs->wgs_state == WGS_STATE_INIT_PASSIVE);
+	KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+	    wgs->wgs_state);
 
 	memcpy(hash, wgs->wgs_handshake_hash, sizeof(hash));
 	memcpy(ckey, wgs->wgs_chaining_key, sizeof(ckey));
@@ -1831,21 +2092,100 @@ wg_fill_msg_resp(struct wg_softc *wg, st
 	WG_DLOG("receiver=%x\n", wgs->wgs_remote_index);
 }
 
+/*
+ * wg_swap_sessions(wg, wgp)
+ *
+ *	Caller has just finished establishing the unstable session in
+ *	wg for peer wgp.  Publish it as the stable session, send queued
+ *	packets or keepalives as necessary to kick off the session,
+ *	move the previously stable session to unstable, and begin
+ *	destroying it.
+ */
 static void
-wg_swap_sessions(struct wg_peer *wgp)
+wg_swap_sessions(struct wg_softc *wg, struct wg_peer *wgp)
 {
 	struct wg_session *wgs, *wgs_prev;
+	struct mbuf *m;
 
 	KASSERT(mutex_owned(wgp->wgp_lock));
 
+	/*
+	 * Get the newly established session, to become the new
+	 * session.  Caller must have transitioned from INIT_ACTIVE to
+	 * INIT_PASSIVE or to ESTABLISHED already.  This will become
+	 * the stable session.
+	 */
 	wgs = wgp->wgp_session_unstable;
-	KASSERT(wgs->wgs_state == WGS_STATE_ESTABLISHED);
+	KASSERTMSG(wgs->wgs_state == WGS_STATE_ESTABLISHED, "state=%d",
+	    wgs->wgs_state);
 
+	/*
+	 * Get the stable session, which is either the previously
+	 * established session in the ESTABLISHED state, or has not
+	 * been established at all and is UNKNOWN.  This will become
+	 * the unstable session.
+	 */
 	wgs_prev = wgp->wgp_session_stable;
-	KASSERT(wgs_prev->wgs_state == WGS_STATE_ESTABLISHED ||
-	    wgs_prev->wgs_state == WGS_STATE_UNKNOWN);
+	KASSERTMSG((wgs_prev->wgs_state == WGS_STATE_ESTABLISHED ||
+		wgs_prev->wgs_state == WGS_STATE_UNKNOWN),
+	    "state=%d", wgs_prev->wgs_state);
+
+	/*
+	 * Publish the newly established session for the tx path to use
+	 * and make the other one the unstable session to handle
+	 * stragglers in the rx path and later be used for the next
+	 * session's handshake.
+	 */
 	atomic_store_release(&wgp->wgp_session_stable, wgs);
 	wgp->wgp_session_unstable = wgs_prev;
+
+	/*
+	 * Record the handshake time and reset the handshake state.
+	 */
+	getnanotime(&wgp->wgp_last_handshake_time);
+	wgp->wgp_handshake_start_time = 0;
+	wgp->wgp_last_sent_mac1_valid = false;
+	wgp->wgp_last_sent_cookie_valid = false;
+
+	/*
+	 * If we had a data packet queued up, send it.
+	 *
+	 * If not, but we're the initiator, send a keepalive message --
+	 * if we're the initiator we have to send something immediately
+	 * or else the responder will never answer.
+	 */
+	if ((m = atomic_swap_ptr(&wgp->wgp_pending, NULL)) != NULL) {
+		membar_acquire(); /* matches membar_release in wgintr */
+		wg_send_data_msg(wgp, wgs, m); /* consumes m */
+		m = NULL;
+	} else if (wgs->wgs_is_initiator) {
+		wg_send_keepalive_msg(wgp, wgs);
+	}
+
+	/*
+	 * If the previous stable session was established, begin to
+	 * destroy it.
+	 */
+	if (wgs_prev->wgs_state == WGS_STATE_ESTABLISHED) {
+		/*
+		 * Transition ESTABLISHED->DESTROYING.  The session
+		 * will remain usable for the data rx path to process
+		 * packets still in flight to us, but we won't use it
+		 * for data tx.
+		 */
+		WG_DLOG("session[L=%"PRIx32" R=%"PRIx32"]"
+		    " -> WGS_STATE_DESTROYING\n",
+		    wgs_prev->wgs_local_index, wgs_prev->wgs_remote_index);
+		atomic_store_relaxed(&wgs_prev->wgs_state,
+		    WGS_STATE_DESTROYING);
+	} else {
+		KASSERTMSG(wgs_prev->wgs_state == WGS_STATE_UNKNOWN,
+		    "state=%d", wgs_prev->wgs_state);
+		wgs_prev->wgs_local_index = 0; /* paranoia */
+		wgs_prev->wgs_remote_index = 0; /* paranoia */
+		wg_clear_states(wgs_prev); /* paranoia */
+		wgs_prev->wgs_state = WGS_STATE_UNKNOWN;
+	}
 }
 
 static void __noinline
@@ -1860,8 +2200,6 @@ wg_handle_msg_resp(struct wg_softc *wg, 
 	struct psref psref;
 	int error;
 	uint8_t mac1[WG_MAC_LEN];
-	struct wg_session *wgs_prev;
-	struct mbuf *m;
 
 	wg_algo_mac_mac1(mac1, sizeof(mac1),
 	    wg->wg_pubkey, sizeof(wg->wg_pubkey),
@@ -1907,13 +2245,13 @@ wg_handle_msg_resp(struct wg_softc *wg, 
 		uint8_t zero[WG_MAC_LEN] = {0};
 		if (consttime_memequal(wgmr->wgmr_mac2, zero, sizeof(zero))) {
 			WG_TRACE("sending a cookie message: no cookie included");
-			(void)wg_send_cookie_msg(wg, wgp, wgmr->wgmr_sender,
+			wg_send_cookie_msg(wg, wgp, wgmr->wgmr_sender,
 			    wgmr->wgmr_mac1, src);
 			goto out;
 		}
 		if (!wgp->wgp_last_sent_cookie_valid) {
 			WG_TRACE("sending a cookie message: no cookie sent ever");
-			(void)wg_send_cookie_msg(wg, wgp, wgmr->wgmr_sender,
+			wg_send_cookie_msg(wg, wgp, wgmr->wgmr_sender,
 			    wgmr->wgmr_mac1, src);
 			goto out;
 		}
@@ -1991,68 +2329,57 @@ wg_handle_msg_resp(struct wg_softc *wg, 
 	wgs->wgs_remote_index = wgmr->wgmr_sender;
 	WG_DLOG("receiver=%x\n", wgs->wgs_remote_index);
 
-	KASSERT(wgs->wgs_state == WGS_STATE_INIT_ACTIVE);
-	wgs->wgs_state = WGS_STATE_ESTABLISHED;
-	wgs->wgs_time_established = time_uptime;
+	/*
+	 * The packet is genuine.  Update the peer's endpoint if the
+	 * source address changed.
+	 *
+	 * XXX How to prevent DoS by replaying genuine packets from the
+	 * wrong source address?
+	 */
+	wg_update_endpoint_if_necessary(wgp, src);
+
+	KASSERTMSG(wgs->wgs_state == WGS_STATE_INIT_ACTIVE, "state=%d",
+	    wgs->wgs_state);
+	wgs->wgs_time_established = time_uptime32;
+	wg_schedule_session_dtor_timer(wgp);
 	wgs->wgs_time_last_data_sent = 0;
 	wgs->wgs_is_initiator = true;
+	WG_DLOG("session[L=%"PRIx32" R=%"PRIx32"]:"
+	    " calculate keys as initiator\n",
+	    wgs->wgs_local_index, wgs->wgs_remote_index);
 	wg_calculate_keys(wgs, true);
 	wg_clear_states(wgs);
-	WG_TRACE("WGS_STATE_ESTABLISHED");
-
-	callout_stop(&wgp->wgp_handshake_timeout_timer);
-
-	wg_swap_sessions(wgp);
-	KASSERT(wgs == wgp->wgp_session_stable);
-	wgs_prev = wgp->wgp_session_unstable;
-	getnanotime(&wgp->wgp_last_handshake_time);
-	wgp->wgp_handshake_start_time = 0;
-	wgp->wgp_last_sent_mac1_valid = false;
-	wgp->wgp_last_sent_cookie_valid = false;
-
-	wg_schedule_rekey_timer(wgp);
-
-	wg_update_endpoint_if_necessary(wgp, src);
 
 	/*
-	 * If we had a data packet queued up, send it; otherwise send a
-	 * keepalive message -- either way we have to send something
-	 * immediately or else the responder will never answer.
+	 * Session is ready to receive data now that we have received
+	 * the responder's response.
+	 *
+	 * Transition from INIT_ACTIVE to ESTABLISHED to publish it to
+	 * the data rx path, wg_handle_msg_data.
 	 */
-	if ((m = atomic_swap_ptr(&wgp->wgp_pending, NULL)) != NULL) {
-		kpreempt_disable();
-		const uint32_t h = curcpu()->ci_index; // pktq_rps_hash(m)
-		M_SETCTX(m, wgp);
-		if (__predict_false(!pktq_enqueue(wg_pktq, m, h))) {
-			WGLOG(LOG_ERR, "%s: pktq full, dropping\n",
-			    if_name(&wg->wg_if));
-			m_freem(m);
-		}
-		kpreempt_enable();
-	} else {
-		wg_send_keepalive_msg(wgp, wgs);
-	}
-
-	if (wgs_prev->wgs_state == WGS_STATE_ESTABLISHED) {
-		/* Wait for wg_get_stable_session to drain.  */
-		pserialize_perform(wgp->wgp_psz);
+	WG_DLOG("session[L=%"PRIx32" R=%"PRIx32" -> WGS_STATE_ESTABLISHED\n",
+	    wgs->wgs_local_index, wgs->wgs_remote_index);
+	atomic_store_release(&wgs->wgs_state, WGS_STATE_ESTABLISHED);
+	WG_TRACE("WGS_STATE_ESTABLISHED");
 
-		/* Transition ESTABLISHED->DESTROYING.  */
-		wgs_prev->wgs_state = WGS_STATE_DESTROYING;
+	callout_halt(&wgp->wgp_handshake_timeout_timer, NULL);
 
-		/* We can't destroy the old session immediately */
-		wg_schedule_session_dtor_timer(wgp);
-	} else {
-		KASSERTMSG(wgs_prev->wgs_state == WGS_STATE_UNKNOWN,
-		    "state=%d", wgs_prev->wgs_state);
-	}
+	/*
+	 * Session is ready to send data now that we have received the
+	 * responder's response.
+	 *
+	 * Swap the sessions to publish the new one as the stable
+	 * session for the data tx path, wg_output.
+	 */
+	wg_swap_sessions(wg, wgp);
+	KASSERT(wgs == wgp->wgp_session_stable);
 
 out:
 	mutex_exit(wgp->wgp_lock);
 	wg_put_session(wgs, &psref);
 }
 
-static int
+static void
 wg_send_handshake_msg_resp(struct wg_softc *wg, struct wg_peer *wgp,
     struct wg_session *wgs, const struct wg_msg_init *wgmi)
 {
@@ -2062,7 +2389,8 @@ wg_send_handshake_msg_resp(struct wg_sof
 
 	KASSERT(mutex_owned(wgp->wgp_lock));
 	KASSERT(wgs == wgp->wgp_session_unstable);
-	KASSERT(wgs->wgs_state == WGS_STATE_INIT_PASSIVE);
+	KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+	    wgs->wgs_state);
 
 	m = m_gethdr(M_WAIT, MT_DATA);
 	if (sizeof(*wgmr) > MHLEN) {
@@ -2073,15 +2401,18 @@ wg_send_handshake_msg_resp(struct wg_sof
 	wgmr = mtod(m, struct wg_msg_resp *);
 	wg_fill_msg_resp(wg, wgp, wgs, wgmr, wgmi);
 
-	error = wg->wg_ops->send_hs_msg(wgp, m);
-	if (error == 0)
-		WG_TRACE("resp msg sent");
-	return error;
+	error = wg->wg_ops->send_hs_msg(wgp, m); /* consumes m */
+	if (error) {
+		WG_DLOG("send_hs_msg failed, error=%d\n", error);
+		return;
+	}
+
+	WG_TRACE("resp msg sent");
 }
 
 static struct wg_peer *
 wg_lookup_peer_by_pubkey(struct wg_softc *wg,
-    const uint8_t pubkey[WG_STATIC_KEY_LEN], struct psref *psref)
+    const uint8_t pubkey[static WG_STATIC_KEY_LEN], struct psref *psref)
 {
 	struct wg_peer *wgp;
 
@@ -2097,7 +2428,7 @@ wg_lookup_peer_by_pubkey(struct wg_softc
 static void
 wg_fill_msg_cookie(struct wg_softc *wg, struct wg_peer *wgp,
     struct wg_msg_cookie *wgmc, const uint32_t sender,
-    const uint8_t mac1[WG_MAC_LEN], const struct sockaddr *src)
+    const uint8_t mac1[static WG_MAC_LEN], const struct sockaddr *src)
 {
 	uint8_t cookie[WG_COOKIE_LEN];
 	uint8_t key[WG_HASH_LEN];
@@ -2116,12 +2447,15 @@ wg_fill_msg_cookie(struct wg_softc *wg, 
 	 * "The secret variable, Rm, changes every two minutes to a
 	 * random value"
 	 */
-	if ((time_uptime - wgp->wgp_last_genrandval_time) > WG_RANDVAL_TIME) {
-		wgp->wgp_randval = cprng_strong32();
-		wgp->wgp_last_genrandval_time = time_uptime;
+	if ((time_uptime - wgp->wgp_last_cookiesecret_time) >
+	    WG_COOKIESECRET_TIME) {
+		cprng_strong(kern_cprng, wgp->wgp_cookiesecret,
+		    sizeof(wgp->wgp_cookiesecret), 0);
+		wgp->wgp_last_cookiesecret_time = time_uptime;
 	}
 
 	switch (src->sa_family) {
+#ifdef INET
 	case AF_INET: {
 		const struct sockaddr_in *sin = satocsin(src);
 		addrlen = sizeof(sin->sin_addr);
@@ -2129,6 +2463,7 @@ wg_fill_msg_cookie(struct wg_softc *wg, 
 		uh_sport = sin->sin_port;
 		break;
 	    }
+#endif
 #ifdef INET6
 	case AF_INET6: {
 		const struct sockaddr_in6 *sin6 = satocsin6(src);
@@ -2143,7 +2478,7 @@ wg_fill_msg_cookie(struct wg_softc *wg, 
 	}
 
 	wg_algo_mac(cookie, sizeof(cookie),
-	    (const uint8_t *)&wgp->wgp_randval, sizeof(wgp->wgp_randval),
+	    wgp->wgp_cookiesecret, sizeof(wgp->wgp_cookiesecret),
 	    addr, addrlen, (const uint8_t *)&uh_sport, sizeof(uh_sport));
 	wg_algo_mac_cookie(key, sizeof(key), wg->wg_pubkey,
 	    sizeof(wg->wg_pubkey));
@@ -2155,9 +2490,9 @@ wg_fill_msg_cookie(struct wg_softc *wg, 
 	wgp->wgp_last_sent_cookie_valid = true;
 }
 
-static int
+static void
 wg_send_cookie_msg(struct wg_softc *wg, struct wg_peer *wgp,
-    const uint32_t sender, const uint8_t mac1[WG_MAC_LEN],
+    const uint32_t sender, const uint8_t mac1[static WG_MAC_LEN],
     const struct sockaddr *src)
 {
 	int error;
@@ -2175,10 +2510,13 @@ wg_send_cookie_msg(struct wg_softc *wg, 
 	wgmc = mtod(m, struct wg_msg_cookie *);
 	wg_fill_msg_cookie(wg, wgp, wgmc, sender, mac1, src);
 
-	error = wg->wg_ops->send_hs_msg(wgp, m);
-	if (error == 0)
-		WG_TRACE("cookie msg sent");
-	return error;
+	error = wg->wg_ops->send_hs_msg(wgp, m); /* consumes m */
+	if (error) {
+		WG_DLOG("send_hs_msg failed, error=%d\n", error);
+		return;
+	}
+
+	WG_TRACE("cookie msg sent");
 }
 
 static bool
@@ -2279,8 +2617,9 @@ wg_lookup_session_by_index(struct wg_sof
 	int s = pserialize_read_enter();
 	wgs = thmap_get(wg->wg_sessions_byindex, &index, sizeof index);
 	if (wgs != NULL) {
-		KASSERT(atomic_load_relaxed(&wgs->wgs_state) !=
-		    WGS_STATE_UNKNOWN);
+		KASSERTMSG(index == wgs->wgs_local_index,
+		    "index=%"PRIx32" wgs->wgs_local_index=%"PRIx32,
+		    index, wgs->wgs_local_index);
 		psref_acquire(psref, &wgs->wgs_psref, wg_psref_class);
 	}
 	pserialize_read_exit(s);
@@ -2289,14 +2628,6 @@ wg_lookup_session_by_index(struct wg_sof
 }
 
 static void
-wg_schedule_rekey_timer(struct wg_peer *wgp)
-{
-	int timeout = MIN(wg_rekey_after_time, (unsigned)(INT_MAX / hz));
-
-	callout_schedule(&wgp->wgp_rekey_timer, timeout * hz);
-}
-
-static void
 wg_send_keepalive_msg(struct wg_peer *wgp, struct wg_session *wgs)
 {
 	struct mbuf *m;
@@ -2306,6 +2637,7 @@ wg_send_keepalive_msg(struct wg_peer *wg
 	 * "A keepalive message is simply a transport data message with
 	 *  a zero-length encapsulated encrypted inner-packet."
 	 */
+	WG_TRACE("");
 	m = m_gethdr(M_WAIT, MT_DATA);
 	wg_send_data_msg(wgp, wgs, m);
 }
@@ -2322,9 +2654,11 @@ wg_need_to_send_init_message(struct wg_s
 	 *  KEEPALIVE-TIMEOUT − REKEY-TIMEOUT) seconds old and it has
 	 *  not yet acted upon this event."
 	 */
-	return wgs->wgs_is_initiator && wgs->wgs_time_last_data_sent == 0 &&
-	    (time_uptime - wgs->wgs_time_established) >=
-	    (wg_reject_after_time - wg_keepalive_timeout - wg_rekey_timeout);
+	return wgs->wgs_is_initiator &&
+	    atomic_load_relaxed(&wgs->wgs_time_last_data_sent) == 0 &&
+	    (time_uptime32 - wgs->wgs_time_established >=
+		(wg_reject_after_time - wg_keepalive_timeout -
+		    wg_rekey_timeout));
 }
 
 static void
@@ -2367,16 +2701,21 @@ wg_validate_inner_packet(const char *pac
 	uint16_t packet_len;
 	const struct ip *ip;
 
-	if (__predict_false(decrypted_len < sizeof(struct ip)))
+	if (__predict_false(decrypted_len < sizeof(*ip))) {
+		WG_DLOG("decrypted_len=%zu < %zu\n", decrypted_len,
+		    sizeof(*ip));
 		return false;
+	}
 
 	ip = (const struct ip *)packet;
 	if (ip->ip_v == 4)
 		*af = AF_INET;
 	else if (ip->ip_v == 6)
 		*af = AF_INET6;
-	else
+	else {
+		WG_DLOG("ip_v=%d\n", ip->ip_v);
 		return false;
+	}
 
 	WG_DLOG("af=%d\n", *af);
 
@@ -2390,11 +2729,14 @@ wg_validate_inner_packet(const char *pac
 	case AF_INET6: {
 		const struct ip6_hdr *ip6;
 
-		if (__predict_false(decrypted_len < sizeof(struct ip6_hdr)))
+		if (__predict_false(decrypted_len < sizeof(*ip6))) {
+			WG_DLOG("decrypted_len=%zu < %zu\n", decrypted_len,
+			    sizeof(*ip6));
 			return false;
+		}
 
 		ip6 = (const struct ip6_hdr *)packet;
-		packet_len = sizeof(struct ip6_hdr) + ntohs(ip6->ip6_plen);
+		packet_len = sizeof(*ip6) + ntohs(ip6->ip6_plen);
 		break;
 	}
 #endif
@@ -2402,9 +2744,11 @@ wg_validate_inner_packet(const char *pac
 		return false;
 	}
 
-	WG_DLOG("packet_len=%u\n", packet_len);
-	if (packet_len > decrypted_len)
+	if (packet_len > decrypted_len) {
+		WG_DLOG("packet_len %u > decrypted_len %zu\n", packet_len,
+		    decrypted_len);
 		return false;
+	}
 
 	return true;
 }
@@ -2426,18 +2770,28 @@ wg_validate_route(struct wg_softc *wg, s
 	 *  decrypting it."
 	 */
 
-	if (af == AF_INET) {
+	switch (af) {
+#ifdef INET
+	case AF_INET: {
 		const struct ip *ip = (const struct ip *)packet;
 		struct sockaddr_in *sin = (struct sockaddr_in *)&ss;
 		sockaddr_in_init(sin, &ip->ip_src, 0);
 		sa = sintosa(sin);
+		break;
+	}
+#endif
 #ifdef INET6
-	} else {
+	case AF_INET6: {
 		const struct ip6_hdr *ip6 = (const struct ip6_hdr *)packet;
 		struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&ss;
 		sockaddr_in6_init(sin6, &ip6->ip6_src, 0, 0, 0);
 		sa = sin6tosa(sin6);
+		break;
+	}
 #endif
+	default:
+		__USE(ss);
+		return false;
 	}
 
 	wgp = wg_pick_peer_by_sa(wg, sa, &psref);
@@ -2455,6 +2809,7 @@ wg_session_dtor_timer(void *arg)
 
 	WG_TRACE("enter");
 
+	wg_schedule_session_dtor_timer(wgp);
 	wg_schedule_peer_task(wgp, WGP_TASK_DESTROY_PREV_SESSION);
 }
 
@@ -2462,8 +2817,19 @@ static void
 wg_schedule_session_dtor_timer(struct wg_peer *wgp)
 {
 
-	/* 1 second grace period */
-	callout_schedule(&wgp->wgp_session_dtor_timer, hz);
+	/*
+	 * If the periodic session destructor is already pending to
+	 * handle the previous session, that's fine -- leave it in
+	 * place; it will be scheduled again.
+	 */
+	if (callout_pending(&wgp->wgp_session_dtor_timer)) {
+		WG_DLOG("session dtor already pending\n");
+		return;
+	}
+
+	WG_DLOG("scheduling session dtor in %u secs\n", wg_reject_after_time);
+	callout_schedule(&wgp->wgp_session_dtor_timer,
+	    wg_reject_after_time*hz);
 }
 
 static bool
@@ -2527,6 +2893,7 @@ wg_handle_msg_data(struct wg_softc *wg, 
 	struct wg_session *wgs;
 	struct wg_peer *wgp;
 	int state;
+	uint32_t age;
 	size_t mlen;
 	struct psref psref;
 	int error, af;
@@ -2551,12 +2918,15 @@ wg_handle_msg_data(struct wg_softc *wg, 
 	 * We are only ready to handle data when in INIT_PASSIVE,
 	 * ESTABLISHED, or DESTROYING.  All transitions out of that
 	 * state dissociate the session index and drain psrefs.
+	 *
+	 * atomic_load_acquire matches atomic_store_release in either
+	 * wg_handle_msg_init or wg_handle_msg_resp.  (The transition
+	 * INIT_PASSIVE to ESTABLISHED in wg_task_establish_session
+	 * doesn't make a difference for this rx path.)
 	 */
-	state = atomic_load_relaxed(&wgs->wgs_state);
+	state = atomic_load_acquire(&wgs->wgs_state);
 	switch (state) {
 	case WGS_STATE_UNKNOWN:
-		panic("wg session %p in unknown state has session index %u",
-		    wgs, wgmd->wgmd_receiver);
 	case WGS_STATE_INIT_ACTIVE:
 		WG_TRACE("not yet ready for data");
 		goto out;
@@ -2567,6 +2937,16 @@ wg_handle_msg_data(struct wg_softc *wg, 
 	}
 
 	/*
+	 * Reject if the session is too old.
+	 */
+	age = time_uptime32 - wgs->wgs_time_established;
+	if (__predict_false(age >= wg_reject_after_time)) {
+		WG_DLOG("session %"PRIx32" too old, %"PRIu32" sec\n",
+		    wgmd->wgmd_receiver, age);
+	       goto out;
+	}
+
+	/*
 	 * Get the peer, for rate-limited logs (XXX MPSAFE, dtrace) and
 	 * to update the endpoint if authentication succeeds.
 	 */
@@ -2590,7 +2970,7 @@ wg_handle_msg_data(struct wg_softc *wg, 
 	mlen = m_length(m);
 	encrypted_len = mlen - sizeof(*wgmd);
 	if (encrypted_len < WG_AUTHTAG_LEN) {
-		WG_DLOG("Short encrypted_len: %lu\n", encrypted_len);
+		WG_DLOG("Short encrypted_len: %zu\n", encrypted_len);
 		goto out;
 	}
 	success = m_ensure_contig(&m, sizeof(*wgmd) + encrypted_len);
@@ -2628,7 +3008,7 @@ wg_handle_msg_data(struct wg_softc *wg, 
 	decrypted_buf = mtod(n, char *);
 
 	/* Decrypt and verify the packet.  */
-	WG_DLOG("mlen=%lu, encrypted_len=%lu\n", mlen, encrypted_len);
+	WG_DLOG("mlen=%zu, encrypted_len=%zu\n", mlen, encrypted_len);
 	error = wg_algo_aead_dec(decrypted_buf,
 	    encrypted_len - WG_AUTHTAG_LEN /* can be 0 */,
 	    wgs->wgs_tkey_recv, le64toh(wgmd->wgmd_counter), encrypted_buf,
@@ -2662,16 +3042,6 @@ wg_handle_msg_data(struct wg_softc *wg, 
 	wgmd = NULL;
 
 	/*
-	 * Validate the encapsulated packet header and get the address
-	 * family, or drop.
-	 */
-	ok = wg_validate_inner_packet(decrypted_buf, decrypted_len, &af);
-	if (!ok) {
-		m_freem(n);
-		goto out;
-	}
-
-	/*
 	 * The packet is genuine.  Update the peer's endpoint if the
 	 * source address changed.
 	 *
@@ -2680,6 +3050,16 @@ wg_handle_msg_data(struct wg_softc *wg, 
 	 */
 	wg_update_endpoint_if_necessary(wgp, src);
 
+	/*
+	 * Validate the encapsulated packet header and get the address
+	 * family, or drop.
+	 */
+	ok = wg_validate_inner_packet(decrypted_buf, decrypted_len, &af);
+	if (!ok) {
+		m_freem(n);
+		goto update_state;
+	}
+
 	/* Submit it into our network stack if routable.  */
 	ok = wg_validate_route(wg, wgp, af, decrypted_buf);
 	if (ok) {
@@ -2687,15 +3067,24 @@ wg_handle_msg_data(struct wg_softc *wg, 
 	} else {
 		char addrstr[INET6_ADDRSTRLEN];
 		memset(addrstr, 0, sizeof(addrstr));
-		if (af == AF_INET) {
+		switch (af) {
+#ifdef INET
+		case AF_INET: {
 			const struct ip *ip = (const struct ip *)decrypted_buf;
 			IN_PRINT(addrstr, &ip->ip_src);
+			break;
+		}
+#endif
 #ifdef INET6
-		} else if (af == AF_INET6) {
+		case AF_INET6: {
 			const struct ip6_hdr *ip6 =
 			    (const struct ip6_hdr *)decrypted_buf;
 			IN6_PRINT(addrstr, &ip6->ip6_src);
+			break;
+		}
 #endif
+		default:
+			panic("invalid af=%d", af);
 		}
 		WG_LOG_RATECHECK(&wgp->wgp_ppsratecheck, LOG_DEBUG,
 		    "%s: peer %s: invalid source address (%s)\n",
@@ -2708,6 +3097,7 @@ wg_handle_msg_data(struct wg_softc *wg, 
 	}
 	n = NULL;
 
+update_state:
 	/* Update the state machine if necessary.  */
 	if (__predict_false(state == WGS_STATE_INIT_PASSIVE)) {
 		/*
@@ -2727,11 +3117,13 @@ wg_handle_msg_data(struct wg_softc *wg, 
 		 *  itself to send back for KEEPALIVE-TIMEOUT seconds, it sends
 		 *  a keepalive message."
 		 */
-		WG_DLOG("time_uptime=%ju wgs_time_last_data_sent=%ju\n",
-		    (uintmax_t)time_uptime,
-		    (uintmax_t)wgs->wgs_time_last_data_sent);
-		if ((time_uptime - wgs->wgs_time_last_data_sent) >=
-		    wg_keepalive_timeout) {
+		const uint32_t now = time_uptime32;
+		const uint32_t time_last_data_sent =
+		    atomic_load_relaxed(&wgs->wgs_time_last_data_sent);
+		WG_DLOG("time_uptime32=%"PRIu32
+		    " wgs_time_last_data_sent=%"PRIu32"\n",
+		    now, time_last_data_sent);
+		if ((now - time_last_data_sent) >= wg_keepalive_timeout) {
 			WG_TRACE("Schedule sending keepalive message");
 			/*
 			 * We can't send a keepalive message here to avoid
@@ -2744,8 +3136,7 @@ wg_handle_msg_data(struct wg_softc *wg, 
 	}
 out:
 	wg_put_session(wgs, &psref);
-	if (m != NULL)
-		m_freem(m);
+	m_freem(m);
 	if (free_encrypted_buf)
 		kmem_intr_free(encrypted_buf, encrypted_len);
 }
@@ -2778,6 +3169,15 @@ wg_handle_msg_cookie(struct wg_softc *wg
 		goto out;
 	}
 
+	/*
+	 * wgp_last_sent_mac1_valid is only set to true when we are
+	 * transitioning to INIT_ACTIVE or INIT_PASSIVE, and always
+	 * cleared on transition out of them.
+	 */
+	KASSERTMSG((wgs->wgs_state == WGS_STATE_INIT_ACTIVE ||
+		wgs->wgs_state == WGS_STATE_INIT_PASSIVE),
+	    "state=%d", wgs->wgs_state);
+
 	/* Decrypt the cookie and store it for later handshake retry.  */
 	wg_algo_mac_cookie(key, sizeof(key), wgp->wgp_pubkey,
 	    sizeof(wgp->wgp_pubkey));
@@ -2845,7 +3245,7 @@ wg_validate_msg_header(struct wg_softc *
 
 	/* Verify the mbuf chain is long enough for this type of message.  */
 	if (__predict_false(mbuflen < msglen)) {
-		WG_DLOG("Invalid msg size: mbuflen=%lu type=%u\n", mbuflen,
+		WG_DLOG("Invalid msg size: mbuflen=%zu type=%u\n", mbuflen,
 		    le32toh(wgm.wgm_type));
 		goto error;
 	}
@@ -2960,16 +3360,22 @@ wg_task_send_init_message(struct wg_soft
 		return;
 	}
 
+	/*
+	 * If we already have an established session, there's no need
+	 * to initiate a new one -- unless the rekey-after-time or
+	 * rekey-after-messages limits have passed.
+	 */
 	wgs = wgp->wgp_session_stable;
-	if (wgs->wgs_state == WGS_STATE_UNKNOWN) {
-		/* XXX What if the unstable session is already INIT_ACTIVE?  */
-		wg_send_handshake_msg_init(wg, wgp);
-	} else {
-		/* rekey */
-		wgs = wgp->wgp_session_unstable;
-		if (wgs->wgs_state != WGS_STATE_INIT_ACTIVE)
-			wg_send_handshake_msg_init(wg, wgp);
-	}
+	if (wgs->wgs_state == WGS_STATE_ESTABLISHED &&
+	    !atomic_load_relaxed(&wgs->wgs_force_rekey))
+		return;
+
+	/*
+	 * Ensure we're initiating a new session.  If the unstable
+	 * session is already INIT_ACTIVE or INIT_PASSIVE, this does
+	 * nothing.
+	 */
+	wg_send_handshake_msg_init(wg, wgp);
 }
 
 static void
@@ -3014,8 +3420,7 @@ wg_task_retry_handshake(struct wg_softc 
 static void
 wg_task_establish_session(struct wg_softc *wg, struct wg_peer *wgp)
 {
-	struct wg_session *wgs, *wgs_prev;
-	struct mbuf *m;
+	struct wg_session *wgs;
 
 	KASSERT(mutex_owned(wgp->wgp_lock));
 
@@ -3024,48 +3429,34 @@ wg_task_establish_session(struct wg_soft
 		/* XXX Can this happen?  */
 		return;
 
-	wgs->wgs_state = WGS_STATE_ESTABLISHED;
-	wgs->wgs_time_established = time_uptime;
 	wgs->wgs_time_last_data_sent = 0;
 	wgs->wgs_is_initiator = false;
+
+	/*
+	 * Session was already ready to receive data.  Transition from
+	 * INIT_PASSIVE to ESTABLISHED just so we can swap the
+	 * sessions.
+	 *
+	 * atomic_store_relaxed because this doesn't affect the data rx
+	 * path, wg_handle_msg_data -- changing from INIT_PASSIVE to
+	 * ESTABLISHED makes no difference to the data rx path, and the
+	 * transition to INIT_PASSIVE with store-release already
+	 * published the state needed by the data rx path.
+	 */
+	WG_DLOG("session[L=%"PRIx32" R=%"PRIx32"] -> WGS_STATE_ESTABLISHED\n",
+	    wgs->wgs_local_index, wgs->wgs_remote_index);
+	atomic_store_relaxed(&wgs->wgs_state, WGS_STATE_ESTABLISHED);
 	WG_TRACE("WGS_STATE_ESTABLISHED");
 
-	wg_swap_sessions(wgp);
+	/*
+	 * Session is ready to send data too now that we have received
+	 * the peer initiator's first data packet.
+	 *
+	 * Swap the sessions to publish the new one as the stable
+	 * session for the data tx path, wg_output.
+	 */
+	wg_swap_sessions(wg, wgp);
 	KASSERT(wgs == wgp->wgp_session_stable);
-	wgs_prev = wgp->wgp_session_unstable;
-	getnanotime(&wgp->wgp_last_handshake_time);
-	wgp->wgp_handshake_start_time = 0;
-	wgp->wgp_last_sent_mac1_valid = false;
-	wgp->wgp_last_sent_cookie_valid = false;
-
-	/* If we had a data packet queued up, send it.  */
-	if ((m = atomic_swap_ptr(&wgp->wgp_pending, NULL)) != NULL) {
-		kpreempt_disable();
-		const uint32_t h = curcpu()->ci_index; // pktq_rps_hash(m)
-		M_SETCTX(m, wgp);
-		if (__predict_false(!pktq_enqueue(wg_pktq, m, h))) {
-			WGLOG(LOG_ERR, "%s: pktq full, dropping\n",
-			    if_name(&wg->wg_if));
-			m_freem(m);
-		}
-		kpreempt_enable();
-	}
-
-	if (wgs_prev->wgs_state == WGS_STATE_ESTABLISHED) {
-		/* Wait for wg_get_stable_session to drain.  */
-		pserialize_perform(wgp->wgp_psz);
-
-		/* Transition ESTABLISHED->DESTROYING.  */
-		wgs_prev->wgs_state = WGS_STATE_DESTROYING;
-
-		/* We can't destroy the old session immediately */
-		wg_schedule_session_dtor_timer(wgp);
-	} else {
-		KASSERTMSG(wgs_prev->wgs_state == WGS_STATE_UNKNOWN,
-		    "state=%d", wgs_prev->wgs_state);
-		wg_clear_states(wgs_prev);
-		wgs_prev->wgs_state = WGS_STATE_UNKNOWN;
-	}
 }
 
 static void
@@ -3108,15 +3499,59 @@ static void
 wg_task_destroy_prev_session(struct wg_softc *wg, struct wg_peer *wgp)
 {
 	struct wg_session *wgs;
+	uint32_t age;
 
 	WG_TRACE("WGP_TASK_DESTROY_PREV_SESSION");
 
 	KASSERT(mutex_owned(wgp->wgp_lock));
 
+	/*
+	 * If theres's any previous unstable session, i.e., one that
+	 * was ESTABLISHED and is now DESTROYING, older than
+	 * reject-after-time, destroy it.  Upcoming sessions are still
+	 * in INIT_ACTIVE or INIT_PASSIVE -- we don't touch those here.
+	 */
 	wgs = wgp->wgp_session_unstable;
-	if (wgs->wgs_state == WGS_STATE_DESTROYING) {
+	KASSERT(wgs->wgs_state != WGS_STATE_ESTABLISHED);
+	if (wgs->wgs_state == WGS_STATE_DESTROYING &&
+	    ((age = (time_uptime32 - wgs->wgs_time_established)) >=
+		wg_reject_after_time)) {
+		WG_DLOG("destroying past session %"PRIu32" sec old\n", age);
 		wg_put_session_index(wg, wgs);
+		KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+		    wgs->wgs_state);
 	}
+
+	/*
+	 * If theres's any ESTABLISHED stable session older than
+	 * reject-after-time, destroy it.  (The stable session can also
+	 * be in UNKNOWN state -- nothing to do in that case)
+	 */
+	wgs = wgp->wgp_session_stable;
+	KASSERT(wgs->wgs_state != WGS_STATE_INIT_ACTIVE);
+	KASSERT(wgs->wgs_state != WGS_STATE_INIT_PASSIVE);
+	KASSERT(wgs->wgs_state != WGS_STATE_DESTROYING);
+	if (wgs->wgs_state == WGS_STATE_ESTABLISHED &&
+	    ((age = (time_uptime32 - wgs->wgs_time_established)) >=
+		wg_reject_after_time)) {
+		WG_DLOG("destroying current session %"PRIu32" sec old\n", age);
+		atomic_store_relaxed(&wgs->wgs_state, WGS_STATE_DESTROYING);
+		wg_put_session_index(wg, wgs);
+		KASSERTMSG(wgs->wgs_state == WGS_STATE_UNKNOWN, "state=%d",
+		    wgs->wgs_state);
+	}
+
+	/*
+	 * If there's no sessions left, no need to have the timer run
+	 * until the next time around -- halt it.
+	 *
+	 * It is only ever scheduled with wgp_lock held or in the
+	 * callout itself, and callout_halt prevents rescheudling
+	 * itself, so this never races with rescheduling.
+	 */
+	if (wgp->wgp_session_unstable->wgs_state == WGS_STATE_UNKNOWN &&
+	    wgp->wgp_session_stable->wgs_state == WGS_STATE_UNKNOWN)
+		callout_halt(&wgp->wgp_session_dtor_timer, NULL);
 }
 
 static void
@@ -3176,12 +3611,13 @@ wg_job(struct threadpool_job *job)
 static int
 wg_bind_port(struct wg_softc *wg, const uint16_t port)
 {
-	int error;
+	int error = 0;
 	uint16_t old_port = wg->wg_listen_port;
 
 	if (port != 0 && old_port == port)
 		return 0;
 
+#ifdef INET
 	struct sockaddr_in _sin, *sin = &_sin;
 	sin->sin_len = sizeof(*sin);
 	sin->sin_family = AF_INET;
@@ -3189,8 +3625,9 @@ wg_bind_port(struct wg_softc *wg, const 
 	sin->sin_port = htons(port);
 
 	error = sobind(wg->wg_so4, sintosa(sin), curlwp);
-	if (error != 0)
+	if (error)
 		return error;
+#endif
 
 #ifdef INET6
 	struct sockaddr_in6 _sin6, *sin6 = &_sin6;
@@ -3200,13 +3637,13 @@ wg_bind_port(struct wg_softc *wg, const 
 	sin6->sin6_port = htons(port);
 
 	error = sobind(wg->wg_so6, sin6tosa(sin6), curlwp);
-	if (error != 0)
+	if (error)
 		return error;
 #endif
 
 	wg->wg_listen_port = port;
 
-	return 0;
+	return error;
 }
 
 static void
@@ -3225,6 +3662,24 @@ wg_so_upcall(struct socket *so, void *co
 	mutex_exit(wg->wg_intr_lock);
 }
 
+/*
+ * wg_overudp_cb(&m, offset, so, src, arg)
+ *
+ *	Callback for incoming UDP packets in high-priority
+ *	packet-processing path.
+ *
+ *	Three cases:
+ *
+ *	- Data packet.  Consumed here for high-priority handling.
+ *	  => Returns 1 and takes ownership of m.
+ *
+ *	- Handshake packet.  Defer to thread context via so_receive in
+ *	  wg_receive_packets.
+ *	  => Returns 0 and leaves caller with ownership of m.
+ *
+ *	- Invalid.  Dropped on the floor and freed.
+ *	  => Returns -1 and takes ownership of m (frees m).
+ */
 static int
 wg_overudp_cb(struct mbuf **mp, int offset, struct socket *so,
     struct sockaddr *src, void *arg)
@@ -3240,7 +3695,8 @@ wg_overudp_cb(struct mbuf **mp, int offs
 	if (__predict_false(m_length(m) - offset < sizeof(struct wg_msg))) {
 		/* drop on the floor */
 		m_freem(m);
-		return -1;
+		*mp = NULL;
+		return -1;	/* dropped */
 	}
 
 	/*
@@ -3251,9 +3707,10 @@ wg_overudp_cb(struct mbuf **mp, int offs
 	WG_DLOG("type=%d\n", le32toh(wgm.wgm_type));
 
 	/*
-	 * Handle DATA packets promptly as they arrive.  Other packets
-	 * may require expensive public-key crypto and are not as
-	 * sensitive to latency, so defer them to the worker thread.
+	 * Handle DATA packets promptly as they arrive, if they are in
+	 * an active session.  Other packets may require expensive
+	 * public-key crypto and are not as sensitive to latency, so
+	 * defer them to the worker thread.
 	 */
 	switch (le32toh(wgm.wgm_type)) {
 	case WG_MSG_TYPE_DATA:
@@ -3261,21 +3718,24 @@ wg_overudp_cb(struct mbuf **mp, int offs
 		m_adj(m, offset);
 		if (__predict_false(m->m_len < sizeof(struct wg_msg_data))) {
 			m = m_pullup(m, sizeof(struct wg_msg_data));
-			if (m == NULL)
-				return -1;
+			if (m == NULL) {
+				*mp = NULL;
+				return -1; /* dropped */
+			}
 		}
 		wg_handle_msg_data(wg, m, src);
 		*mp = NULL;
-		return 1;
+		return 1;	/* consumed */
 	case WG_MSG_TYPE_INIT:
 	case WG_MSG_TYPE_RESP:
 	case WG_MSG_TYPE_COOKIE:
 		/* pass through to so_receive in wg_receive_packets */
-		return 0;
+		return 0;	/* passthrough */
 	default:
 		/* drop on the floor */
 		m_freem(m);
-		return -1;
+		*mp = NULL;
+		return -1;	/* dropped */
 	}
 }
 
@@ -3309,11 +3769,11 @@ wg_session_hit_limits(struct wg_session 
 	 * [W] 6.2: Transport Message Limits
 	 * "After REJECT-AFTER-MESSAGES transport data messages or after the
 	 *  current secure session is REJECT-AFTER-TIME seconds old, whichever
-	 *  comes first, WireGuard will refuse to send any more transport data
-	 *  messages using the current secure session, ..."
+	 *  comes first, WireGuard will refuse to send or receive any more
+	 *  transport data messages using the current secure session, ..."
 	 */
-	KASSERT(wgs->wgs_time_established != 0);
-	if ((time_uptime - wgs->wgs_time_established) > wg_reject_after_time) {
+	KASSERT(wgs->wgs_time_established != 0 || time_uptime > UINT32_MAX);
+	if (time_uptime32 - wgs->wgs_time_established > wg_reject_after_time) {
 		WG_DLOG("The session hits REJECT_AFTER_TIME\n");
 		return true;
 	} else if (wg_session_get_send_counter(wgs) >
@@ -3336,39 +3796,67 @@ wgintr(void *cookie)
 	while ((m = pktq_dequeue(wg_pktq)) != NULL) {
 		wgp = M_GETCTX(m, struct wg_peer *);
 		if ((wgs = wg_get_stable_session(wgp, &psref)) == NULL) {
+			/*
+			 * No established session.  If we're the first
+			 * to try sending data, schedule a handshake
+			 * and queue the packet for when the handshake
+			 * is done; otherwise just drop the packet and
+			 * let the ongoing handshake attempt continue.
+			 * We could queue more data packets but it's
+			 * not clear that's worthwhile.
+			 */
 			WG_TRACE("no stable session");
-			wg_schedule_peer_task(wgp, WGP_TASK_SEND_INIT_MESSAGE);
+			membar_release();
+			if ((m = atomic_swap_ptr(&wgp->wgp_pending, m)) ==
+			    NULL) {
+				WG_TRACE("queued first packet;"
+				    " init handshake");
+				wg_schedule_peer_task(wgp,
+				    WGP_TASK_SEND_INIT_MESSAGE);
+			} else {
+				membar_acquire();
+				WG_TRACE("first packet already queued,"
+				    " dropping");
+			}
 			goto next0;
 		}
 		if (__predict_false(wg_session_hit_limits(wgs))) {
 			WG_TRACE("stable session hit limits");
-			wg_schedule_peer_task(wgp, WGP_TASK_SEND_INIT_MESSAGE);
+			membar_release();
+			if ((m = atomic_swap_ptr(&wgp->wgp_pending, m)) ==
+			    NULL) {
+				WG_TRACE("queued first packet in a while;"
+				    " reinit handshake");
+				atomic_store_relaxed(&wgs->wgs_force_rekey,
+				    true);
+				wg_schedule_peer_task(wgp,
+				    WGP_TASK_SEND_INIT_MESSAGE);
+			} else {
+				membar_acquire();
+				WG_TRACE("first packet in already queued,"
+				    " dropping");
+			}
 			goto next1;
 		}
 		wg_send_data_msg(wgp, wgs, m);
 		m = NULL;	/* consumed */
 next1:		wg_put_session(wgs, &psref);
-next0:		if (m)
-			m_freem(m);
+next0:		m_freem(m);
 		/* XXX Yield to avoid userland starvation?  */
 	}
 }
 
 static void
-wg_rekey_timer(void *arg)
-{
-	struct wg_peer *wgp = arg;
-
-	wg_schedule_peer_task(wgp, WGP_TASK_SEND_INIT_MESSAGE);
-}
-
-static void
 wg_purge_pending_packets(struct wg_peer *wgp)
 {
 	struct mbuf *m;
 
-	if ((m = atomic_swap_ptr(&wgp->wgp_pending, NULL)) != NULL)
-		m_freem(m);
+	m = atomic_swap_ptr(&wgp->wgp_pending, NULL);
+	membar_acquire();     /* matches membar_release in wgintr */
+	m_freem(m);
+#ifdef ALTQ
+	wg_start(&wgp->wgp_sc->wg_if);
+#endif
 	pktq_barrier(wg_pktq);
 }
 
@@ -3390,8 +3878,6 @@ wg_alloc_peer(struct wg_softc *wg)
 	wgp = kmem_zalloc(sizeof(*wgp), KM_SLEEP);
 
 	wgp->wgp_sc = wg;
-	callout_init(&wgp->wgp_rekey_timer, CALLOUT_MPSAFE);
-	callout_setfunc(&wgp->wgp_rekey_timer, wg_rekey_timer, wgp);
 	callout_init(&wgp->wgp_handshake_timeout_timer, CALLOUT_MPSAFE);
 	callout_setfunc(&wgp->wgp_handshake_timeout_timer,
 	    wg_handshake_timeout_timer, wgp);
@@ -3469,7 +3955,6 @@ wg_destroy_peer(struct wg_peer *wgp)
 	wg_purge_pending_packets(wgp);
 
 	/* Halt all packet processing and timeouts.  */
-	callout_halt(&wgp->wgp_rekey_timer, NULL);
 	callout_halt(&wgp->wgp_handshake_timeout_timer, NULL);
 	callout_halt(&wgp->wgp_session_dtor_timer, NULL);
 
@@ -3687,8 +4172,9 @@ wg_clone_create(struct if_clone *ifc, in
 	return 0;
 
 fail4: __unused
+	wg_destroy_all_peers(wg);
 	wg_if_detach(wg);
-fail3:	wg_destroy_all_peers(wg);
+fail3:
 #ifdef INET6
 	solock(wg->wg_so6);
 	wg->wg_so6->so_rcv.sb_flags &= ~SB_UPCALL;
@@ -3740,8 +4226,8 @@ wg_clone_destroy(struct ifnet *ifp)
 	}
 #endif
 
-	wg_if_detach(wg);
 	wg_destroy_all_peers(wg);
+	wg_if_detach(wg);
 #ifdef INET6
 	solock(wg->wg_so6);
 	wg->wg_so6->so_rcv.sb_flags &= ~SB_UPCALL;
@@ -3836,8 +4322,7 @@ wg_output(struct ifnet *ifp, struct mbuf
 {
 	struct wg_softc *wg = ifp->if_softc;
 	struct wg_peer *wgp = NULL;
-	struct wg_session *wgs = NULL;
-	struct psref wgp_psref, wgs_psref;
+	struct psref wgp_psref;
 	int bound;
 	int error;
 
@@ -3874,28 +4359,7 @@ wg_output(struct ifnet *ifp, struct mbuf
 	m->m_pkthdr.csum_flags = 0;
 	m->m_pkthdr.csum_data = 0;
 
-	/* Check whether there's an established session.  */
-	wgs = wg_get_stable_session(wgp, &wgs_psref);
-	if (wgs == NULL) {
-		/*
-		 * No established session.  If we're the first to try
-		 * sending data, schedule a handshake and queue the
-		 * packet for when the handshake is done; otherwise
-		 * just drop the packet and let the ongoing handshake
-		 * attempt continue.  We could queue more data packets
-		 * but it's not clear that's worthwhile.
-		 */
-		if (atomic_cas_ptr(&wgp->wgp_pending, NULL, m) == NULL) {
-			m = NULL; /* consume */
-			WG_TRACE("queued first packet; init handshake");
-			wg_schedule_peer_task(wgp, WGP_TASK_SEND_INIT_MESSAGE);
-		} else {
-			WG_TRACE("first packet already queued, dropping");
-		}
-		goto out1;
-	}
-
-	/* There's an established session.  Toss it in the queue.  */
+	/* Toss it in the queue.  */
 #ifdef ALTQ
 	if (altq) {
 		mutex_enter(ifp->if_snd.ifq_lock);
@@ -3907,7 +4371,7 @@ wg_output(struct ifnet *ifp, struct mbuf
 		mutex_exit(ifp->if_snd.ifq_lock);
 		if (m == NULL) {
 			wg_start(ifp);
-			goto out2;
+			goto out1;
 		}
 	}
 #endif
@@ -3918,19 +4382,17 @@ wg_output(struct ifnet *ifp, struct mbuf
 		WGLOG(LOG_ERR, "%s: pktq full, dropping\n",
 		    if_name(&wg->wg_if));
 		error = ENOBUFS;
-		goto out3;
+		goto out2;
 	}
 	m = NULL;		/* consumed */
 	error = 0;
-out3:	kpreempt_enable();
+out2:	kpreempt_enable();
 
 #ifdef ALTQ
-out2:
+out1:
 #endif
-	wg_put_session(wgs, &wgs_psref);
-out1:	wg_put_peer(wgp, &wgp_psref);
-out0:	if (m)
-		m_freem(m);
+	wg_put_peer(wgp, &wgp_psref);
+out0:	m_freem(m);
 	curlwp_bindx(bound);
 	return error;
 }
@@ -3946,16 +4408,21 @@ wg_send_udp(struct wg_peer *wgp, struct 
 	wgsa = wg_get_endpoint_sa(wgp, &psref);
 	so = wg_get_so_by_peer(wgp, wgsa);
 	solock(so);
-	if (wgsatosa(wgsa)->sa_family == AF_INET) {
+	switch (wgsatosa(wgsa)->sa_family) {
+#ifdef INET
+	case AF_INET:
 		error = udp_send(so, m, wgsatosa(wgsa), NULL, curlwp);
-	} else {
+		break;
+#endif
 #ifdef INET6
+	case AF_INET6:
 		error = udp6_output(sotoinpcb(so), m, wgsatosin6(wgsa),
 		    NULL, curlwp);
-#else
+		break;
+#endif
+	default:
 		m_freem(m);
 		error = EPFNOSUPPORT;
-#endif
 	}
 	sounlock(so);
 	wg_put_sa(wgp, wgsa, &psref);
@@ -3988,9 +4455,8 @@ wg_get_mbuf(size_t leading_len, size_t l
 	return m;
 }
 
-static int
-wg_send_data_msg(struct wg_peer *wgp, struct wg_session *wgs,
-    struct mbuf *m)
+static void
+wg_send_data_msg(struct wg_peer *wgp, struct wg_session *wgs, struct mbuf *m)
 {
 	struct wg_softc *wg = wgp->wgp_sc;
 	int error;
@@ -4006,7 +4472,7 @@ wg_send_data_msg(struct wg_peer *wgp, st
 	inner_len = mlen;
 	padded_len = roundup(mlen, 16);
 	encrypted_len = padded_len + WG_AUTHTAG_LEN;
-	WG_DLOG("inner=%lu, padded=%lu, encrypted_len=%lu\n",
+	WG_DLOG("inner=%zu, padded=%zu, encrypted_len=%zu\n",
 	    inner_len, padded_len, encrypted_len);
 	if (mlen != 0) {
 		bool success;
@@ -4017,7 +4483,7 @@ wg_send_data_msg(struct wg_peer *wgp, st
 			padded_buf = kmem_intr_alloc(padded_len, KM_NOSLEEP);
 			if (padded_buf == NULL) {
 				error = ENOBUFS;
-				goto end;
+				goto out;
 			}
 			free_padded_buf = true;
 			m_copydata(m, 0, mlen, padded_buf);
@@ -4028,52 +4494,86 @@ wg_send_data_msg(struct wg_peer *wgp, st
 	n = wg_get_mbuf(leading_len, sizeof(*wgmd) + encrypted_len);
 	if (n == NULL) {
 		error = ENOBUFS;
-		goto end;
+		goto out;
 	}
 	KASSERT(n->m_len >= sizeof(*wgmd));
 	wgmd = mtod(n, struct wg_msg_data *);
 	wg_fill_msg_data(wg, wgp, wgs, wgmd);
+
 	/* [W] 5.4.6: AEAD(Tm^send, Nm^send, P, e) */
 	wg_algo_aead_enc((char *)wgmd + sizeof(*wgmd), encrypted_len,
 	    wgs->wgs_tkey_send, le64toh(wgmd->wgmd_counter),
 	    padded_buf, padded_len,
 	    NULL, 0);
 
-	error = wg->wg_ops->send_data_msg(wgp, n);
-	if (error == 0) {
-		struct ifnet *ifp = &wg->wg_if;
-		if_statadd(ifp, if_obytes, mlen);
-		if_statinc(ifp, if_opackets);
-		if (wgs->wgs_is_initiator &&
-		    wgs->wgs_time_last_data_sent == 0) {
-			/*
-			 * [W] 6.2 Transport Message Limits
-			 * "if a peer is the initiator of a current secure
-			 *  session, WireGuard will send a handshake initiation
-			 *  message to begin a new secure session if, after
-			 *  transmitting a transport data message, the current
-			 *  secure session is REKEY-AFTER-TIME seconds old,"
-			 */
-			wg_schedule_rekey_timer(wgp);
-		}
-		wgs->wgs_time_last_data_sent = time_uptime;
-		if (wg_session_get_send_counter(wgs) >=
-		    wg_rekey_after_messages) {
-			/*
-			 * [W] 6.2 Transport Message Limits
-			 * "WireGuard will try to create a new session, by
-			 *  sending a handshake initiation message (section
-			 *  5.4.2), after it has sent REKEY-AFTER-MESSAGES
-			 *  transport data messages..."
-			 */
-			wg_schedule_peer_task(wgp, WGP_TASK_SEND_INIT_MESSAGE);
-		}
+	error = wg->wg_ops->send_data_msg(wgp, n); /* consumes n */
+	if (error) {
+		WG_DLOG("send_data_msg failed, error=%d\n", error);
+		goto out;
 	}
-end:
-	m_freem(m);
+
+	/*
+	 * Packet was sent out -- count it in the interface statistics.
+	 */
+	if_statadd(&wg->wg_if, if_obytes, mlen);
+	if_statinc(&wg->wg_if, if_opackets);
+
+	/*
+	 * Record when we last sent data, for determining when we need
+	 * to send a passive keepalive.
+	 *
+	 * Other logic assumes that wgs_time_last_data_sent is zero iff
+	 * we have never sent data on this session.  Early at boot, if
+	 * wg(4) starts operating within <1sec, or after 136 years of
+	 * uptime, we may observe time_uptime32 = 0.  In that case,
+	 * pretend we observed 1 instead.  That way, we correctly
+	 * indicate we have sent data on this session; the only logic
+	 * this might adversely affect is the keepalive timeout
+	 * detection, which might spuriously send a keepalive during
+	 * one second every 136 years.  All of this is very silly, of
+	 * course, but the cost to guaranteeing wgs_time_last_data_sent
+	 * is nonzero is negligible here.
+	 */
+	const uint32_t now = time_uptime32;
+	atomic_store_relaxed(&wgs->wgs_time_last_data_sent, MAX(now, 1));
+
+	/*
+	 * Check rekey-after-time.
+	 */
+	if (wgs->wgs_is_initiator &&
+	    now - wgs->wgs_time_established >= wg_rekey_after_time) {
+		/*
+		 * [W] 6.2 Transport Message Limits
+		 * "if a peer is the initiator of a current secure
+		 *  session, WireGuard will send a handshake initiation
+		 *  message to begin a new secure session if, after
+		 *  transmitting a transport data message, the current
+		 *  secure session is REKEY-AFTER-TIME seconds old,"
+		 */
+		WG_TRACE("rekey after time");
+		atomic_store_relaxed(&wgs->wgs_force_rekey, true);
+		wg_schedule_peer_task(wgp, WGP_TASK_SEND_INIT_MESSAGE);
+	}
+
+	/*
+	 * Check rekey-after-messages.
+	 */
+	if (wg_session_get_send_counter(wgs) >= wg_rekey_after_messages) {
+		/*
+		 * [W] 6.2 Transport Message Limits
+		 * "WireGuard will try to create a new session, by
+		 *  sending a handshake initiation message (section
+		 *  5.4.2), after it has sent REKEY-AFTER-MESSAGES
+		 *  transport data messages..."
+		 */
+		WG_TRACE("rekey after messages");
+		atomic_store_relaxed(&wgs->wgs_force_rekey, true);
+		wg_schedule_peer_task(wgp, WGP_TASK_SEND_INIT_MESSAGE);
+	}
+
+out:	m_freem(m);
 	if (free_padded_buf)
 		kmem_intr_free(padded_buf, padded_len);
-	return error;
 }
 
 static void
@@ -4092,9 +4592,11 @@ wg_input(struct ifnet *ifp, struct mbuf 
 	bpf_mtap_af(ifp, af, m, BPF_D_IN);
 
 	switch (af) {
+#ifdef INET
 	case AF_INET:
 		pktq = ip_pktq;
 		break;
+#endif
 #ifdef INET6
 	case AF_INET6:
 		pktq = ip6_pktq;
@@ -4116,8 +4618,8 @@ wg_input(struct ifnet *ifp, struct mbuf 
 }
 
 static void
-wg_calc_pubkey(uint8_t pubkey[WG_STATIC_KEY_LEN],
-    const uint8_t privkey[WG_STATIC_KEY_LEN])
+wg_calc_pubkey(uint8_t pubkey[static WG_STATIC_KEY_LEN],
+    const uint8_t privkey[static WG_STATIC_KEY_LEN])
 {
 
 	crypto_scalarmult_base(pubkey, privkey);
@@ -4167,12 +4669,12 @@ wg_handle_prop_peer(struct wg_softc *wg,
 		goto out;
 	}
 #ifdef WG_DEBUG_DUMP
-    {
-	char *hex = gethexdump(pubkey, pubkey_len);
-	log(LOG_DEBUG, "pubkey=%p, pubkey_len=%lu\n%s\n",
-	    pubkey, pubkey_len, hex);
-	puthexdump(hex, pubkey, pubkey_len);
-    }
+        if (wg_debug & WG_DEBUG_FLAGS_DUMP) {
+		char *hex = gethexdump(pubkey, pubkey_len);
+		log(LOG_DEBUG, "pubkey=%p, pubkey_len=%zu\n%s\n",
+		    pubkey, pubkey_len, hex);
+		puthexdump(hex, pubkey, pubkey_len);
+	}
 #endif
 
 	struct wg_peer *wgp = wg_alloc_peer(wg);
@@ -4201,11 +4703,14 @@ wg_handle_prop_peer(struct wg_softc *wg,
 	}
 	memcpy(wgsatoss(wgsa), addr, addr_len);
 	switch (wgsa_family(wgsa)) {
+#ifdef INET
 	case AF_INET:
+		break;
+#endif
 #ifdef INET6
 	case AF_INET6:
-#endif
 		break;
+#endif
 	default:
 		error = EPFNOSUPPORT;
 		goto out;
@@ -4244,6 +4749,7 @@ skip_endpoint:
 			continue;
 
 		switch (wga->wga_family) {
+#ifdef INET
 		case AF_INET: {
 			struct sockaddr_in sin;
 			char addrstr[128];
@@ -4270,6 +4776,7 @@ skip_endpoint:
 
 			break;
 		    }
+#endif
 #ifdef INET6
 		case AF_INET6: {
 			struct sockaddr_in6 sin6;
@@ -4323,7 +4830,7 @@ wg_alloc_prop_buf(char **_buf, struct if
 	int error;
 	char *buf;
 
-	WG_DLOG("buf=%p, len=%lu\n", ifd->ifd_data, ifd->ifd_len);
+	WG_DLOG("buf=%p, len=%zu\n", ifd->ifd_data, ifd->ifd_len);
 	if (ifd->ifd_len >= WG_MAX_PROPLEN)
 		return E2BIG;
 	buf = kmem_alloc(ifd->ifd_len + 1, KM_SLEEP);
@@ -4332,9 +4839,10 @@ wg_alloc_prop_buf(char **_buf, struct if
 		return error;
 	buf[ifd->ifd_len] = '\0';
 #ifdef WG_DEBUG_DUMP
-	log(LOG_DEBUG, "%.*s\n",
-	    (int)MIN(INT_MAX, ifd->ifd_len),
-	    (const char *)buf);
+	if (wg_debug & WG_DEBUG_FLAGS_DUMP) {
+		log(LOG_DEBUG, "%.*s\n", (int)MIN(INT_MAX, ifd->ifd_len),
+		    (const char *)buf);
+	}
 #endif
 	*_buf = buf;
 	return 0;
@@ -4360,12 +4868,12 @@ wg_ioctl_set_private_key(struct wg_softc
 		&privkey, &privkey_len))
 		goto out;
 #ifdef WG_DEBUG_DUMP
-    {
-	char *hex = gethexdump(privkey, privkey_len);
-	log(LOG_DEBUG, "privkey=%p, privkey_len=%lu\n%s\n",
-	    privkey, privkey_len, hex);
-	puthexdump(hex, privkey, privkey_len);
-    }
+	if (wg_debug & WG_DEBUG_FLAGS_DUMP) {
+		char *hex = gethexdump(privkey, privkey_len);
+		log(LOG_DEBUG, "privkey=%p, privkey_len=%zu\n%s\n",
+		    privkey, privkey_len, hex);
+		puthexdump(hex, privkey, privkey_len);
+	}
 #endif
 	if (privkey_len != WG_STATIC_KEY_LEN)
 		goto out;
@@ -4601,12 +5109,14 @@ wg_ioctl_get(struct wg_softc *wg, struct
 				goto _next;
 
 			switch (wga->wga_family) {
+#ifdef INET
 			case AF_INET:
 				if (!prop_dictionary_set_data(prop_allowedip,
 					"ip", &wga->wga_addr4,
 					sizeof(wga->wga_addr4)))
 					goto _next;
 				break;
+#endif
 #ifdef INET6
 			case AF_INET6:
 				if (!prop_dictionary_set_data(prop_allowedip,
@@ -4616,7 +5126,7 @@ wg_ioctl_get(struct wg_softc *wg, struct
 				break;
 #endif
 			default:
-				break;
+				panic("invalid af=%d", wga->wga_family);
 			}
 			prop_array_set(allowedips, j, prop_allowedip);
 		_next:
@@ -4683,8 +5193,10 @@ wg_ioctl(struct ifnet *ifp, u_long cmd, 
 	case SIOCADDMULTI:
 	case SIOCDELMULTI:
 		switch (ifr->ifr_addr.sa_family) {
+#ifdef INET
 		case AF_INET:	/* IP supports Multicast */
 			break;
+#endif
 #ifdef INET6
 		case AF_INET6:	/* IP6 supports Multicast */
 			break;
@@ -4743,9 +5255,10 @@ wg_ioctl(struct ifnet *ifp, u_long cmd, 
 #ifdef WG_RUMPKERNEL
 	case SIOCSLINKSTR:
 		error = wg_ioctl_linkstr(wg, ifd);
-		if (error == 0)
-			wg->wg_ops = &wg_ops_rumpuser;
-		return error;
+		if (error)
+			return error;
+		wg->wg_ops = &wg_ops_rumpuser;
+		return 0;
 #endif
 	default:
 		break;
@@ -4763,6 +5276,7 @@ wg_ioctl(struct ifnet *ifp, u_long cmd, 
 	 *     will be handled via pr_ioctl form doifioctl later.
 	 */
 	switch (cmd) {
+#ifdef INET
 	case SIOCAIFADDR:
 	case SIOCDIFADDR: {
 		struct in_aliasreq _ifra = *(const struct in_aliasreq *)data;
@@ -4775,6 +5289,7 @@ wg_ioctl(struct ifnet *ifp, u_long cmd, 
 			error = ENOTTY;
 		break;
 	}
+#endif
 #ifdef INET6
 	case SIOCAIFADDR_IN6:
 	case SIOCDIFADDR_IN6: {
@@ -4789,6 +5304,8 @@ wg_ioctl(struct ifnet *ifp, u_long cmd, 
 		break;
 	}
 #endif
+	default:
+		break;
 	}
 #endif /* WG_RUMPKERNEL */
 
@@ -4879,6 +5396,11 @@ SYSCTL_SETUP(sysctl_net_wg_setup, "sysct
 	    CTLTYPE_BOOL, "force_underload",
 	    SYSCTL_DESCR("force to detemine under load"),
 	    NULL, 0, &wg_force_underload, 0, CTL_CREATE, CTL_EOL);
+	sysctl_createv(clog, 0, &node, NULL,
+	    CTLFLAG_PERMANENT|CTLFLAG_READWRITE,
+	    CTLTYPE_INT, "debug",
+	    SYSCTL_DESCR("set debug flags 1=log 2=trace 4=dump 8=packet"),
+	    NULL, 0, &wg_debug, 0, CTL_CREATE, CTL_EOL);
 }
 #endif
 
@@ -4963,20 +5485,31 @@ wg_input_user(struct ifnet *ifp, struct 
 
 	WG_TRACE("");
 
-	if (af == AF_INET) {
+	switch (af) {
+#ifdef INET
+	case AF_INET: {
 		struct sockaddr_in *sin = (struct sockaddr_in *)&ss;
 		struct ip *ip;
 
 		KASSERT(m->m_len >= sizeof(struct ip));
 		ip = mtod(m, struct ip *);
 		sockaddr_in_init(sin, &ip->ip_dst, 0);
-	} else {
+		break;
+	}
+#endif
+#ifdef INET6
+	case AF_INET6: {
 		struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&ss;
 		struct ip6_hdr *ip6;
 
 		KASSERT(m->m_len >= sizeof(struct ip6_hdr));
 		ip6 = mtod(m, struct ip6_hdr *);
 		sockaddr_in6_init(sin6, &ip6->ip6_dst, 0, 0, 0);
+		break;
+	}
+#endif
+	default:
+		goto out;
 	}
 
 	iov[0].iov_base = &ss;
@@ -4989,7 +5522,7 @@ wg_input_user(struct ifnet *ifp, struct 
 	/* Send decrypted packets to users via a tun. */
 	rumpuser_wg_send_user(wg->wg_user, iov, 2);
 
-	m_freem(m);
+out:	m_freem(m);
 }
 
 static int
@@ -5002,9 +5535,11 @@ wg_bind_port_user(struct wg_softc *wg, c
 		return 0;
 
 	error = rumpuser_wg_sock_bind(wg->wg_user, port);
-	if (error == 0)
-		wg->wg_listen_port = port;
-	return error;
+	if (error)
+		return error;
+
+	wg->wg_listen_port = port;
+	return 0;
 }
 
 /*
@@ -5016,6 +5551,7 @@ rumpkern_wg_recv_user(struct wg_softc *w
 	struct ifnet *ifp = &wg->wg_if;
 	struct mbuf *m;
 	const struct sockaddr *dst;
+	int error;
 
 	WG_TRACE("");
 
@@ -5027,10 +5563,12 @@ rumpkern_wg_recv_user(struct wg_softc *w
 	m->m_len = m->m_pkthdr.len = 0;
 	m_copyback(m, 0, iov[1].iov_len, iov[1].iov_base);
 
-	WG_DLOG("iov_len=%lu\n", iov[1].iov_len);
+	WG_DLOG("iov_len=%zu\n", iov[1].iov_len);
 	WG_DUMP_BUF(iov[1].iov_base, iov[1].iov_len);
 
-	(void)wg_output(ifp, m, dst, NULL);
+	error = wg_output(ifp, m, dst, NULL); /* consumes m */
+	if (error)
+		WG_DLOG("wg_output failed, error=%d\n", error);
 }
 
 /*
@@ -5053,7 +5591,7 @@ rumpkern_wg_recv_peer(struct wg_softc *w
 	m->m_len = m->m_pkthdr.len = 0;
 	m_copyback(m, 0, iov[1].iov_len, iov[1].iov_base);
 
-	WG_DLOG("iov_len=%lu\n", iov[1].iov_len);
+	WG_DLOG("iov_len=%zu\n", iov[1].iov_len);
 	WG_DUMP_BUF(iov[1].iov_base, iov[1].iov_len);
 
 	bound = curlwp_bind();

Index: src/tests/net/if_wg/t_basic.sh
diff -u src/tests/net/if_wg/t_basic.sh:1.4 src/tests/net/if_wg/t_basic.sh:1.4.6.1
--- src/tests/net/if_wg/t_basic.sh:1.4	Tue Mar  2 07:16:24 2021
+++ src/tests/net/if_wg/t_basic.sh	Wed Oct  9 11:15:39 2024
@@ -1,4 +1,4 @@
-#	$NetBSD: t_basic.sh,v 1.4 2021/03/02 07:16:24 simonb Exp $
+#	$NetBSD: t_basic.sh,v 1.4.6.1 2024/10/09 11:15:39 martin Exp $
 #
 # Copyright (c) 2018 Ryota Ozaki <ozaki.ry...@gmail.com>
 # All rights reserved.
@@ -48,6 +48,21 @@ check_ping_payload()
 	done
 }
 
+check_badudp()
+{
+	local proto=$1
+	local ip=$2
+	local port=51820        # XXX parametrize more clearly
+
+	if [ $proto = inet ]; then
+		atf_check -o ignore -e ignore \
+		    $HIJACKING nc -4uv -w1 $ip $port </dev/null
+	else
+		atf_check -o ignore -e ignore \
+		    $HIJACKING nc -6uv -w1 $ip $port </dev/null
+	fi
+}
+
 test_common()
 {
 	local type=$1
@@ -107,6 +122,9 @@ test_common()
 	elif [ $type = payload ]; then
 		export RUMP_SERVER=$SOCK_LOCAL
 		check_ping_payload $inner_proto $ip_wg_peer
+	elif [ $type = badudp ]; then
+		export RUMP_SERVER=$SOCK_LOCAL
+		check_badudp $outer_proto $ip_peer
 	fi
 
 	destroy_wg_interfaces
@@ -282,6 +300,36 @@ add_payload_sizes_test()
 	atf_add_test_case ${name}
 }
 
+add_badudp_test()
+{
+	local inner=$1
+	local outer=$2
+	local ipv4=inet
+	local ipv6=inet6
+
+	name="wg_badudp_${inner}_over_${outer}"
+	fulldesc="Test wg(4) with ${inner} over ${outer} with bad UDP packets"
+
+	eval inner=\$$inner
+	eval outer=\$$outer
+
+	atf_test_case ${name} cleanup
+	eval "
+		${name}_head() {
+			atf_set descr \"${fulldesc}\"
+			atf_set require.progs rump_server wgconfig wg-keygen nc
+		}
+		${name}_body() {
+			test_common badudp $outer $inner
+			rump_server_destroy_ifaces
+		}
+		${name}_cleanup() {
+			\$DEBUG && dump
+			cleanup
+		}"
+	atf_add_test_case ${name}
+}
+
 atf_test_case wg_multiple_interfaces cleanup
 wg_multiple_interfaces_head()
 {
@@ -459,6 +507,11 @@ wg_multiple_peers_cleanup()
 atf_init_test_cases()
 {
 
+	add_badudp_test ipv4 ipv4
+	add_badudp_test ipv4 ipv6
+	add_badudp_test ipv6 ipv4
+	add_badudp_test ipv6 ipv6
+
 	add_basic_test ipv4 ipv4
 	add_basic_test ipv4 ipv6
 	add_basic_test ipv6 ipv4

Index: src/tests/net/if_wg/t_misc.sh
diff -u src/tests/net/if_wg/t_misc.sh:1.12 src/tests/net/if_wg/t_misc.sh:1.12.2.1
--- src/tests/net/if_wg/t_misc.sh:1.12	Mon Jun 13 07:59:15 2022
+++ src/tests/net/if_wg/t_misc.sh	Wed Oct  9 11:15:39 2024
@@ -1,4 +1,4 @@
-#	$NetBSD: t_misc.sh,v 1.12 2022/06/13 07:59:15 martin Exp $
+#	$NetBSD: t_misc.sh,v 1.12.2.1 2024/10/09 11:15:39 martin Exp $
 #
 # Copyright (c) 2018 Ryota Ozaki <ozaki.ry...@gmail.com>
 # All rights reserved.
@@ -55,9 +55,13 @@ wg_rekey_body()
 	export RUMP_SERVER=$SOCK_LOCAL
 	atf_check -s exit:0 -o ignore \
 	    rump.sysctl -w net.wg.rekey_after_time=$rekey_after_time
+	$DEBUG && atf_check -s exit:0 -o ignore \
+	    rump.sysctl -w net.wg.debug=-1
 	export RUMP_SERVER=$SOCK_PEER
 	atf_check -s exit:0 -o ignore \
 	    rump.sysctl -w net.wg.rekey_after_time=$rekey_after_time
+	$DEBUG && atf_check -s exit:0 -o ignore \
+	    rump.sysctl -w net.wg.debug=-1
 
 	# It sets key_priv_local key_pub_local key_priv_peer key_pub_peer
 	generate_keys
@@ -76,18 +80,18 @@ wg_rekey_body()
 
 	export RUMP_SERVER=$SOCK_LOCAL
 
+	echo ping1time=$(date)
 	$ping $ip_wg_peer
 
 	latest_handshake=$($HIJACKING wgconfig wg0 show peer peer0 \
 	    | awk -F ': ' '/latest-handshake/ {print $2;}')
-	$DEBUG && echo $latest_handshake
+	$DEBUG && echo handshake1=$latest_handshake
 
 	sleep 1
 
+	echo ping2time=$(date)
 	$ping $ip_wg_peer
 
-	atf_expect_fail "PR kern/56252"
-
 	# No reinitiation is performed
 	atf_check -s exit:0 -o match:"$latest_handshake" \
 	    $HIJACKING wgconfig wg0 show peer peer0
@@ -95,6 +99,7 @@ wg_rekey_body()
 	# Wait for a reinitiation to be performed
 	sleep $rekey_after_time
 
+	echo ping3time=$(date)
 	$ping $ip_wg_peer
 
 	# A reinitiation should be performed
@@ -103,11 +108,12 @@ wg_rekey_body()
 
 	latest_handshake=$($HIJACKING wgconfig wg0 show peer peer0 \
 	    | awk -F ': ' '/latest-handshake/ {print $2;}')
-	$DEBUG && echo $latest_handshake
+	$DEBUG && echo handshake2=$latest_handshake
 
 	# Wait for a reinitiation to be performed again
 	sleep $((rekey_after_time+1))
 
+	echo ping4time=$(date)
 	$ping $ip_wg_peer
 
 	# A reinitiation should be performed
@@ -115,8 +121,6 @@ wg_rekey_body()
 	    $HIJACKING wgconfig wg0 show peer peer0
 
 	destroy_wg_interfaces
-
-	atf_fail "failed to trigger PR kern/56252"
 }
 
 wg_rekey_cleanup()
@@ -143,10 +147,9 @@ wg_handshake_timeout_body()
 	local ip_wg_local=10.0.0.1
 	local ip_wg_peer=10.0.0.2
 	local port=51820
-	local rekey_after_time=3
 	local outfile=./out
-	local rekey_timeout=3
-	local rekey_attempt_time=8
+	local rekey_timeout=4
+	local rekey_attempt_time=10
 	local n=
 
 	setup_servers
@@ -198,8 +201,6 @@ wg_handshake_timeout_body()
 
 	n=$(grep "$ip_local.$port > $ip_peer.$port" $outfile |wc -l)
 
-	atf_expect_fail "PR kern/56252"
-
 	# Give up handshaking after three attempts
 	atf_check_equal $n 3
 
@@ -208,8 +209,6 @@ wg_handshake_timeout_body()
 	export RUMP_SERVER=$SOCK_LOCAL
 
 	destroy_wg_interfaces
-
-	atf_fail "failed to trigger PR kern/56252"
 }
 
 wg_handshake_timeout_cleanup()
@@ -271,8 +270,6 @@ wg_cookie_body()
 	# and a session doesn't start
 	$ping_fail $ip_wg_peer
 
-	atf_expect_fail "PR kern/56252"
-
 	extract_new_packets $BUS > $outfile
 	$DEBUG && cat $outfile
 	# XXX length 64 indicates the message is a cookie message
@@ -296,8 +293,6 @@ wg_cookie_body()
 	    $HIJACKING wgconfig wg0
 
 	destroy_wg_interfaces
-
-	atf_fail "failed to trigger PR kern/56252"
 }
 
 wg_cookie_cleanup()
@@ -354,8 +349,6 @@ wg_mobility_body()
 	export RUMP_SERVER=$SOCK_LOCAL
 	$ping_fail $ip_wg_peer
 
-	atf_expect_fail "PR kern/56252"
-
 	extract_new_packets $BUS > $outfile
 	$DEBUG && cat $outfile
 
@@ -394,8 +387,6 @@ wg_mobility_body()
 	atf_check -s exit:0 -o not-match:"$ip_local.$port > $ip_peer.$port" cat $outfile
 
 	destroy_wg_interfaces
-
-	atf_fail "failed to trigger PR kern/56252"
 }
 
 wg_mobility_cleanup()

Reply via email to