Hi Stuart, thanks for your prompt response.

> There isn't a notion of "server" and "client" in the protocol, just
> different endpoints.

Agreed, I used the wrong nomenclature. WireGuard does distinguish
between the "initiator" and the "responder". I used "client" to mean
initiator and "server" to mean responder. Apologies for the confusion.

An important distinction in this case is the responder (OpenBSD box) is
missing an "Endpoint=" directive in the [Peer] section of the .conf since
the initiator is meant to be mobile. If the other endpoint is mobile and
roams between networks, it de facto must be the initiator.
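
For reference, the responder-side layout I am describing looks roughly
like the fragment below. This is only a sketch: the keys, port, and
addresses are placeholders, not my actual configuration. The point is
that the [Peer] section has no Endpoint= line, so the responder can only
learn the peer's address from packets it receives.

```
[Interface]
PrivateKey = <responder-private-key>
ListenPort = 51820

[Peer]
# Mobile initiator: no Endpoint= line. The peer's address is learned
# from the source IP of authenticated incoming packets.
PublicKey = <initiator-public-key>
AllowedIPs = 10.0.0.2/32
```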

> Wireguard doesn't timeout the remote side. Otherwise how would you get
> a connection to start up again after it was dropped? That's common
> for wireguard, not specific to wg(4).
 
I would expect the responder to time out since it has no endpoint declared,
no? If the initiator disappears, after a certain number of retries are
exhausted, the responder should give up, as all it has to rely on for the
endpoint's IP is its internal lookup table, which has to be assumed stale.
At that point I would expect it to go idle and wait for the initiator to
contact it again before resuming. This is especially true in the case of
mobile networks where the endpoint IP will be changing all the time. 

> I think this relates to retries for one handshake attempt. If there's
> traffic to send to an endpoint and there's no active handshake it will
> attempt a new handshake. If that handshake doesn't complete then that
> traffic will (after a while) get dropped but new traffic will attempt
> another handshake.

I thought I had a rebuttal to this, but after reading the WireGuard spec,
I'm not so sure anymore. It seems kind of vague. My hopes and dreams of
seeing state machine diagrams went unfulfilled. However, I think what you
are referring to lies in sections 6.4 and 6.5:

| This reinitiation is attempted for REKEY_ATTEMPT_TIME seconds before
| giving up, though this counter is reset when a peer explicitly attempts
| to send a new transport data message.

| we can determine if the secure session is broken or disconnected if a
| transport data message has not been received for
| (KEEPALIVE_TIMEOUT + REKEY_TIMEOUT) seconds, in which case a handshake
| initiation message is sent to the unresponsive peer, once every
| REKEY_TIMEOUT seconds, as in section 6.4, until a secure session is
| recreated successfully or until REKEY_ATTEMPT_TIME seconds have passed.

> Do you have continuing packets (or keepalives) to send over wg(4) to
> this endpoint? If you do then it seems to me that it's working as
> designed.

Yes, there are flows attempting to reach the initiator-side network,
routed via wg(4) on the responder. My *expectation* was that once the
initiator dropped off the mobile network and became unresponsive, a
timer of REKEY_ATTEMPT_TIME would expire, after which wg(4) would stop
retrying and return an ICMP Destination Unreachable to the source until
the initiator re-established the connection.

However, the more I read this, the more it seems that if a packet
enters wg(4) destined for the tunnel at any time
0 < t < REKEY_ATTEMPT_TIME, the timer resets and the process starts
over, ad infinitum, for as long as packets keep coming into wg(4).
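
To make my reading concrete, here is a rough Python sketch of the
behaviour as I understand it. It is a toy model, not the wg(4)
implementation: it works in one-second ticks, ignores jitter and
cookies, and assumes the peer never answers. REKEY_TIMEOUT (5 s) and
REKEY_ATTEMPT_TIME (90 s) are the values from the whitepaper; the
function name and its arguments are made up for illustration.

```python
REKEY_TIMEOUT = 5        # seconds between handshake initiation retries
REKEY_ATTEMPT_TIME = 90  # seconds of retrying before giving up


def count_initiations(packet_times, horizon):
    """Count handshake initiations sent to a dead peer in `horizon` seconds.

    packet_times: integer times (seconds) at which new outbound packets
    enter the tunnel. Each one resets the give-up deadline.
    """
    packets = set(packet_times)
    sent = 0
    deadline = None   # when the current retry window expires
    next_try = None   # when the next initiation is due
    for t in range(horizon):
        if t in packets:
            # New outbound data: reset the REKEY_ATTEMPT_TIME window.
            deadline = t + REKEY_ATTEMPT_TIME
            if next_try is None:
                next_try = t
        if deadline is not None and t >= deadline:
            # Window expired with no new traffic: give up and go idle.
            deadline = None
            next_try = None
        elif next_try is not None and t >= next_try:
            sent += 1
            next_try = t + REKEY_TIMEOUT
    return sent


# One packet, then silence: retries stop after 90 seconds.
print(count_initiations([0], 600))                 # 18
# A packet every 30 seconds: retries never stop within the 10 minutes.
print(count_initiations(range(0, 600, 30), 600))   # 120
```

If this model is right, any flow that sends at least one packet per
REKEY_ATTEMPT_TIME keeps the responder sending an initiation every
REKEY_TIMEOUT seconds, indefinitely, to an address that may be stale.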

If this is the case, I believe it is a flaw in the WireGuard protocol
design that can lead to a lot of nuisance traffic. The spec makes
statements such as the following:

| the outer external source IP of an encrypted WireGuard packet is used
| to identify the remote endpoint of a peer, enabling peers to roam
| freely between different external IPs, between mobile networks for
| example

So this is not an unknown use case. However, I believe the spec assumes
that the mobile device will keep roaming forever and never be switched
off.

Regards
Lloyd
