Hi Stuart, thanks for your prompt response.

> There isn't a notion of "server" and "client" in the protocol, just
> different endpoints.

Agreed. I may have used the wrong nomenclature. WireGuard does make a
distinction between the "initiator" and the "responder". In this case,
I used the term client to mean initiator and server to mean responder.
Apologies for the confusion.

An important distinction here is that the responder (the OpenBSD box)
has no "Endpoint=" directive in the [Peer] section of its .conf, since
the initiator is meant to be mobile. If the other endpoint is mobile
and roams between networks, it de facto must be the initiator.

> Wireguard doesn't timeout the remote side. Otherwise how would you get
> a connection to start up again after it was dropped? That's common
> for wireguard, not specific to wg(4).

I would expect the responder to time out, since it has no endpoint
declared, no? If the initiator disappears, then once a certain number
of retries are exhausted the responder should give up, as all it has to
rely on for the endpoint's IP is its internal lookup table, which has
to be assumed stale. At that point I would expect it to go idle and
wait for the initiator to contact it again before resuming. This is
especially true for mobile networks, where the endpoint IP changes all
the time.

> I think this relates to retries for one handshake attempt. If there's
> traffic to send to an endpoint and there's no active handshake it will
> attempt a new handshake. If that handshake doesn't complete then that
> traffic will (after a while) get dropped but new traffic will attempt
> another handshake.

I thought I had a rebuttal to this, but after reading the WireGuard
spec I'm not so sure anymore. It seems rather vague, and my hopes and
dreams of seeing state machine diagrams went unfulfilled. However, I
think what you are referring to lies in sections 6.4 and 6.5:

| This reinitiation is attempted for REKEY_ATTEMPT_TIME seconds before
| giving up, though this counter is reset when a peer explicitly attempts
| to send a new transport data message.
| we can determine if the secure session is broken or disconnected if a transport
| data message has not been received for (KEEPALIVE_TIMEOUT + REKEY_TIMEOUT)
| seconds, in which case a handshake initiation message is sent to the unresponsive
| peer, once every REKEY_TIMEOUT seconds, as in section 6.4, until a secure session
| is recreated successfully or until REKEY_ATTEMPT_TIME seconds have passed.

> Do you have continuing packets (or keepalives) to send over wg(4) to
> this endpoint? If you do then it seems to me that it's working as
> designed.

Yes, there are flows attempting to reach the initiator-side network,
routed via wg(4) on the responder.

My *expectation* was that, after the initiator dropped off the mobile
network and became unresponsive, a timer of REKEY_ATTEMPT_TIME would
expire, after which wg(4) would stop retrying and return an ICMP
Destination Unreachable to the source until the initiator
re-established the connection.

However, the more I read this, the more it seems that if a packet
bound for the tunnel enters wg(4) at any time
0 < t < REKEY_ATTEMPT_TIME, the timer resets and the process starts
over, ad infinitum, for as long as packets keep arriving at wg(4).

If this is the case, I believe it is a flaw in the WireGuard protocol
design that can lead to a lot of nuisance traffic. The spec makes
statements such as the following:

| the outer external source IP of an encrypted WireGuard packet is used to identify
| the remote endpoint of a peer, enabling peers to roam freely between different
| external IPs, between mobile networks for example

So this is not an unknown use case. However, I believe it assumes that
the mobile device will keep roaming forever and never be switched off.

Regards
Lloyd
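
P.S. To make my reading of sections 6.4 and 6.5 concrete, here is a
rough sketch of the retry behaviour as I understand it. It is a toy
model, not the wg(4) implementation: the function name and the
one-second tick are my own invention, times are whole seconds, and the
peer is assumed never to answer. The constants are the values given in
the WireGuard whitepaper.

```python
# Toy model of handshake retries toward a peer that never replies.
# Assumption: every new outbound packet resets the REKEY_ATTEMPT_TIME
# window, which is my reading of section 6.4, not verified against wg(4).

REKEY_TIMEOUT = 5        # seconds between handshake initiations
REKEY_ATTEMPT_TIME = 90  # seconds of retrying before giving up


def handshake_initiations(packet_times, horizon):
    """Return the times (in whole seconds) at which initiations are sent.

    packet_times: integer times at which outbound tunnel traffic arrives.
    horizon: last second to simulate (inclusive).
    """
    packets = sorted(packet_times)
    sent = []
    deadline = None   # time at which the current retry window expires
    next_send = None  # time of the next initiation, None when idle
    i = 0
    for t in range(horizon + 1):
        # New outbound traffic resets the retry window and, if we were
        # idle, kicks off a fresh handshake attempt immediately.
        while i < len(packets) and packets[i] <= t:
            deadline = t + REKEY_ATTEMPT_TIME
            if next_send is None:
                next_send = t
            i += 1
        # Give up once the window has elapsed without any reset.
        if next_send is not None and t >= deadline:
            next_send = None
            deadline = None
        # Otherwise retransmit the initiation every REKEY_TIMEOUT seconds.
        if next_send is not None and t == next_send:
            sent.append(t)
            next_send = t + REKEY_TIMEOUT
    return sent


if __name__ == "__main__":
    # A single packet: retries stop after REKEY_ATTEMPT_TIME.
    print(len(handshake_initiations([0], 300)))                   # 18
    # One packet per minute for ten minutes: retries never stop.
    print(len(handshake_initiations(range(0, 601, 60), 600)))     # 121
```

Under these assumptions, a flow that sends even one packet per minute
keeps the responder emitting an initiation every 5 seconds
indefinitely, which is the nuisance traffic I was describing.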