Hello Gert,

On Wed, 11 Aug 2021 at 18:31, Gert Doering <g...@greenie.muc.de> wrote:
>
> Hi Lukas,
>
> small world :-)

Yes, indeed :P


> On Wed, Aug 11, 2021 at 04:06:38PM +0200, Lukas Tribus wrote:
> > in an environment with 2 openvpn instances (one TCP and one UDP),
> > previous unattended-upgrades of the openvpn package in Ubuntu 18.04
> > caused outages, because the UDP based instance was unable to add
> > routes to the kernel:
> >
> > /sbin/ip route add 192.168.20.0/24 via 10.20.30.2
> > ERROR: Linux route add command failed: external program exited with
> > error status: 2
> > /sbin/ip route add 10.20.30.0/24 via 10.20.30.2
> > ERROR: Linux route add command failed: external program exited with
> > error status: 2
>
> As Antonio says, it would be good to have a bit more log output
> (with --verb 3 or 4) here.
>
> What does "ip route" show before/after starting the UDP instance?

It's not yet reproducible, it happened only twice in 6 months.

UDP instance covers a handful of remote clients, each with their own
private /24 subnet, and the TCP instance covers a remote client, with
another private /24 subnet.


This happened during an unattended-upgrade from openvpn
2.4.4-2ubuntu1.5 to 2.4.4-2ubuntu1.6 on Ubuntu 18.04:

TCP instance, old PID 947, new PID 8162->8167
UDP instance, old PID 1024, new PID 8187->8190

(I guess openvpn forks once at startup)

root@prodhost-1:/var/log# grep openvpn syslog | grep "Aug  5 06:22" |
grep -v Learn
Aug  5 06:22:38 prodhost-1 unattended-upgrade: Packages that will be
upgraded: linux-headers-generic linux-headers-virtual
linux-image-extra-virtual linux-image-generic linux-image-virtual
linux-libc-dev linux-virtual openvpn
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[947]: /sbin/ip route
del 192.168.22.0/24
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[947]: openvpn_execve:
unable to fork: Resource temporarily unavailable (errno=11)
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[947]: Exiting due to
fatal error
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8162]: OpenVPN 2.4.4
x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11]
[MH/PKTINFO] [AEAD] built on Jul 19 2021
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8162]: library
versions: OpenSSL 1.1.1  11 Sep 2018, LZO 2.08
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: Diffie-Hellman
initialized with 2048 bit key
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: ROUTE_GATEWAY
192.168.232.1
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: TUN/TAP device
tun0 opened
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: TUN/TAP TX
queue length set to 100
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: do_ifconfig,
tt->did_ifconfig_ipv6_setup=0
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip link
set dev tun0 up mtu 1500
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[1024]: event_wait :
Interrupted system call (code=4)
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[1024]: /sbin/ip route
del 192.168.20.0/24
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[1024]:
openvpn_execve: unable to fork: Resource temporarily unavailable
(errno=11)
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[1024]: Exiting due to
fatal error
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip addr
add dev tun0 local 10.20.60.1 peer 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 192.168.22.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 192.168.23.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 192.168.24.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 192.168.25.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 192.168.26.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8187]: OpenVPN 2.4.4
x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11]
[MH/PKTINFO] [AEAD] built on Jul 19 2021
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8187]: library
versions: OpenSSL 1.1.1  11 Sep 2018, LZO 2.08
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: Diffie-Hellman
initialized with 2048 bit key
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: Deprecated TLS
cipher name 'AES256-SHA', please use IANA name
'TLS-RSA-WITH-AES-256-CBC-SHA'
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 192.168.27.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: ROUTE_GATEWAY
192.168.232.1
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: TUN/TAP device
tun2 opened
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: TUN/TAP TX
queue length set to 100
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: do_ifconfig,
tt->did_ifconfig_ipv6_setup=0
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: /sbin/ip link
set dev tun2 up mtu 1500
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: /sbin/ip addr
add dev tun2 local 10.20.30.1 peer 10.20.30.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 192.168.28.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 192.168.29.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: /sbin/ip route
add 192.168.20.0/24 via 10.20.30.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: ERROR: Linux
route add command failed: external program exited with error status: 2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: /sbin/ip route
add 10.20.30.0/24 via 10.20.30.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: ERROR: Linux
route add command failed: external program exited with error status: 2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 10.34.194.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: Could not
determine IPv4/IPv6 protocol. Using AF_INET
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: Socket
Buffers: R=[212992->212992] S=[212992->212992]
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: UDPv4 link
local (bound): [AF_INET][undef]:1194
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: UDPv4 link
remote: [AF_UNSPEC]
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: GID set to nogroup
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: UID set to nobody
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: MULTI:
multi_init called, r=256 v=256
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: IFCONFIG POOL:
base=10.20.30.4 size=62, ipv6=0
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: IFCONFIG POOL LIST
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_ubiquiti[8190]: Initialization
Sequence Completed
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 192.168.51.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: /sbin/ip route
add 10.20.60.0/24 via 10.20.60.2
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: Could not
determine IPv4/IPv6 protocol. Using AF_INET
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: Socket
Buffers: R=[131072->131072] S=[16384->16384]
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: Listening for
incoming TCP connection on [AF_INET][undef]:1195
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: TCPv4_SERVER
link local (bound): [AF_INET][undef]:1195
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: TCPv4_SERVER
link remote: [AF_UNSPEC]
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: GID set to nogroup
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: UID set to nobody
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: MULTI:
multi_init called, r=256 v=256
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: IFCONFIG POOL:
base=10.20.60.4 size=62, ipv6=0
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: IFCONFIG POOL LIST
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: MULTI: TCP
INIT maxclients=1024 maxevents=1028
Aug  5 06:22:47 prodhost-1 ovpn-openvpn_mikrotik[8167]: Initialization
Sequence Completed

As you can see, both the UDP instance as well as the TCP instance try
to insert routes at the same moment, the TCP instance is slightly
younger (a few ms maybe), and the route insertion fails.

I think it would be best to run all the ip commands through "strace
-ttAo instancename.log" to have an idea about the syscalls between
iproute2 and kernel. Would this be possible with the "--iproute cmd"
configuration, how would that configuration look like exactly? I mean
how can I refer to subnets and gateways within the --iproute cmd
config?



> > iproute2 return code 2 indicates a kernel error.
> >
> > The openvpn instances downgrade privileges to nogroup/nobody, so the
> > removal of the routes also fail, but when removing the tun interface,
> > the routes will vanish anyway.
>
> Permission errors should be correctly logged as such, and downgrading
> should only happen after ifconfig/route setup.

Yes I can confirm. This causes some expected errors in the log, but I
don't see any reason why this would have anything to do with the
issue.



> > I'm wondering if somebody has seen issues like this.  Of course the
> > error comes from the kernel, this could be some race condition due to
> > two processes inserting routes at the same moment or something, but I
> > have to find a way to do this reliably.
>
> I've never heard about race conditions of this sort.
>
> Are you trying to insert the *same* /24?  Or does each instance have
> their own subnets?

The 2 instances use distinct subnet's.


> > I'm also wondering about error handling. Failing to add routes means
> > we have a non-working openvpn instance, but the ip route return code
> > is only logged; it does not trigger a fatal exit of openvpn itself. So
> > the parent process/process supervisor cannot possibly be aware of any
> > problems.
>
> This is intentional, to some extent, to gracefully handle client side
> issues ("the server pushes 192.168.1.0/24, but this is used as client
> side LAN, so cannot be installed").
>
> On the server side, maybe we should fail hard in this case... but it's
> the same code.  And the code is 15+ years old, so nobody around today
> really knows for sure why certain design decisions have been made back
> then...
>
> (I inherited that mess, I just wanted to add IPv6 support :-) )

Ah, it's all clear to me now ;)


thanks,
lukas


_______________________________________________
Openvpn-users mailing list
Openvpn-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openvpn-users

Reply via email to