Hi,
Here's the summary of the IRC meeting.
---
COMMUNITY MEETING
Place: #openvpn-meeting on libera.chat
Date: Wed 9th March 2022
Time: 10:30 CET (9:30 UTC)
Planned meeting topics for this meeting were here:
<https://community.openvpn.net/openvpn/wiki/Topics-2022-03-09>
Your local meeting time is easy to check from services such as
<http://www.timeanddate.com/worldclock>
SUMMARY
cron2, dazo, d12fk, mattock, MaxF, novaflash, plaisthos participated in
this meeting.
---
Talked about the upcoming OpenSSL release (with security fixes) and
agreed that'd be a good time to do an OpenVPN 2.5 release as well.
---
Talked about the multi-defer auth loophole. We have patches that fix the
immediate problem, but "fix for good" needs a more massive rewrite of
the deferred auth handling.
Agreed to fix the loophole now and do the proper fix later.
--
Full chatlog attached
(11.39.44) cron2: ah
(11.40.10) cron2: so, let's start
(11.40.53) cron2: a new openssl release is coming next tuesday (3.0.2 and
1.1.1*) with a security thing. So that would be a good time to do the upcoming
2.5.x release
(11.41.38) mattock: +1
(11.41.39) dazo: +1 ... do we get the multi-defer auth stuff part of this?
(11.42.13) plaisthos: severity ofn the openssl release?
(11.42.43) cron2: dazo: we should - it's waiting for you
(11.44.08) cron2: there was a discussion on details and how to further improve
the plugin for automated testing, and then it basically ended... but we have
ACKs on the general patch ("it will fix the problem"), and we seem to agree
that further improvement could be done (like, remove defer/simple.c)
(11.44.22) dazo: cron2: I might be at loss ... ordex acked it ... you had some
concerns about the fatal behaviour, but that's what we agreed on was the right
solution. For testing, lets not test that now in automated testing until we've
done refinements
(11.45.09) dazo: kicking out defer/simple.c should not block this change.
That's just another patch removing it, right?
(11.46.32) cron2: I'd remove it right away (so git can see "it got moved and
then improved") but we can do this in two steps
(11.49.41) dazo: Okay, I see in the mail dialoge there were some complaints
about NULL==var vs var==NULL ... I can fix those, rename multi-auth.c to
defer.c and update testing doc? Is that a plan to move forward? I'll have
that done today
(11.49.58) cron2: anyway, the other thing (ending openvpn instead of just
returning AUTH FAILED) - I missed that part of the discussion, it seems. Why
end the process, instead of just fail auth (with a clear message)?
(11.50.09) cron2: the latter would enable automated testing
(11.50.25) cron2: which is something I do a lot, and it finds a lot of corner
cases :-)
(11.51.02) dazo: Basically, because the behaviour was broken ... and to really
lead admins into the right direction of trying to understand why auth failed.
They would have to look at the log files for openvpn far more closely ("Why
does openvpn stop running?")
(11.51.47) cron2: nah, not "rename multi-auth.c to defer.c" - the new name is
good, but if you remove the old source in the same go, git will understand "ah,
file has moved, and then was improved"
(11.51.52) dazo: and also to ensure we don't end up in other corner cases with
various states which could lead to another circumvention of auth
(11.51.58) plaisthos: dazo: that feels like a very weak argument
(11.52.19) cron2: I think "the server always refuses login" is about as clear
as "the server dies" - you need to look into the log
(11.52.21) plaisthos: the first part
(11.52.31) dazo: plaisthos: that was one of the arguments brought up, I don't
recall from whom ... but it was to force admins to look into what's going on
(11.55.21) cron2: if the server always refuses login, don't you think the admin
will look into it as well?
(11.56.10) dazo: many admins will start saying "have you reset your OTP tokens,
have done this have you done that" to their users .... and then they might
start looking into logs. that's my experience with a bit too many "admins"
(11.56.18) dazo: blame the user first
(11.57.34) dazo: And by stopping openvpn on an illconfigured authentication
setup, from a security point of view, is safer than just kicking rejects. It
really ensures the misconfiguration is getting fixed properly.
(11.59.27) dazo: the only argument of not stopping, is automated testing ...
which is understandable, but perhaps the test framework needs to be able to
handle crashes of the processes as well? Isn't that going to improve the
testing possibilities better?
(12.00.05) cron2: given that systemd will restart openvpn right away, I wonder
if that is really very useful - it will hide the error behind new startup
messages
(12.00.40) dazo: systemd will not restart under any circumstances, only some
.... but let me check what happens with our current systemd unit
(12.00.47) cron2: under normal conditions, openvpn on the server *never* exits,
so if it does, it hints at "plaistos messed up something" :-)
(12.01.02) cron2: so that would be a severe bug, not something I want to add to
regular testing
(12.07.48) d12fk: what about temporary auth failures?
(12.10.24) cron2: d12fk: not sure I understand your question
(12.13.54) d12fk: like when auth fails for other reasons than wrong credentials
e.g. short network outage, lost packet, etc
(12.14.03) d12fk: will the process die then as well?
(12.14.09) cron2: never
(12.14.20) cron2: the openvpn process *never* ends itself, except on out of
memory
(12.14.37) dazo: I've just done quick testing on the exit code when this
happens, which is 1 ... systemd unit file has Restart=on-failure, which will
trigger restart if exit code is not 0
(12.15.25) cron2: (some of the frame patches managed to introduce SIGSEGV "in
some corner cases", which crashed the server and was a very obvious "something
really bad is happening" signal)
(12.15.47) d12fk: cron2: so I have read this wrong then, that we'll start
exiting on auth failed?
(12.16.17) cron2: d12fk: with the proposed patch, we exit if the auth handler
notices that there are 2(+) active plugins that use deferred authentication
(12.16.38) cron2: we know that we can not handle this, so the quick "fix" is
"detect that situation, and <dosomething>"
(12.16.50) cron2: I think "dosomething" could be AUTH_FAIL + a good log message
(12.16.56) cron2: dazo's patch is "abort server!"
(12.17.35) d12fk: ah sorry, got it wrong then
(12.23.29) mattock: running out of time - how would we condense the above
discussion?
(12.23.53) d12fk: so, do we have a plan on how to fix this for good? if we do,
I think the discussion is moot, as there will be follow up soonish anyways
(12.24.34) dazo: we don't have a plan for "fix this for good" ... but we need
to close this loophole soonish.
(12.25.00) cron2: "fix for good" need a more massive rewrite of the deferred
auth handling
(12.25.02) dazo: a proper fix will require a more intrusive re-factoring of the
plugin.c code
(12.29.33) d12fk: hm, not sure what the current "fix" adds for ppl who run a
server, their fix needs to be, don't run 2+ auth plugins, but they had a reason
to have them in the first place
(12.30.21) cron2: 2+ auth plugins is fine
(12.30.34) cron2: 1 with deferred auth, 1+ with non-deferred is fine
(12.30.34) d12fk: not doing deferred auth is probably also no solution if one
has a sufficiently large user base
(12.31.25) cron2: 2+ deferred means "depending on timing, if one plugin returns
SUCCESS and the other returns FAIL, OpenVPN might let the user in, even though
it *should* do FAIL"
(12.32.02) d12fk: but yeah, as this is just a quick hack, I think we need to
work on the real solution instead of sugar coating this one
(12.32.21) cron2: +1
(12.32.37) dazo: yes, but we want to close this loophole asap first
(12.33.08) cron2: but now we want/need to make this clear "this is not a secure
setup, it has a realistic chance of letting in unauthenticated users"
(12.33.28) dazo: ^^^that
(12.33.41) d12fk: agreed, lets get the code in even if it has smells, then
revert and built the real thing
(12.34.26) mattock: +1
(12.34.34) dazo: the thing is .... deferred auth was not properly designed for
use-cases with multiple auth plugins to start with
(12.36.28) d12fk: hence the dending re-design
(12.36.35) d12fk: *pending
(12.36.43) mattock: anything else?
(12.37.10) cron2: we still have no formal agreement on "end process" or "log
warning, and return AUTH_FAIL"
(12.37.57) d12fk: my take is: it doesn't matter
(12.37.59) novaflash: why not take a vote
(12.38.16) cron2: d12fk: it does, as it makes automated testing much harder for
me
(12.39.20) novaflash: then i guess the method that doesn't make it harder for
you is the logical thing to do?
(12.40.28) d12fk: there are options: 1) don't test multiple deferred auth while
it's worked around 2) don't exit, but retry instead
(12.40.56) d12fk: either one works for me, as it's only temporary
(12.41.05) cron2: d12fk: of course I can do "do not test things", that will
make my life *much* easier
(12.41.26) d12fk: not *all the things* =)
(12.41.37) cron2: that is a nice suggestion actually :-)
(12.41.40) dazo: d12fk++
(12.42.20) cron2: but really - the point is, dazo wrote a plugin to enable
testing this. So it makes sense to test the case, to ensure we won't regress.
But if the server dies, it's much more cumbersome (as in "manual work") to test
(12.43.10) dazo: lets fix that later on ... before this new plug-in we didn't
test this case, so it's not a regression
(12.44.16) d12fk: what timeframe are we aiming at for the final fix to this
issue?
(12.45.11) cron2: "we did not test this before" is not a very good argument
(12.45.19) cron2: but anyway, I give up
(12.45.23) cron2: too much time wasted on this already
(12.45.39) cron2: I maintain my stance "the server must never exit", but if
dazo insists, I will just not test it
(12.49.27) d12fk: so, to put it another way: who can start to look into a fix
that fixes the issue?
(12.51.14) cron2: plaisthos understands the whole "multiple authentication
paths, some of them deferred, some not" best
(12.51.26) cron2: there's, I think, an easy way, and a hard way
(12.51.37) cron2: the easy way would be to sequentialize plugins
(12.51.53) cron2: so, if you have multiple plugins, and one of them defers, do
not progress plugins until that one is done
(12.52.08) cron2: (so, 3 deferred plugins would run sequentially, each of them
deferred)
(12.52.25) cron2: the hard way would be "make plugins fully independent, run
multiple deferred plugins in parallel"
(12.53.00) cron2: (we do parallel deferred for plugin/script/management today,
but keep track with 3 sets of state variable/files, and not "one per plugin")
(12.53.13) d12fk: sounds like the hard way is the right way [tm]
(12.53.35) plaisthos: yeah
(12.53.43) ***cron2 shrugs
(12.53.53) plaisthos: basically the code is really suited currently to handle
multiple deferred plugins
(12.54.05) cron2: "not"
(12.54.11) plaisthos: yes
(12.54.15) cron2: :-)
(12.54.22) plaisthos: I am not going to going down the rabbit hole of
implementing that
(12.54.53) plaisthos: that would require major refactoring and should then also
make script/management just use the same mechanism
(12.54.55) cron2: this is why "sequentialize plugins" should be fairly easy
(one auth control file, state variable, etc) while "all in parallel" means
"change all that auth stuff to be fully table driven"
(12.55.00) cron2: yes
(12.55.21) plaisthos: even the sequentialise plugins is harder than it
initially sounds
(12.55.27) cron2: "should", "fairly" and "easy" in openvpn terms :-)
(12.56.33) cron2: damn, I have no time... but you nerdsniped me... I'll have a
look ;-)
(12.56.55) cron2: mattock: can we do 2.5.x release next wednesday?
(12.56.56) d12fk: heh
(12.57.00) cron2: and 2.4.x release?
(13.01.22) mattock: that should be doable
(13.02.13) plaisthos: I am through the dns patch and have some comments I will
send to the mailing list in a few minutes
(13.02.33) d12fk: cool
_______________________________________________
Openvpn-devel mailing list
Openvpn-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openvpn-devel