On 04.06.19 at 14:25, Simon Beirnaert wrote:
> Sorry for the late reply here. Got caught up in other stuff. I did
> some digging around. Set the loglevel with systemd-analyze to debug
> and also added the debug flag to pam_systemd.so.
>
> What I noticed is that on boot, when 5000+ machines try to
> authenticate at once, pam_systemd seems to fall on its ass and fail
> due to resource exhaustion. An excerpt from journalctl is attached
> below.
>
> I thought this might've been the reason behind sshd processes not
> being assigned to the correct slice, but the processes for which
> these log entries are generated are not available on the system
> anymore, which I take as meaning that the sshd process exited
> because it couldn't open a session.
>
> I tried to go about it the other way around and search for logs
> generated by any sshd process which is under the system.slice. I
> used this oneliner to do so:
>
> for i in $(systemctl status ssh | grep 'sshd: <user>' | sed -E
> 's/[^0-9]*([0-9]*)[^0-9]*/\1/'); do echo "==> logs for process $i";
> journalctl | grep '\[$i\]'; done | less
>
> The search came up empty. For none of the currently 7000+ sshd
> processes which aren't in the correct user slice, there are any logs
> in journalctl. I verified and all logs since boot time are currently
> still in the journal, it hasn't rotated yet.
>
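
A side note on the quoted one-liner, in case it contributes to the empty result: in `grep '\[$i\]'` the `$i` sits inside single quotes, so the shell never expands it and grep searches for the literal text `[$i]`. Below is a minimal sketch of the difference. The sample log line and the PID are made up for illustration, and `logs_for_pid` is a hypothetical helper that uses journalctl's `_PID=` field match instead of grep.

```shell
#!/bin/sh
# Made-up sample line and PID, for illustration only (not real journal output).
i=1234
line='host sshd[1234]: example message'

# As quoted: single quotes keep $i literal, so grep looks for "[$i]".
single=$(printf '%s\n' "$line" | grep -c '\[$i\]' || true)

# Double quotes let the shell expand $i; -F matches "[1234]" as a fixed string.
double=$(printf '%s\n' "$line" | grep -cF "[$i]" || true)

echo "single-quoted pattern: $single match(es)"
echo "double-quoted pattern: $double match(es)"

# Hypothetical helper: ask the journal for one PID's entries since boot.
# Defined here but not called, so the sketch runs without journalctl.
logs_for_pid() {
    journalctl -b _PID="$1"
}
```

Calling `logs_for_pid "$i"` inside the original loop would replace the `journalctl | grep` step. Whether any entries turn up for those sshd processes is a separate question; this only rules out the quoting as the cause of the empty search.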
If you have 5000 users authenticating at once, I can imagine that
libpam-systemd/systemd/dbus run into some limits.

Would you mind testing with v241 (either from backports or by setting
up a buster system) and, if the problem is still reproducible, filing
it upstream at
https://github.com/systemd/systemd/issues

Thanks,
Michael
-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?