> On Aug 28, 2023, at 20:04, Eric Wong <e...@80x24.org> wrote:
>
>> Synopsis:	RLIMIT_CPU doesn't work reliably on mostly idle systems
>> Category:	system
>> Environment:
> System      : OpenBSD 7.3
> Details     : OpenBSD 7.3 (GENERIC.MP) #1242: Sat Mar 25 18:04:31 MDT 2023
>               dera...@octeon.openbsd.org:/usr/src/sys/arch/octeon/compile/GENERIC.MP
>
> Architecture: OpenBSD.octeon
> Machine     : octeon
>> Description:
>
> RLIMIT_CPU doesn't work reliably when few/no syscalls are made on an
> otherwise idle system (aside from the test process using up CPU).
> It can take 20-50s to fire SIGKILL with rlim_max=9 (and the SIGXCPU
> from rlim_cur=1 won't even fire).
>
> I can reproduce this on a private amd64 VM and also on gcc231
> on the GCC compiler farm <https://cfarm.tetaneutral.net/>.
> I can't reproduce this on a busy system like gcc220 on cfarm,
> however.
>
>> How-To-Repeat:
>
> Following is a standalone C program which demonstrates the problem on
> a mostly idle system:
>
> /*
>  * Most reliably reproduced with compiler optimizations disabled:
>  *
>  *	cc -o rlimit_cpu -ggdb3 -Wall rlimit_cpu.c
>  *
>  * Neither SIGXCPU (from rlim_cur) nor SIGKILL (from rlim_max)
>  * with RLIMIT_CPU set seems to fire reliably with few syscalls being made.
>  * On an otherwise idle system, it can take many more seconds (20-50s)
>  * than expected when rlim_max=9 (SIGXCPU doesn't even happen).
>  * Best case is 2 seconds for SIGXCPU when rlim_cur=1 on a busy system,
>  * which is understandable due to kernel accounting delays.
>  *
>  * I rely on RLIMIT_CPU to protect systems from pathological userspace
>  * code (diff generation, regexps, etc.)
>  *
>  * Testing on cfarm <https://cfarm.tetaneutral.net/> machines,
>  * the issue is visible on a mostly-idle 4-core gcc231 mips64
>  * but doesn't seem to happen on the busy 12-core gcc220 machine
>  * (only 2 seconds for XCPU w/ rlim_cur=1).
>  */
> #include <sys/resource.h>
> #include <assert.h>
> #include <signal.h>
> #include <unistd.h>
>
> static void sigxcpu(int sig)
> {
> 	write(1, "SIGXCPU\n", 8);
> 	_exit(1);
> }
>
> static volatile size_t nr; // volatile to disable optimizations
>
> int main(void)
> {
> 	struct rlimit rlim = { .rlim_cur = 1, .rlim_max = 9 };
> 	int rc;
>
> 	signal(SIGXCPU, sigxcpu);
> 	rc = setrlimit(RLIMIT_CPU, &rlim);
> 	assert(rc == 0);
>
> 	/*
> 	 * adding some time, times, and write syscalls improves the likelihood
> 	 * of rlimit signals firing in a timely manner.  writes to /dev/null
> 	 * seem less likely to trigger them than writes to the terminal or a
> 	 * regular file.
> 	 */
> 	for (;; nr++) {
> 	}
>
> 	return 0;
> }
>
>> Fix:
>
> Making more syscalls can work around the problem, but that's not
> an option when dealing with userspace-heavy code like pathological
> regexps.
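As a side note, here is a minimal sketch of the syscall-sprinkling
workaround described in the report above: the same busy loop, but with an
occasional cheap syscall so the kernel gets a chance to accumulate and
check the CPU time.  The CHECK_EVERY interval and the choice of times(2)
are illustrative assumptions on my part, not something taken from the
original report.

/*
 * Workaround sketch: same busy loop as in the report, but sprinkle in an
 * occasional cheap syscall so the kernel can accumulate and check CPU
 * time.  CHECK_EVERY and the use of times(2) are arbitrary choices;
 * any cheap syscall should do.
 */
#include <sys/resource.h>
#include <sys/times.h>
#include <assert.h>
#include <signal.h>
#include <unistd.h>

#define CHECK_EVERY (1UL << 24)	/* arbitrary iteration interval */

static void sigxcpu(int sig)
{
	write(1, "SIGXCPU\n", 8);
	_exit(1);
}

static volatile size_t nr;	/* volatile to disable optimizations */

int main(void)
{
	struct rlimit rlim = { .rlim_cur = 1, .rlim_max = 9 };
	struct tms tms;
	int rc;

	signal(SIGXCPU, sigxcpu);
	rc = setrlimit(RLIMIT_CPU, &rlim);
	assert(rc == 0);

	for (;; nr++) {
		if ((nr % CHECK_EVERY) == 0)
			(void)times(&tms);	/* gives the kernel a switch point */
	}

	return 0;
}

Of course, as the report itself notes, this only helps when you control
the inner loop, which is not the case for something like a pathological
regexp.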
The CPU time limit is checked from a periodic timeout.  CPU time totals
accumulate in mi_switch().  The problem is that on a mostly idle system,
a user thread that is hogging the CPU may take a very long time to switch
out, and none of that CPU time is accumulated until the switch happens,
so the signal arrives later than requested.  System calls contain points
where a thread can switch out, which is why the delay is exaggerated on
syscall-free synthetic workloads like the busy loop shown above.

One possible solution is to check usage times for threads that are still
ONPROC during the rusage timeout.  Another approach is to be more
aggressive about forcing threads to switch out, even when nothing else
wants to run.

Coincidentally, we are discussing p_rtime on tech@ right now, which is
tangentially related to this issue.
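To make the first option a bit more concrete, here is a rough, untested
sketch of what checking ONPROC threads from the rusage timeout could look
like.  The helper names (rlimit_cpu_check(), process_tu_total(),
proc_onproc_elapsed()) are hypothetical placeholders, not existing kernel
interfaces, and the struct/field names are from memory and may not match
the tree; only the general idea is what's being illustrated: add the
not-yet-accumulated on-CPU slice that mi_switch() would add if the thread
switched out right now.

/*
 * Untested sketch with hypothetical helper names, not a real diff.
 * When the periodic rusage timeout fires, don't rely only on the CPU
 * time already accumulated at context switches in mi_switch(); for each
 * thread currently running on a CPU (ONPROC), also add the time it has
 * been on-CPU since it was last switched in, so a thread spinning in
 * userspace without syscalls still trips RLIMIT_CPU on time.
 */
static void
rlimit_cpu_check(struct process *pr)	/* hypothetical, run from the timeout */
{
	struct proc *p;
	struct timespec total, onproc;
	rlim_t secs;

	/* CPU time already accumulated by mi_switch() */
	process_tu_total(pr, &total);		/* hypothetical accessor */

	/* add the slice mi_switch() would add if ONPROC threads stopped now */
	TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
		if (p->p_stat == SONPROC) {
			/* hypothetical: now minus last switch-in time */
			proc_onproc_elapsed(p, &onproc);
			timespecadd(&total, &onproc, &total);
		}
	}

	secs = (rlim_t)total.tv_sec;
	if (secs >= pr->ps_limit->pl_rlimit[RLIMIT_CPU].rlim_max)
		prsignal(pr, SIGKILL);	/* hard limit exceeded */
	else if (secs >= pr->ps_limit->pl_rlimit[RLIMIT_CPU].rlim_cur)
		prsignal(pr, SIGXCPU);	/* soft limit exceeded */
}

One possible argument for this direction over forcing extra context
switches is that it leaves scheduling behavior alone and only refines the
accounting at the moment the limit is actually checked.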