On 15/01/2025 11:59, Pádraig Brady wrote:
On 15/01/2025 07:33, Bernhard Voelker wrote:
Hi Padraig,
On 1/15/25 02:36, Pádraig Brady wrote:
This avoids a test hang seen with bash on Solaris 11
+ # Wrap with timeout as on some systems (like Solaris 11)
+ # fork() can return EAGAIN under vm constraints,
+ # causing bash at least to loop retrying the fork
thanks, the reasoning becomes clear with this.
+ timeout 1 true >/dev/null ||
+ skip_ 'ulimit -v cannot be safely checked'
- ulimit -v $v && "$@"
- ) >/dev/null 2>&1
+ timeout 10 $SHELL "$abs_srcdir"/tests/ulimit_supported "$@"
ulimit_supported is mainly used by get_min_ulimit_v_ for a given
coreutils tool, while the above now uses $SHELL in between:
isn't that usually much larger than our tools, so that the
resulting minimum ulimit_v will be wrong?
A valid consideration, but I think that should be OK
as the `ulimit -v` is only applied to the last subshell,
and so shouldn't behave significantly differently.
I.e. we're doing, timeout -> $SHELL -> ulimit; not
ulimit -> timeout -> $SHELL
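To illustrate the scoping point, here is a minimal sketch showing that a
limit set inside a subshell does not leak into the parent shell. It uses
`ulimit -c` rather than `ulimit -v` purely as an assumption for safety,
since lowering the core size is harmless on most systems; the variable
names are illustrative only.

```shell
# Record the parent shell's core-size limit.
before=$(ulimit -c)

# Lower the limit inside a subshell only, and read it back there.
inner=$( (ulimit -c 0 && ulimit -c) )

# The parent's limit is unchanged once the subshell has exited.
after=$(ulimit -c)

echo "before=$before inner=$inner after=$after"
```

The same scoping applies to `ulimit -v`: only the process that sets the
limit, and whatever it subsequently starts, runs under it.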
Actually you're right, but in the opposite way :)
With the $SHELL invocation we get a clean shell
and thus the limits returned can be too _small_.
In testing with the `timeout $SHELL` the limit returned was 3000,
while with the original ulimit in a subshell it was around 5000.
So I don't think we can really use the `timeout $SHELL` mechanism.
Now bash doesn't actually loop forever in this constrained case.
It uses an increasing backoff before retrying the fork, and
limits the retries to 5, which adds up to about 15s,
which isn't much worse than the 10s timeout anyway.
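As a rough check of that figure, the sketch below sums a doubling backoff.
The starting delay of 1 second and the doubling factor are assumptions
chosen for illustration; they are not copied from the bash source.

```shell
# Assumption: sleeps of 1s, 2s, 4s, 8s between five fork attempts.
delay=1
total=0
attempts=5
i=1
while [ "$i" -lt "$attempts" ]; do
  total=$((total + delay))   # sleep that would precede the next retry
  delay=$((delay * 2))       # doubling backoff
  i=$((i + 1))
done
echo "total backoff: ${total}s"
```

Under those assumptions the four sleeps add up to 15 seconds.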
I also noticed that either the `trap '' SEGV` _or_ the `ulimit -c 0`
is required to trigger this on bash 5.2.32(1)-release (sparc-sun-solaris2.11):
$ date; (ulimit -c 0; ulimit -v 3000 && src/cat --version); date
Wednesday, January 15, 2025 at 2:43:54 PM CET
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable
Terminated
Wednesday, January 15, 2025 at 2:44:09 PM CET
$ date; (ulimit -v 3000 && src/cat --version); date
Wednesday, January 15, 2025 at 2:44:17 PM CET
ld.so.1: cat: fatal: /lib/64/libc.so.1: Cannot allocate memory
Killed
Wednesday, January 15, 2025 at 2:44:17 PM CET
In fact, running any extra command in the subshell before the ulimit adds this extra resource requirement:
$ (true; ulimit -v 12000 && src/cat --version)
-bash: fork: retry: Resource temporarily unavailable
$ (ulimit -v 4000 && src/cat --version)
cat (GNU coreutils) 9.5.239-a836
Now it's tempting to merge the ulimit calls like:
$ (ulimit -c 0 -v 4000 && src/cat --version)
cat (GNU coreutils) 9.5.239-a83
but then we might underestimate, and we do have usage in scripts of
(something && ulimit -v $vm && ...), so it's probably best to
leave that as is.
Looking at the bash code, the retry schedule is hardcoded and takes about 16 seconds in total,
so I think our only recourse is to bail out if we detect this slow case.
The attached does that, and I've verified it runs fine on Solaris 11 and Fedora
41 at least.
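For readers skimming the patch, the detection idea can be sketched on its
own as follows. `run_timed_` is a hypothetical helper name, not something
in init.cfg, and it returns 77 rather than calling skip_ so that it can be
run outside the test framework; the status 99 for a `date` failure is also
an assumption.

```shell
# Hypothetical helper: run a command and return 77 if it took too long.
# Usage: run_timed_ MAX_SECONDS COMMAND [ARG]...
run_timed_()
{
  max_=$1; shift
  start_=$(date +%s) || return 99
  "$@" >/dev/null 2>&1
  ret_=$?
  end_=$(date +%s) || return 99
  # Report the "skip" status if the elapsed wall time reached the threshold
  test $((end_ - start_)) -ge "$max_" && return 77
  return $ret_
}

run_timed_ 10 true;   echo "fast: $?"
run_timed_ 1 sleep 2; echo "slow: $?"
```

Note that the sketch assigns the `date` output in a separate statement
rather than on a `local` line, since `local var=$(cmd)` reports the status
of `local` itself and so would hide a failing `date`.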
cheers,
Pádraig
From 934370b85aa5e52bf83db228a9d97e3c41ac6473 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Wed, 15 Jan 2025 15:13:00 +0000
Subject: [PATCH] tests: avoid slow ulimit -v behavior
* init.cfg (ulimit_supported_): skip_ if the ulimit -v
takes too long, which was seen with bash 5.2 on Solaris 11,
where fork() returned EAGAIN under memory constraints,
and bash retried for about 16 seconds.
(get_min_ulimit_v_): Break early if skipped.
---
init.cfg | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/init.cfg b/init.cfg
index 0aaba4ee2..04476eb74 100644
--- a/init.cfg
+++ b/init.cfg
@@ -178,6 +178,8 @@ ulimit_supported_()
v="$1"
shift
+ local ulimit_start_=$(date +%s) || skip_ 'ulimit: date error'
+
(
# Try to disable core dumps which may
# occur with memory constraints
@@ -185,6 +187,16 @@ ulimit_supported_()
ulimit -v $v && "$@"
) >/dev/null 2>&1
+
+ ret=$?
+
+ local ulimit_end_=$(date +%s) || skip_ 'ulimit: date error'
+
+ # This can happen on Solaris 11 at least where fork() can give EAGAIN
+ # and bash will retry for at least 16 seconds
+ test $(($ulimit_end_ - $ulimit_start_)) -ge 10 && skip_ 'ulimit too slow'
+
+ return $ret
}
# Determine the minimum required VM limit to run the given command.
@@ -200,11 +212,15 @@ get_min_ulimit_v_()
page_size=$(($page_size / 1024))
for v in $( seq 5000 5000 50000 ); do
+ ulimit_supported_ $v "$@"; ret=$?
+ test $ret = 77 && break;
if ulimit_supported_ $v "$@"; then
local prev_v
prev_v=$v
for v in $( seq $(($prev_v-1000)) -1000 1000 ); do
- ulimit_supported_ $v "$@" ||
+ ulimit_supported_ $v "$@"; ret=$?
+ test $ret = 77 && break 2;
+ test $ret = 0 ||
{
ret_v=$((prev_v + $page_size))
echo $ret_v
--
2.47.1