Hello and a nice Saturday evening, Mr. Elz, and everyone.

While it is not a bash bug, and therefore quite off topic, i come
back to this once more.  Maybe it is of interest for someone.
And maybe someone can shed some light on this.  This would be nice.

Steffen Nurpmeso wrote in <20190807193402.d1zqm%stef...@sdaoden.eu>:
|Steffen Nurpmeso wrote in <20190806142527.9hs0i%stef...@sdaoden.eu>:
||Robert Elz wrote in <26245.1565045...@jinx.noi.kre.to>:
|||    Date:        Mon, 05 Aug 2019 14:05:43 +0200
|||    From:        Steffen Nurpmeso <stef...@sdaoden.eu>
|||    Message-ID:  <20190805120543.bf9-u%stef...@sdaoden.eu>
| ..
|||The shell cannot really know - your example was not functional until
|||after it set up the traps.
| ..
|||No temp files, named pipes, or othe similar stateful mechanisms needed.
|
|Sorry for all that noise once again, but i have then rewritten it
|using mkfifo etc. with credits for some of you (which collects
|things i have seen flying by since Saturday night):
|
|  They also came up with the solution: do not wait(1) on child
|  processes until we know about their state, so that anytime before we
|  actually do wait(1) we can safely kill(1) them (Jilles Tjoelker).
|  Thus, let's create a FIFO (Chet Ramey) to get a synchronized
|  device, strip the wild test undertaker to a core that only writes
|  "timeout" to that FIFO, and also improve its startup-is-completed to
|  simply send a signal to the parent process (Robert Elz).  So
|  either the tests finish nicely, in which case they write their job
|  number to the fifo, or we see "timeout" and kill all remains.
 ...

The problem is that it does not work out portably.
Maybe i am getting something wrong, but i see failures on multi
processor OpenBSD 6.5/i386 and FreeBSD 11.3-RC2/i386 (in a Linux
KVM/Qemu).  On these i see

  mx-test.sh[8467]: can't open t.fifo: Interrupted system call

quite frequently, even if there are no traps installed at all, and
data written to the FIFO is occasionally lost.
It is written in

  (
  trap '' HUP INT TERM EXIT
  if ${mkdir} t.${JOBS}.d; then
     (
        cd t.${JOBS}.d &&
        eval t_${1} ${JOBS} ${1}
     )
  fi
  [ -e t.fifo ] && echo ${JOBS} >> t.fifo
  ) > t.${JOBS}.io 2>&1 </dev/null &

and i can put it in an if.fi and see that echo has happened, with
a successful $?.  But in the parent loop

  while [ 1 ]; do
     read js < t.fifo
     # I saw quite frequent "Interrupted system call" errors on FreeBSD!  Also OpenBSD
     [ ${?} -ne 0 ] && continue

it will never be read!  I.e., whereas the test is an actual
success and exits fine we end up with

  ... [1=digmsg] [2=on_main_loop_tick] [3=compose_hooks] [4=mass_recipients]
  .. waiting
  ...mx-test.sh: cannot open t.fifo: Interrupted system call
  !! Timeout: reaped job(s) 2/[on_main_loop_tick]

but also like this:

  ... [1=q_t_etc_opts] [2=message_injections] [3=attachments] [4=rfc2231]
  .. waiting
  !! Timeout: reaped job(s) 1/[q_t_etc_opts]

This never happens on Linux (x86-64).
So then i have to make the tests repeatedly write to the FIFO, and
kill(1) them when the parent really gets to read it (and kill(1)
them hard if we read the "timeout"), as in:

  (
  trap '' HUP INT TERM EXIT
  if ${mkdir} t.${JOBS}.d; then
     (
        cd t.${JOBS}.d &&
        eval t_${1} ${JOBS} ${1}
     )
  fi
  trap 'exit 0' USR1
  while [ -e t.fifo ]; do
     echo >&2 JOB $JOBS WRITES FIFO
     echo ${JOBS} >> t.fifo
     sleep 1
  done
  ) > t.${JOBS}.io </dev/null & # 2>&1 </dev/null &

as well as

  while [ 1 ]; do
     read js < t.fifo
     echo >&2 FROM FIFO I READ $js
     [ ${?} -ne 0 ] && continue

     JOBDESC=`${awk} -v L="${JOBDESC}" '
        BEGIN{
           while(1){
              sub("^[ ]+", "", L)
              sub("[ ]+$", "", L)
              if(length(L) == 0)
                 break
              x = L
              sub("[ ]+.+$", "", x)
              y = z = x
              sub("^[0-9]+=[0-9]+/", "", z)
              sub("/.+$", "", y)
              x = y
              sub("=.+", "", x)
              sub(".+=", "", y)
              print x " " y " " z
              sub("^[^ ]+", "", L)
           }
        }
     ' | {
        l= kl=
        while read j p n; do
           if [ ${js} = timeout ]; then
              kl="${kl} ${j}/[${n}]"
              echo >&2 KILL ING $j=$p/$n
              kill -KILL ${p} >/dev/null 2>&1
              ${rm} -f t.${j}.result
           elif [ ${js} = ${j} ]; then
              echo >&2 USR1 ING $j=$p/$n
              kill -USR1 ${p} >/dev/null 2>&1
           else
              l="${l} ${j}=${p}/${n}"
           fi
        done
        if [ ${js} = timeout ] && [ -n "${kl}" ]; then
           printf >&2 '%s!! Timeout: reaped job(s)%s%s\n' \
              "${COLOR_ERR_ON}" "${kl}" "${COLOR_ERR_OFF}"
        fi
        echo ${l}
     }`
     [ ${js} = timeout ] && break

     # If all jobs finished regularly: done
     [ -z "${JOBDESC}" ] && break
  done

But, even then, see this:

  ... [1=X_Y_opt_input_go_stack] [2=X_errexit] [3=Y_errexit] [4=S_freeze]
  .. waiting
  JOB 3 WRITES FIFO
  FROM FIFO I READ 3
  USR1 ING 3=8203/Y_errexit
  JOB 4 WRITES FIFO
  JOB 2 WRITES FIFO
  FROM FIFO I READ 4
  USR1 ING 4=8210/S_freeze
  JOB 1 WRITES FIFO
  FROM FIFO I READ 1
  USR1 ING 1=8189/X_Y_opt_input_go_stack
  ...mx-test.sh[8470]: can't open t.fifo: Interrupted system call
  ...mx-test.sh[8470]: can't open t.fifo: Interrupted system call
  ...mx-test.sh[8470]: can't open t.fifo: Interrupted system call
  FROM FIFO I READ timeout
  KILL ING 2=8195/X_errexit

So then i do

  (
  trap '' HUP INT TERM EXIT
  if ${mkdir} t.${JOBS}.d; then
     (
        cd t.${JOBS}.d &&
        eval t_${1} ${JOBS} ${1}
     )
  fi
  if [ -n "${JOBREAPER}" ]; then
     trap 'exit 0' USR1
     while [ 1 ]; do
        echo >&2 JOB $JOBS WRITES FIFO
        echo ${JOBS} >> t.fifo
        sleep 3
     done
  fi
  ) > t.${JOBS}.io </dev/null & # 2>&1 </dev/null &

And with that, finally, i get

  ... [1=alias] [2=charsetalias] [3=shortcut] [4=expandaddr]
  .. waiting
  JOB 2 WRITES FIFO
  JOB 3 WRITES FIFO
  FROM FIFO I READ 3
      The 2 is not there!!
  USR1 ING 3=20540/shortcut
  JOB 1 WRITES FIFO
  FROM FIFO I READ 1
  USR1 ING 1=20526/alias
  JOB 4 WRITES FIFO
  FROM FIFO I READ 4
  USR1 ING 4=20549/expandaddr
  JOB 2 WRITES FIFO
  FROM FIFO I READ 2
  USR1 ING 2=20532/charsetalias

But, after a dozen tests, and with reducing the sleep to 1 (and
reducing the debug echoes):

  ... [1=ifelse] [2=localopts] [3=local] [4=environ]
  .. waiting
  JOB 3 WRITES FIFO
  JOB 2 WRITES FIFO
  JOB 4 WRITES FIFO
  JOB 1 WRITES FIFO
  /usr/home/steffen/src/nail.git/mx-test.sh[8471]: can't open t.fifo: Interrupted system call
  /usr/home/steffen/src/nail.git/mx-test.sh[8471]: can't open t.fifo: Interrupted system call
  /usr/home/steffen/src/nail.git/mx-test.sh[8471]: can't open t.fifo: Interrupted system call
  !! Timeout: reaped job(s) 3/[local]

It does not loop!  So i have extended the sleep to 3 again, and
placed the echo in a subshell.  Other than that i offer a "testnj"
make target.  I am entirely out of ideas.

A nice Sunday i wish.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
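
One thing that may be worth trying for the failing open and the lost
data: the parent loop reopens t.fifo for every read, so each iteration
is a fresh blocking open that a signal may interrupt, and whatever is
still buffered in the FIFO when the last descriptor is closed is
discarded.  Below is a minimal sketch, not the code of mx-test.sh, in
which the parent opens the FIFO once and keeps it on a descriptor.  It
assumes that opening a FIFO read-write works on the system at hand
(POSIX leaves that unspecified) and that mktemp -d is available; the
scratch directory and the four stand-in jobs are hypothetical.  It has
not been run on the OpenBSD and FreeBSD systems mentioned above.

```shell
#!/bin/sh
# Sketch only: keep the FIFO open on fd 3 for the whole run, so there is
# no per-iteration open(2) that could fail or drop buffered data.
# Assumption: a read-write open of a FIFO is supported here.

dir=$(mktemp -d) || exit 1      # hypothetical scratch directory
fifo=${dir}/t.fifo
mkfifo "${fifo}" || exit 1

exec 3<>"${fifo}"               # opened once, does not wait for a peer

njobs=4
j=1
while [ ${j} -le ${njobs} ]; do
   # Stand-in for a test job that reports its job number when done
   ( echo ${j} >> "${fifo}" ) &
   j=$((j + 1))
done

seen=
n=0
while [ ${n} -lt ${njobs} ]; do
   read -r js <&3 || continue   # try again if the read itself failed
   seen="${seen} ${js}"
   n=$((n + 1))
done
wait

exec 3>&-
rm -rf "${dir}"

# Arrival order varies from run to run, so sort before reporting
seen=$(printf '%s\n' ${seen} | sort -n | tr '\n' ' ')
seen=${seen% }
echo "jobs seen: ${seen}"
```

Because the parent itself counts as a writer, read on fd 3 never sees
end-of-file here, so a real loop would still need the "timeout" message
to get out.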