Hi all,

Symptom: mysterious "signal: killed" occurrences with processes spawned
from Go via exec.Cmd.Start()/Wait()

Actors:

'Server': Intel i5, running Funtoo 1.4 - 4GB RAM, 4GB swap
'Laptop': Intel Core i7 9th Gen, running Devuan Chimaera - 15GB RAM, 15GB
swap

A fairly small program I developed (~900 lines of code in two .go files)
has been running just fine for years on 'Server'. It spawns other programs
using exec.Cmd.Start() and then monitors them with exec.Cmd.Wait(). A few
days ago it started acting very strangely, after upgrading from go 1.18 to
go 1.22 (amd64) and/or some minor non-Go-related updates (regretfully I
don't recall exactly what, but it was not a distro or kernel upgrade, nor a
major version of glibc or anything like that).

Tasks that used to launch smoothly from my Go program are now erratically
(then almost always) aborted, with the status "signal: killed". I can watch
the system with 'top' or 'htop' and don't see any obvious spikes in memory
or CPU usage, and I never saw any with this setup before either.

I spent a few evenings verifying that 'Server' wasn't running out of memory
(oom_reaper), which many web searches suggest is the most common cause of
the above, and checking other system-related things, with no luck (no OOM
messages in /var/log/kern.log, dmesg, etc.).

I even tried launching my go program with some of the debug options
mentioned here, but didn't see anything unusual:

https://github.com/golang/go/issues/31517

Everything else on 'Server' ran fine, so I began to suspect the
unthinkable: could the new Go toolchain 1.22.0 I installed have some subtle
issue with the server's runtime environment, or could there be a compiler
bug?

Out of habit, after any go update I usually rebuild this project using the
new toolchain, which I believe I had done just before this issue began; so
first I reverted to go1.18 on 'Server' (there as a fallback), and rebuilt
my tool; no help. I tried clearing out all of ~/go/pkg/mod/* and refetched
all dependencies, rebuilt again. Still no help.

Then I tried building the project on another machine, 'Laptop', which had
go1.21.0; on this machine it ran fine. So I copied *that* build of the tool
back to 'Server', and it failed with "signal: killed" as well.

I tried the reverse ('Server' build of project, using go1.22.0 copied to
'Laptop' using go1.18.x) but that failed, as there was a GLIBC
incompatibility preventing it running. (I think Funtoo had a slightly newer
GLIBC version than Devuan).

Anyhow, in case it *was* a go toolchain or C library issue beneath Go
itself, I downloaded the latest go 1.22.0 source to 'Server' and rebuilt it
there from scratch (all.bash); using that to rebuild my tool I was happy to
see it now worked normally again... for about 5 rounds of spawning
programs, then the issue returned!

The 'Server' isn't very powerful, but it's definitely not out of memory and
has run my tool reliably for a long time, until the last week:

$ free
               total        used        free      shared  buff/cache   available
Mem:         3928680     1334988      592232       29368     2001460     2277376
Swap:        4194300      136192     4058108

This issue even occurs spawning a simple 'do-nothing' shell script, which
just loops sleeping for 5 seconds a few times before exiting. These
processes/scripts run perfectly from the shell command line, without using
my Go program to launch them.

What the heck is going on? How do I diagnose this?

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAN4yCu8Knootg-4%3DNS_EPzLKssg18Lv4f6QZ%2B16-gMBB-kaHSw%40mail.gmail.com.
