Hi all,

Symptom: mysterious "signal: killed" occurrences with processes spawned from Go via exec.Cmd.Start()/Wait().

Actors:

'Server': Intel i5, running Funtoo 1.4 - 4GB RAM, 4GB swap
'Laptop': Intel Core i7 9th Gen, running Devuan Chimaera - 15GB RAM, 15GB swap

I developed a fairly small program (~900 lines of code in two .go files) that has been running just fine on 'Server' for years. It spawns other programs using exec.Cmd.Start() and then monitors them using exec.Cmd.Wait(). A few days ago it started acting very strangely, after one or both of the following: upgrading from go 1.18 to go 1.22 (amd64), and non-Go-related minor updates (regrettably I don't recall exactly what, but it was not a distro or kernel upgrade, nor a major version of glibc or anything like that).

Tasks that used to launch smoothly from my Go program were now aborted, erratically at first and then almost always, with the status "signal: killed".

I can watch the system with 'top' or 'htop' and don't see any obvious spikes in memory or CPU usage, and I never have with this setup before either.

I spent a few evenings verifying that 'Server' wasn't out of memory (oom_reaper), which many web searches suggest is the most common cause of the above, and checking other system-related things, with no luck (no OOM logs in /var/log/kern.log, dmesg, etc.).

I even tried launching my Go program with some of the debug options mentioned here, but didn't see anything unusual:

https://github.com/golang/go/issues/31517

Everything else on 'Server' ran fine, so I began to suspect the unthinkable: could the new Go 1.22.0 toolchain I installed have some subtle issue with the server's runtime environment, or could there be a compiler bug?

Out of habit, after any Go update I usually rebuild this project using the new toolchain, which I believe I had done just before this issue began. So first I reverted to go1.18 on 'Server' (still there as a fallback) and rebuilt my tool; no help. I cleared out all of ~/go/pkg/mod/*, refetched all dependencies, and rebuilt again. Still no help.

Then I tried building the project on another machine, 'Laptop', which had go1.21.0; on this machine it ran fine. So I copied *that* build of the tool back to 'Server', and it failed with "signal: killed" as well.

I tried the reverse ('Server' build of the project, using go1.22.0, copied to 'Laptop' using go1.18.x), but that failed because a GLIBC incompatibility prevented it from running. (I think Funtoo had a slightly newer GLIBC version than Devuan.)

Anyhow, in case it *was* a Go toolchain issue, or a C library issue beneath Go itself, I downloaded the latest go 1.22.0 source to 'Server' and rebuilt it there from scratch (all.bash). Using that to rebuild my tool, I was happy to see it work normally again... for about 5 rounds of spawning programs, then the issue returned!

The 'Server' isn't very powerful, but it's definitely not out of memory, and it ran my tool reliably for a long time until the last week:

    $ free
                  total        used        free      shared  buff/cache   available
    Mem:        3928680     1334988      592232       29368     2001460     2277376
    Swap:       4194300      136192     4058108

This issue even occurs when spawning a simple 'do-nothing' shell script, which just loops, sleeping for 5 seconds a few times before exiting.

These processes/scripts run perfectly from the shell command line, without using my Go program to launch them.

What the heck is going on? How do I diagnose this?