Hi Bryan,
On 10/28/15 15:10, Bryan Drewery wrote:
On 10/23/2015 9:34 AM, Russell L. Carter wrote:
Greetings,
Recently my nightly cron poudriere builds have been occasionally
hanging. For instance, here's last night's, with apparently no
progress for over 10 hours:
root@terpsichore> poudriere status
SET PORTS   JAIL            BUILD                STATUS         QUEUE BUILT FAIL SKIP IGNORE REMAIN TIME     LOGS
-   default 10-stable-amd64 2015-10-22_22h30m08s parallel_build   488    34    0    0      0    454 10:45:56 /ssd1/poudriere/data/logs/bulk/10-stable-amd64-default/2015-10-22_22h30m08s
root@terpsichore>
Also check 'poudriere status -b' to see per-builder status. Something
may actually still be running. Poudriere will time out builds after a
long time. I forget the default, but it may be up to 24 hours.
Good to know. I will try that out, probably tomorrow morning. The
last two nights' poudriere bulk builds have hung, but as I mentioned
before, when run from the console the exact same script succeeds
and poudriere shuts down cleanly. 'poudriere jail -k' seems to work
mostly OK for recovering.
This just started last week after nearly a year of flawless cron'd jobs.
(Poudriere was flawless; ports are another matter.)
htop now shows no significant activity for the three builders I specified:
root@terpsichore> ps xa | grep poud
72482  -  Is   0:00.01 /bin/sh /root/poudriere/run-poudriere-bulk
73202  -  S    0:04.24 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
73347  -  S    1:55.38 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
73352  -  I    0:00.08 sh -e /usr/local/share/poudriere/bulk.sh -f /root/poudriere/ports -j 10-stable-amd64
 6119  1  S+   0:00.00 grep poud
root@terpsichore>
If I reboot, so that the tmp ZFS filesystems are unmounted, and
manually rerun the exact same script as the previous, hung cron'd
instance, poudriere has (so far) run to completion.
Please record the output of 'procstat -kka' before rebooting, in case
this is some kind of deadlock.
Will do. Many thanks for the suggestions. It sure smells like luser
fail, but I don't see it yet...
Best,
Russell
I'm not sure how to debug this, but in the interim I'm very curious
how I can stop the hung bulk run and either restart it, or clean up
the various mounted ZFS filesystems and manually restart from the
beginning without rebooting. Having studied the man page, the Right Way
to do this is not at all clear to me, so any pointers would be appreciated.
Send SIGTERM ('kill -TERM PID') to the main poudriere process; it will
clean up its children. Beyond that, you can run
'poudriere jail -j NAME -p TREE -z SET -k' to clean up any mounts left
over from a previous build.
Adding a 'poudriere kill' command is on the todo list.
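Until such a command exists, the two cleanup steps above can be wrapped in a small POSIX sh function. This is a minimal sketch under stated assumptions, not part of poudriere: the names stop_hung_bulk and run and the DRY_RUN switch are hypothetical. When no PID is passed, it treats the oldest bulk.sh process (found with FreeBSD's 'pgrep -o -f') as the "main poudriere process". That is also an assumption, so passing the PID explicitly is safer; in the ps listing above it is presumably 73202. Capture 'procstat -kka' first, since killing the run discards the state that output would show.

```shell
#!/bin/sh
# Sketch only: stop_hung_bulk, run and DRY_RUN are hypothetical helpers,
# not part of poudriere.

# Run a command, or only print it when DRY_RUN=1.
run() {
	if [ "${DRY_RUN:-0}" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Usage: stop_hung_bulk JAIL TREE [SET] [PID]
stop_hung_bulk() {
	jail=$1
	tree=$2
	pset=${3:-}
	# Assumption: with no PID given, take the oldest bulk.sh process as the
	# main poudriere process (FreeBSD pgrep: -o = oldest, -f = full args).
	pid=${4:-$(pgrep -o -f /usr/local/share/poudriere/bulk.sh)}
	if [ -z "$pid" ]; then
		echo "stop_hung_bulk: no bulk.sh process found" >&2
		return 1
	fi

	# SIGTERM the main process; poudriere cleans up its own children.
	run kill -TERM "$pid"

	# Wait (with no timeout) for it to exit before touching the mounts;
	# skipped in a dry run.
	while [ "${DRY_RUN:-0}" != 1 ] && kill -0 "$pid" 2>/dev/null; do
		sleep 5
	done

	# Clean up any mounts left over from the interrupted build.
	if [ -n "$pset" ]; then
		run poudriere jail -j "$jail" -p "$tree" -z "$pset" -k
	else
		run poudriere jail -j "$jail" -p "$tree" -k
	fi
}
```

For the run in the status output above, the SET column is '-' and the log directory is 10-stable-amd64-default, so the call would presumably be 'stop_hung_bulk 10-stable-amd64 default "" 73202'. Running it once with DRY_RUN=1 prints the kill and 'poudriere jail ... -k' commands without executing them. The wait loop has no timeout, so interrupt it if the process never exits.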
I'm leaving the system untouched for now so that I can try out any
suggestions for cleanup and restart.
_______________________________________________
freebsd-ports@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-ports
To unsubscribe, send any mail to "freebsd-ports-unsubscr...@freebsd.org"