Ken Moffat wrote:
> When I woke up in the
> morning, I was surprised to find that my OpenJDK script was still running
> rm -rf for the source directory, and had been doing so for more than 5
> hours (both wall-clock time and CPU time), and was now at 99%-100% of one
> CPU, according to top.

...Odd.

I upgraded to 3.13.5 a day after it got released, and just yesterday saw
something vaguely but not exactly similar. Firefox (same binary as before the
kernel upgrade) crashed when I was mid-mouse-movement, which seemed odd, so I
poked at kernel logs. There were a couple of "BUG: Bad page map in process
firefox" and "BUG: bad page state in process firefox", when trying to
madvise() (in zap_page_range -> unmap_single_vma) and page_fault,
respectively, about 15 minutes before the crash. Then again at crash time,
the same pair of BUG messages, both in int_signal; the first was down in
do_group_exit -> unmap_vmas -> unmap_single_vma, and the second was down in
unmap_single_vma -> release_pages -> free_pages_prepare.
Then it logged "BUG: bad rss-counter state" twice, followed by "INFO:
rcu_preempt detected stalls on CPUs/tasks: {} (detected by 4, t=18002 jiffies,
g=81303, c=81302, q=7261)" and "INFO: Stall ended before state dump start".
And about when it logged the rcu_preempt message, CPU 4 went busy-looping in
kernel space (according to gkrellm, which showed 100% in orange instead of the
userspace cyan or userspace-niced green) in a kworker thread (according to
top). Had to reboot to get it back (trying to exit X also hung; most likely
something got scheduled onto that worker during handoff to the console driver
or something like that; had to alt-sysrq-u / b to get it to actually reboot).
So I guess this is a long way of saying -- are you sure the rm userspace code
is what was hung, and not something in the kernel? Might be a prevalence of
cosmic rays I suppose, or it might be a memory corruption bug somewhere
causing issues with RCU.
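
If it happens again, one rough way to tell the two cases apart is to look at
how the stuck process's CPU time splits between user and kernel mode. The
sketch below is only an illustration, assuming a Linux /proc filesystem; the
function names parse_cpu_ticks and describe are made up for this example. It
reads the state, utime and stime fields (fields 3, 14 and 15) from
/proc/<pid>/stat. A kernel time that keeps climbing while user time stays flat
would point at the kernel rather than at rm itself.

```python
import os


def parse_cpu_ticks(stat_line):
    """Return (state, utime, stime) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' instead of on whitespace alone.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return rest[0], int(rest[11]), int(rest[12])


def describe(pid):
    """Summarize process state and CPU time in seconds (hypothetical helper)."""
    with open(f"/proc/{pid}/stat") as f:
        state, utime, stime = parse_cpu_ticks(f.read())
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    return f"state={state} user={utime / hz:.1f}s kernel={stime / hz:.1f}s"
```

Calling describe() with the pid of the stuck rm a few times, some seconds
apart, shows which of the two counters is the one actually growing.
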
(OTOH this system isn't really anywhere near stock LFS, either. Not sure how
different it is from yours, but it's multilib with a pretty old gcc/glibc.)
-- 
http://linuxfromscratch.org/mailman/listinfo/lfs-dev
FAQ: http://www.linuxfromscratch.org/faq/
Unsubscribe: See the above information page
