On 2016-10-18 05:25, li...@wrant.com wrote:
Mon, 17 Oct 2016 18:00:39 +0200 Karel Gardas <gard...@gmail.com>
1) use machine with proper ECC support
Hello Karel,
Please explain this "proper ECC support" for every laptop user out
there?
[..]
Mon, 17 Oct 2016 21:48:47 +0800 Tinker <ti...@openmailbox.org>
Sometimes a machine goes unresponsive. In this case, a non-ECC RAM
machine.
Hello Tinker,
This is one very intriguing problem with a very trivial solution:
reboot.
The idea to work around missing ECC support with software is as practical
[..]
Hi Anton,
You misread me.
What I asked about was not how to trigger some event logic on bit-flip
errors (on a non-ECC machine those will generally show up only as data
corruption or undefined behavior) or on other hardware or kernel
errors, but:
How to trigger some event logic when the system has become a vegetable
because userland has overloaded it?
In my limited experience, a system overload caused by user processes
can make all processes die or freeze and leave the system otherwise
unresponsive, except that terminal input is still echoed.
For that case, I speculated that such event logic could be implemented
as in-kernel code, e.g. a kernel thread, provided kernel threads have
some kind of stronger execution guarantee than user processes.
For example, if a userland watchdog/monitoring process had not sent an
"I'm OK" signal to that thread for 60 seconds, the thread would dump
the system's state to the console and reboot the machine.
This way I would be able to distinguish userland-caused system crashes
from hardware/kernel crashes: the former would always produce that
output and reboot, whereas the latter don't (they instead reboot, drop
into the kernel debugger console, or just freeze the system
altogether).
Do you see where I was heading now?
Tinker