On Sun, Apr 24, 2011 at 12:01 AM, Mike Meyer <m...@mired.org> wrote:
> On Sat, 23 Apr 2011 23:42:23 -0400
> Ken Wesson <kwess...@gmail.com> wrote:
>
>> On Sat, Apr 23, 2011 at 11:35 PM, Mike Meyer <m...@mired.org> wrote:
>> > On Sat, 23 Apr 2011 23:19:53 -0400
>> > Ken Wesson <kwess...@gmail.com> wrote:
>> >
>> >> On Sat, Apr 23, 2011 at 8:13 PM, Mike Meyer <m...@mired.org> wrote:
>> >> > On Sat, 23 Apr 2011 19:41:28 -0400
>> >> > Ken Wesson <kwess...@gmail.com> wrote:
>> >> > or you live in a universe where cosmic rays can flip bits and other
>> >> > sources of hardware hiccups exist.
>> >> Software crashes caused by non-software-bug-triggered memory
>> >> corruption seem to me to be exceedingly rare, and they could as easily
>> >> strike critical parts of the operating system as a multithreaded
>> >> server program (and a large batch of independent C jobs will occupy
>> >> more memory and have a correspondingly larger cross section as a
>> >> target for such things).
>> >> The best recourse if the server gets hit by something like that is
>> >> going to be to reboot it.
>> >
>> > While it might be exceedingly rare on a per-cpu-second basis, if your
>> > application runs 7x24 on enough cpus, you can expect to see them at
>> > regular intervals. In which case the best recourse - if you want a
>> > stable, robust application - is to restart the smallest set of
>> > processes that might have been affected by the problem.
>>
>> In other words, all of them, since the operating system might have
>> been affected by such a problem and if it was, everything else is
>> probably affected too.
>
> Let me guess - you're one of these people who

Ah, I get it. You're arguing because you have some kind of *personal*
issue, rather than for any logical reason.

> Sure, a hardware glitch that affects the OS means you should reboot
> the system.

And assuming you can even detect that such a glitch has occurred at
all (what if one hits the code doing the detecting, or the memory that
it uses -- or the operating system, in a way that affects that code?),
can you detect whether or not it hit the operating system?

> Of course, if it affects some user process, it may have
> affected the OS without leaving evidence of doing so. Then again, it
> may not have. While you could reboot everything "just in case", you
> could also have a hardware glitch affect the OS without leaving
> evidence in any process, so you might as well reboot even though
> nothing is wrong "just in case."

Obviously there's little point in rebooting absent *some* evidence
that something is wrong. Of course, some process segfaulting doesn't
mean much if it's a typical C program. On the other hand, if you have
a rock-solid JVM and kernel and various JVM bytecodes running, and the
JVM faults, the likelihood of a problem like this is higher than if a
random other program faulted -- indeed, either it's a JVM bug, an OS
bug, or a glitch of the type being discussed, since arbitrary bytecode
on a bug-free JVM shouldn't cause the JVM to fault. (Native methods
complicate things somewhat though.)

> Nah, hardware glitches are either localized, in which case restarting
> just the address spaces that failed is sufficient (and has proven so
> in practice for years), or they're systemic, in which case you'll have
> failures throughout the system. It's pretty easy to tell the
> difference between the two and deal with them appropriately.

Easy for whom? The system administrator? I thought we were considering
automated means of recovering faulting systems here.
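
For concreteness, here is a minimal sketch of what an automated version
of the "localized vs. systemic" rule quoted above might look like. It is
only an illustration: the class name, the 60-second window, and the
three-process threshold are all hypothetical, and it simply assumes that
the code doing the classifying has not itself been hit by the glitch,
which is exactly the assumption questioned earlier in this message.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FaultClassifier:
    """Choose between restarting one process and rebooting the host.

    Hypothetical policy: faults in several distinct processes within a
    short sliding window are treated as systemic; anything else is
    treated as localized.
    """

    window_seconds: float = 60.0   # assumed window, not from the thread
    systemic_threshold: int = 3    # assumed number of distinct processes
    _faults: deque = field(default_factory=deque, init=False)

    def record_fault(self, process: str, now: float) -> str:
        """Record a fault at time `now` and return the suggested action."""
        self._faults.append((now, process))
        # Discard faults that have fallen out of the sliding window.
        while self._faults and now - self._faults[0][0] > self.window_seconds:
            self._faults.popleft()
        distinct = {name for _, name in self._faults}
        if len(distinct) >= self.systemic_threshold:
            return "reboot-host"
        return "restart-process"


classifier = FaultClassifier()
print(classifier.record_fault("worker-1", now=0.0))
print(classifier.record_fault("worker-2", now=5.0))
print(classifier.record_fault("worker-3", now=9.0))
```

Note that a rule like this can only count faults that were actually
observed and reported, so a glitch that silently corrupts the kernel or
the supervisor itself is outside what it can classify.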
