On Sun, Apr 24, 2011 at 12:01 AM, Mike Meyer <m...@mired.org> wrote:
> On Sat, 23 Apr 2011 23:42:23 -0400
> Ken Wesson <kwess...@gmail.com> wrote:
>
>> On Sat, Apr 23, 2011 at 11:35 PM, Mike Meyer <m...@mired.org> wrote:
>> > On Sat, 23 Apr 2011 23:19:53 -0400
>> > Ken Wesson <kwess...@gmail.com> wrote:
>> >
>> >> On Sat, Apr 23, 2011 at 8:13 PM, Mike Meyer <m...@mired.org> wrote:
>> >> > On Sat, 23 Apr 2011 19:41:28 -0400
>> >> > Ken Wesson <kwess...@gmail.com> wrote:
>> >> > or you live in a universe where cosmic rays can flip bits and other
>> >> > sources of hardware hiccups exist.
>> >> Software crashes caused by non-software-bug-triggered memory
>> >> corruption seem to me to be exceedingly rare, and they could as easily
>> >> strike critical parts of the operating system as a multithreaded
>> >> server program (and a large batch of independent C jobs will occupy
>> >> more memory and have a correspondingly larger cross section as a
>> >> target for such things).
>> >> The best recourse if the server gets hit by something like that is
>> >> going to be to reboot it.
>> >
>> > While it might be exceedingly rare on a per-cpu-second basis, if your
>> > application runs 7x24 on enough cpus, you can expect to see them at
>> > regular intervals. In which case the best recourse - if you want a
>> > stable, robust application - is to restart the smallest set of
>> > processes that might have been affected by the problem.
>>
>> In other words, all of them, since the operating system might have
>> been affected by such a problem and if it was, everything else is
>> probably affected too.
>
> Let me guess - you're one of these people who

Ah, I get it. You're arguing because you have some kind of *personal*
issue, rather than for any logical reason.

> Sure, a hardware glitch that affects the OS means you should reboot
> the system.

And assuming you can even detect that such a glitch has occurred at
all (what if one hits the code doing the detecting, or the memory that
it uses -- or the operating system, in a way that affects that code?)
can you detect whether or not it hit the operating system?

> Of course, if it affects some user process, it may have
> affected the OS without leaving evidence of doing so. Then again, it
> may not have. While you could reboot everything "just in case", you
> could also have a hardware glitch affect the OS without leaving
> evidence in any process, so you might as well reboot even though
> nothing is wrong "just in case."

Obviously there's little point in rebooting absent *some* evidence
that something is wrong. Of course, some process segfaulting doesn't
mean much if it's a typical C program. On the other hand, if you have
a rock-solid JVM and kernel and various JVM bytecodes running, and the
JVM faults, the likelihood of a problem like this is higher than if a
random other program faulted -- indeed, either it's a JVM bug, an OS
bug, or a glitch of the type being discussed, since arbitrary bytecode
on a bug-free JVM shouldn't cause the JVM to fault. (Native methods
complicate things somewhat though.)

> Nah, hardware glitches are either localized, in which case restarting
> just the address spaces that failed is sufficient (and has proven so
> in practice for years), or they're systemic, in which case you'll have
> failures throughout the system. It's pretty easy to tell the
> difference between the two and deal with them appropriately.

Easy for who? The system administrator? I thought we were considering
automated means of recovering faulting systems here.

--
You received this message because you are subscribed to the Google Groups "Clojure" group.