On Thu, Feb 18, 2010 at 3:12 PM, Adrian Tritschler <a...@ajft.org> wrote:
> Flips in code are more likely to cause crashes, but still not
> guaranteed.

You'd need to look at the fraction of total memory that is data vs. code, then at the fraction of the code that is going to cause hurt if flipped. This stuff can have numbers attached.

Here's an example from my world: 1 MB of code, 32 MB of kernel, and 2 GB minus that as data. This is a lower-end ratio, as the nodes don't have much memory.

If data is flipped, you're not going to know about the errors unless you are looking for numerical instability. People do. And if you're in the middle of a checkpoint, you won't know then.

But this is all stuff that could be estimated ... and in some cases I think it has been.

ron
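
To put rough numbers on the example above, here is a minimal sketch. It assumes a single bit flip lands uniformly at random anywhere in a 2 GB (2048 MB) node, which is an assumption made for illustration and not something stated in the thread. The names `flip_fractions` and `harmful_flip_probability` are hypothetical, and the fraction of code that actually hurts when flipped is left as a parameter because no figure for it is given.

```python
# Sketch only: assumes a bit flip is equally likely to hit any byte of memory.
TOTAL_MB = 2048  # 2 GB node, taken as 2048 MB

# Region sizes from the example: 1 MB of code, 32 MB of kernel, the rest data.
regions_mb = {"code": 1, "kernel": 32}
regions_mb["data"] = TOTAL_MB - sum(regions_mb.values())


def flip_fractions(regions):
    """Return the share of memory (and so of uniform flips) per region."""
    total = sum(regions.values())
    return {name: size / total for name, size in regions.items()}


def harmful_flip_probability(fractions, harmful_code_fraction):
    """Probability that one flip lands in code that hurts when flipped.

    harmful_code_fraction is a hypothetical input: the share of code bytes
    whose corruption causes trouble. It has to be measured or estimated.
    """
    return fractions["code"] * harmful_code_fraction


fractions = flip_fractions(regions_mb)
for name, frac in fractions.items():
    print(f"{name:>6}: {frac:.4%} of flips")

# 0.5 is a placeholder chosen for illustration, not a measurement.
print(f"harmful code hit: {harmful_flip_probability(fractions, 0.5):.4%}")
```

Under the uniform assumption nearly every flip lands in data, which is why silent numerical corruption, and not a crash, is the case to worry about on nodes like these.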