> besides panicking, of course.  Ideally, I think...

Corrected error: Usually, log and ignore.  Maybe watch for elevated
levels of corrected errors and disable either the containing page or
the containing memory stick, depending on how much the hardware lets
the kernel determine and maybe policy sysctls.  Maybe even allow
paranoid sysadmins to configure "elevated levels of" to mean "any".

Uncorrectable error: Log.  Disable the containing page and/or stick, as
mentioned above.  If it's for the contents of a dirty page, about all
we can do is deliver a memory-error signal.  If it's for a clean page
(including (most) instruction-stream fetches), re-fetch the virtual
page into a new physical page and carry on.

> This is going to involve a lot of help from UVM.

Probably.  Maybe the pmap, too, for things such as figuring out what
regions of RAM would have to be disabled to stop using the affected
memory stick, or the like.

> If uvm_page_error can't "correct" the error, it would panic.

I'd recommend doing that only for kernel accesses; for userland, I'd
much prefer to blow up at most the process incurring the fault.

> Preemptively, we could have a thread force dirty cache lines to
> memory if they've been in L2 "too long" (thereby reducing the problem
> to an ECC error on a clean cache line which means you just toss the
> cache-line contents.)

Depends.  Are we talking ECC on L2 cache, or on main memory?  I'd say
the results should be different.

> We can also have a thread that reads all of memory (slowly) thereby
> causing any single bit errors to be corrected before they become
> double-bit errors.

Well, to be detected.  Whether the correct action upon detecting them
is to silently correct them is a policy matter I'd prefer to avoid
wiring into the kernel.

> I'm not familiar enough with UVM internals to actually know what to
> do but I hope someone else reading this is.

Me neither.  I have just about zero idea how implementable any of the
above is; I've been speaking in ideal generalities.
(My idea of ideal generalities, that is, of course.)

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML               mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
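
For concreteness, the policy described above might be sketched roughly
as the decision function below.  This is only an illustration under
loud assumptions: every type, field and function name here is
hypothetical, none of it is a real UVM or pmap interface, and it says
nothing about how the kernel would actually find out whether a page is
dirty or who was accessing it.  The threshold stands in for whatever a
policy sysctl might provide.

```c
#include <stdbool.h>

/* Hypothetical names throughout; not an existing kernel API. */

enum mem_err_kind { MEM_ERR_CORRECTED, MEM_ERR_UNCORRECTABLE };

/* Every action implies logging the error first. */
enum mem_err_action {
	ACT_LOG_ONLY,		/* log and ignore */
	ACT_DISABLE,		/* stop using the page and/or stick */
	ACT_REFETCH,		/* disable, then re-fetch the clean page
				   into a new physical page */
	ACT_SIGNAL_PROCESS,	/* disable, then deliver a memory-error
				   signal to the faulting process */
	ACT_PANIC		/* dirty data lost on a kernel access */
};

struct mem_err {
	enum mem_err_kind kind;
	bool page_dirty;	/* contents exist nowhere else */
	bool kernel_access;	/* fault incurred by the kernel itself */
	unsigned corrected_count; /* corrected errors seen so far on this
				     page or stick, including this one */
};

struct mem_err_policy {
	/* What "elevated levels of" means; 1 makes it mean "any". */
	unsigned corrected_threshold;
};

static enum mem_err_action
mem_err_decide(const struct mem_err *e, const struct mem_err_policy *p)
{
	if (e->kind == MEM_ERR_CORRECTED) {
		/* Usually harmless, unless they are piling up. */
		if (e->corrected_count >= p->corrected_threshold)
			return ACT_DISABLE;
		return ACT_LOG_ONLY;
	}

	/* Uncorrectable: a clean page can simply be fetched again. */
	if (!e->page_dirty)
		return ACT_REFETCH;

	/* Dirty data is gone; blow up as little as possible. */
	return e->kernel_access ? ACT_PANIC : ACT_SIGNAL_PROCESS;
}
```

With corrected_threshold set to 1, the first corrected error on a page
already yields ACT_DISABLE, which is the paranoid-sysadmin setting.
Whether the thing disabled is a page or a whole stick is deliberately
left out, since that depends on what the hardware lets the kernel
determine.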