> Ideally I think we'd like to leave the node up to serve reads, if a disk is erroring out on writes but still read-able. In my experience this is very common when a disk first begins to fail, as well as in the "disk is full" case where there is nothing actually wrong with the disk per se.
This depends on the hardware and drivers in use, as well as on which part is failing. On some failures the disk just disappears completely (controller failures, SAN links, etc.). And the easiest way to bring a node to the operations team's attention is to shut it down - they will have to do something with it anyway. Furthermore, shutting down a single node should not hurt the cluster's performance much in production - everyone plans capacity to survive a single node failure.
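To make the two policies being debated concrete, here is a minimal Go sketch of a write-error handler that either degrades the store to read-only (keep serving reads) or terminates the node (force operator attention). All names here (`Store`, `SetReadOnly`, `OnDiskWriteError`, `degradeOnWriteError`) are hypothetical illustrations, not the actual code path:

```go
// Sketch of the two write-error policies discussed above; names are hypothetical.
package main

import (
	"log"
	"os"
)

type Store struct {
	readOnly bool
}

// SetReadOnly is a hypothetical hook that stops accepting writes while
// continuing to serve reads from the still-readable disk.
func (s *Store) SetReadOnly() { s.readOnly = true }

// degradeOnWriteError selects the policy; in practice this would be a
// cluster setting or command-line flag.
var degradeOnWriteError = false

// OnDiskWriteError is called when a disk write returns an error.
func OnDiskWriteError(s *Store, err error) {
	if degradeOnWriteError {
		// Keep the node up for reads; writes are rejected until the disk
		// is repaired or the node is drained.
		log.Printf("disk write error: %v; entering read-only mode", err)
		s.SetReadOnly()
		return
	}
	// Shut the node down so the operations team notices, and a
	// capacity-planned cluster absorbs the single-node loss.
	log.Printf("disk write error: %v; terminating node", err)
	os.Exit(1)
}

func main() {
	// Usage example with a simulated write failure.
	s := &Store{}
	OnDiskWriteError(s, os.ErrPermission)
}
```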