On Fri, Feb 4, 2011 at 4:27 PM, Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
>> Aborting an RPC handler could leave the system in an inconsistent
>> state unless we are careful. For example, aborting freeze requires
>> thawing those file systems that have been successfully frozen so far.
>> For other handlers it might leave temporary files around, or if they
>> are not carefully written may partially update files in-place and
>> leave them corrupted.
>>
>> So instead of a blanket timeout, I think handlers that perform
>> operations that may block for unknown periods of time could
>> specifically use timeouts. That gives the handler control to perform
>> cleanup.
>
> Good point. Although, I'm not sure I want to push timeout handling to the
> actual RPCs though....something as simple as open()/read() can block
> indefinitely in certain situations, and it'll be difficult to account for
> every situation, and the resulting code will be tedious as well. I'd really
> like to make the actual RPC as simple as possible, since it's something
> that may be extended heavily over time.
>
> So what if we simply allow an RPC to register a timeout handler at the
> beginning of the RPC call? So when the thread doing the RPC exits we:
>
> - check to see if thread exited as a result of timeout
> - check to see if a timeout handler was registered, if so, call it, reset
>   the handler, then return a timeout indication
> - if it didn't time out, return the response
>
> The only burden this puts on the RPC author is that information they need to
> recover state would need to be accessible outside the thread, which is
> easily done by encapsulating state in static/global structs. So the timeout
> handler for fsfreeze, as it is currently written, would be something like:
>
> va_fsfreeze_timeout_handler():
>     foreach mnt in fsfreeze.mount_list:
>         unfreeze(mnt)
>     fsfreeze.mount_list = NULL
>
> We'll need to be careful about lists/objects being in weird states due to
> the forced exit, but I think it's doable.
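
For reference, the flow proposed above could be sketched roughly as below.
This is only an illustration under assumptions, not the actual virtagent
code: register_timeout_handler(), rpc_run(), va_ping(), the stubbed
unfreeze() and the thawed_count counter are hypothetical names, the mount
list is a fixed array instead of a real list, and pthread_cancel() stands in
for whatever "forced exit" mechanism ends up being used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

#define MAX_MOUNTS 8

/* State lives outside the worker thread so the timeout handler can reach it. */
static struct {
    const char *mounts[MAX_MOUNTS]; /* mounts frozen so far */
    int count;
} fsfreeze_state;

typedef void (*rpc_fn_t)(void);
static rpc_fn_t current_rpc;     /* RPC being run by the worker thread */
static rpc_fn_t timeout_handler; /* registered by the running RPC, if any */

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool rpc_done;
static int thawed_count; /* hypothetical: only makes the cleanup observable */

static void register_timeout_handler(rpc_fn_t h) { timeout_handler = h; }

/* Stand-in for the real thaw operation. */
static void unfreeze(const char *mnt) { (void)mnt; thawed_count++; }

static void va_fsfreeze_timeout_handler(void)
{
    for (int i = 0; i < fsfreeze_state.count; i++)
        unfreeze(fsfreeze_state.mounts[i]);
    fsfreeze_state.count = 0;
}

/* Example RPC: "freezes" two mounts, then hangs as if on a third. */
void va_fsfreeze(void)
{
    register_timeout_handler(va_fsfreeze_timeout_handler);
    fsfreeze_state.mounts[fsfreeze_state.count++] = "/";
    fsfreeze_state.mounts[fsfreeze_state.count++] = "/home";
    sleep(30); /* simulated hang; sleep() is a cancellation point */
}

/* Example RPC that returns immediately and registers no handler. */
void va_ping(void) { }

static void *rpc_thread(void *arg)
{
    (void)arg;
    current_rpc();
    pthread_mutex_lock(&done_lock);
    rpc_done = true; /* tell the dispatcher we finished normally */
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Returns 0 if the RPC completed, -1 on timeout or thread creation failure. */
int rpc_run(rpc_fn_t fn, int timeout_sec)
{
    pthread_t tid;
    struct timespec deadline;
    void *ret;
    int rc = 0;

    assert(fn != NULL);
    current_rpc = fn;
    timeout_handler = NULL;
    rpc_done = false;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;
    if (pthread_create(&tid, NULL, rpc_thread, NULL) != 0)
        return -1;

    /* Wait until the RPC signals completion or the deadline passes. */
    pthread_mutex_lock(&done_lock);
    while (!rpc_done && rc == 0)
        rc = pthread_cond_timedwait(&done_cond, &done_lock, &deadline);
    bool timed_out = !rpc_done;
    pthread_mutex_unlock(&done_lock);

    if (timed_out)
        pthread_cancel(tid); /* forced exit of the RPC thread */
    pthread_join(tid, &ret);

    if (ret == PTHREAD_CANCELED) { /* thread exited as a result of timeout */
        if (timeout_handler) {
            timeout_handler(); /* let the RPC clean up its shared state */
            timeout_handler = NULL;
        }
        return -1;
    }
    return 0;
}
```

With this, rpc_run(va_fsfreeze, 1) would cancel the hung thread after about a
second and thaw both mounts via the handler, while rpc_run(va_ping, 1) just
returns success. Note that deferred cancellation only takes effect at
cancellation points, so the "weird states" concern still applies to anything
the RPC was in the middle of updating.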

Yeah, still requires discipline but it could work.

Stefan