On Fri, Feb 4, 2011 at 4:27 PM, Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
>> Aborting an RPC handler could leave the system in an inconsistent
>> state unless we are careful.  For example, aborting freeze requires
>> thawing those file systems that have been successfully frozen so far.
>> For other handlers, aborting might leave temporary files around or, if
>> the handlers are not carefully written, partially update files in place
>> and leave them corrupted.
>>
>> So instead of a blanket timeout, I think handlers that perform
>> operations that may block for unknown periods of time could
>> specifically use timeouts.  That gives the handler control to perform
>> cleanup.
>
> Good point. I'm not sure I want to push timeout handling into the actual
> RPCs, though. Something as simple as open()/read() can block indefinitely
> in certain situations; it'll be difficult to account for every one of
> them, and the resulting code would be tedious as well. I'd really like to
> make the actual RPCs as simple as possible, since they may be extended
> heavily over time.
>
> So what if we simply allow an RPC to register a timeout handler at the
> beginning of the RPC call? Then, when the thread doing the RPC exits, we:
>
> - check whether the thread exited as a result of a timeout
> - if it did, check whether a timeout handler was registered and, if so,
>   call it and reset the handler; then return a timeout indication
> - if it didn't time out, return the response
>
> The only burden this puts on the RPC author is that information they need to
> recover state would need to be accessible outside the thread, which is
> easily done by encapsulating state in static/global structs. So the timeout
> handler for fsfreeze, as it is currently written, would be something like:
>
> va_fsfreeze_timeout_handler():
>    foreach mnt in fsfreeze.mount_list:
>        unfreeze(mnt)
>    fsfreeze.mount_list = NULL
>
> We'll need to be careful about lists/objects being in weird states due to
> the forced exit, but I think it's doable.

Yeah, still requires discipline but it could work.

Stefan
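The registered-timeout-handler idea can be sketched in C with POSIX threads. This is a minimal sketch under several assumptions. Each RPC runs in its own thread. The "forced exit" is done with pthread_cancel() using deferred cancellation. Only one RPC runs at a time. Freezing is simulated rather than issued as FIFREEZE/FITHAW ioctls. The names va_run_rpc(), va_register_timeout_handler(), va_rpc_job, va_fsfreeze() and va_ping() are hypothetical. va_fsfreeze_timeout_handler(), fsfreeze.mount_list and unfreeze() follow the pseudocode in the quoted message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names, not a virtagent API. One RPC runs at a time. */
enum va_rpc_result { VA_RPC_OK, VA_RPC_TIMEOUT, VA_RPC_ERROR };
typedef void (*va_timeout_handler)(void);
static va_timeout_handler current_timeout_handler; /* one global slot */
static void va_register_timeout_handler(va_timeout_handler h) { current_timeout_handler = h; }

struct va_rpc_job {
    void (*fn)(void);
    pthread_mutex_t lock;
    pthread_cond_t done_cond;
    int done;
};

static void *va_rpc_thread(void *opaque)
{
    struct va_rpc_job *job = opaque;
    job->fn();
    pthread_mutex_lock(&job->lock);
    job->done = 1;
    pthread_cond_signal(&job->done_cond);
    pthread_mutex_unlock(&job->lock);
    return NULL;
}

/* Run fn in its own thread; on expiry, cancel it and run its handler. */
static enum va_rpc_result va_run_rpc(void (*fn)(void), int timeout_ms)
{
    struct va_rpc_job job = { .fn = fn, .done = 0 };
    struct timespec deadline;
    pthread_t tid; void *retval;
    int rc = 0, must_cancel;
    current_timeout_handler = NULL;
    pthread_mutex_init(&job.lock, NULL);
    pthread_cond_init(&job.done_cond, NULL);
    if (pthread_create(&tid, NULL, va_rpc_thread, &job) != 0) {
        pthread_cond_destroy(&job.done_cond);
        pthread_mutex_destroy(&job.lock);
        return VA_RPC_ERROR;
    }
    clock_gettime(CLOCK_REALTIME, &deadline);
    long long ns = deadline.tv_nsec + (long long)timeout_ms * 1000000LL;
    deadline.tv_sec += (time_t)(ns / 1000000000LL);
    deadline.tv_nsec = (long)(ns % 1000000000LL);
    pthread_mutex_lock(&job.lock);
    while (!job.done && rc != ETIMEDOUT) /* the loop also absorbs spurious wakeups */
        rc = pthread_cond_timedwait(&job.done_cond, &job.lock, &deadline);
    must_cancel = !job.done;
    pthread_mutex_unlock(&job.lock);
    if (must_cancel)
        pthread_cancel(tid); /* forced exit at the RPC's next cancellation point */
    pthread_join(tid, &retval);
    pthread_cond_destroy(&job.done_cond);
    pthread_mutex_destroy(&job.lock);

    /* It timed out only if the thread really exited via cancellation. */
    va_timeout_handler handler = current_timeout_handler;
    current_timeout_handler = NULL; /* reset the handler in every case */
    if (retval != PTHREAD_CANCELED)
        return VA_RPC_OK;
    if (handler)
        handler();
    return VA_RPC_TIMEOUT;
}

/* fsfreeze state lives outside the RPC thread so the handler can reach it. */
static struct { const char *mount_list[8]; int count; } fsfreeze;
static int thawed_count;
/* A real agent would issue the FITHAW ioctl; this sketch only logs and counts. */
static void unfreeze(const char *mnt) { printf("thawing %s\n", mnt); thawed_count++; }

static void va_fsfreeze_timeout_handler(void)
{
    for (int i = 0; i < fsfreeze.count; i++)
        unfreeze(fsfreeze.mount_list[i]);
    fsfreeze.count = 0; /* the "mount_list = NULL" step */
}

static void va_fsfreeze(void)
{
    static const char *mounts[] = { "/", "/home", "/var" };
    va_register_timeout_handler(va_fsfreeze_timeout_handler);
    fsfreeze.count = 0;
    for (int i = 0; i < 3; i++) {
        if (i == 2)
            sleep(10); /* simulate a hung freeze; sleep() is a cancellation point */
        /* Real code would FIFREEZE here and record the mount only on success. */
        fsfreeze.mount_list[fsfreeze.count++] = mounts[i];
    }
}
static void va_ping(void) { /* completes at once; registers no handler */ }
```

With a 100 ms timeout, va_run_rpc(va_fsfreeze, 100) returns VA_RPC_TIMEOUT after thawing "/" and "/home", the two mounts recorded before the simulated hang. va_run_rpc(va_ping, 1000) returns VA_RPC_OK. Timeout is decided from the join result (PTHREAD_CANCELED) rather than from the wait alone. So an RPC that finishes just as the deadline passes is still reported as a success. Deferred cancellation only stops the thread at cancellation points, but the caveat about state left half-updated by the forced exit still applies to whatever the RPC does between them.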
