On 7/30/13 18:24 , "Michael Roth" <mdr...@linux.vnet.ibm.com> wrote:
>Quoting Tomoki Sekiyama (2013-07-30 15:54:05)
>>On 7/30/13 15:35 , "Michael Roth" <mdr...@linux.vnet.ibm.com> wrote:
>>>>>One small issue I noticed was that this error will get overwritten
>>>>>with the VSS writer timeout error if we wait longer than 60s before
>>>>>calling guest-fsfreeze-thaw. It might give some users false assurances
>>>>>about what aspects of their snapshot may be volatile so it's
>>>>>probably worth addressing.
>>>>This is an error returned against guest-fsfreeze-freeze, when the
>>>>writers or filesystems take more than 60s to quiesce.
>>>>(CQGAVssProvider::CommitSnapshots that issues FrozenEvent is
>>>>called after this quiescing succeeded.) The VSS sequence is aborted
>>>>at "out:". If this happens, as the system remains in the thawed state,
>>>>the following guest-fsfreeze-thaw will just return 0.
>>>
>>>This is the example I'm referring to:
>>>
>>>{'execute':'guest-fsfreeze-freeze'}
>>>{"return": 2}
>>>/* wait 10+ seconds */
>>>{'execute':'guest-fsfreeze-thaw'}
>>>{"error": {"class": "GenericError", "desc": "couldn't hold writes:
>>>fsfreeze is limited up to 10 seconds: (error: 80042314)"}}
>>>{'execute':'guest-fsfreeze-freeze'}
>>>{"return": 2}
>>>/* wait 60+ seconds */
>>>{'execute':'guest-fsfreeze-thaw'}
>>>{"error": {"class": "GenericError", "desc": "failed to do snapshot set:
>>>(error: 8004230f)"}}
>>>
>>>This seems to be because CommitSnapshot returns in the latter
>>>instance due to the VSS_TIMEOUT_MSEC wait, and its E_ABORT error
>>>message overwrites the VSS_E_HOLD_WRITES_TIMEOUT from earlier.
>>>Perhaps we could just have CommitSnapshot return
>>>VSS_E_HOLD_WRITES_TIMEOUT if it doesn't get the thaw event in time?
>>>I think that error message is much more informative for users.
>>Agreed.
>>How about modifying the provider to return S_OK instead of E_ABORT
>>when it exceeds VSS_TIMEOUT_MSEC?
>>As VSS_TIMEOUT_MSEC is larger than the 10-second fsfreeze timeout,
>>CommitSnapshot will return VSS_E_HOLD_WRITES_TIMEOUT on the provider's
>>timeout, and give the "fsfreeze is limited up to 10 seconds" message
>>to users.
>>Since the writers are also automatically thawed by the 60-second
>>timeout anyway, this wouldn't break anything.
>
>Hmm, it seems like it would work, but I'm a bit worried about returning
>S_OK and hoping VSS still produces an error. Probably really unlikely, but
>I could imagine a really strange timing issue where the 60 second timeout
>for whatever reason gets triggered before the 10 second one (maybe because
>one vcpu is being starved for whatever reason, don't know enough about
>windows timekeeping mechanisms to know how plausible this is, but I'd
>rather not rely on it not happening).
>
>One way you could do it would be to introduce an hEventTimeout
>that CommitSnapshots will set if it times out. That way we can poll
>it to see if the event is set when we get VSS_E_UNEXPECTED_PROVIDER_ERROR
>in the requester. If it's set we know a timeout occurred on the provider
>side, and return the VSS_E_HOLD_WRITES_TIMEOUT error.

OK, this sounds more reliable. I will take this approach.

Thanks,
Tomoki Sekiyama
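
For reference, the hEventTimeout idea above could be modeled roughly as follows. This is only a portable Python sketch, not the qemu-ga code: `threading.Event` stands in for a Win32 event handle, the function names `commit_snapshots` and `requester_result` are hypothetical, and it assumes VSS reports a failing provider as VSS_E_UNEXPECTED_PROVIDER_ERROR, as the transcript suggests. The two VSS error values are the ones quoted in the transcript.

```python
import threading

# HRESULT values. The two VSS codes match the hex values quoted in the thread.
S_OK = 0x00000000
E_ABORT = 0x80004004
VSS_E_UNEXPECTED_PROVIDER_ERROR = 0x8004230F
VSS_E_HOLD_WRITES_TIMEOUT = 0x80042314


def commit_snapshots(thaw_event, timeout_event, timeout_sec):
    """Provider side (hypothetical model): wait for the thaw event.

    If the wait times out, set the timeout event before aborting so the
    requester can later tell a provider timeout from other failures.
    """
    if thaw_event.wait(timeout_sec):
        return S_OK
    timeout_event.set()
    return E_ABORT


def requester_result(provider_hr, timeout_event):
    """Requester side (hypothetical model): pick the error to report."""
    # Assumption: any provider failure reaches the requester as
    # VSS_E_UNEXPECTED_PROVIDER_ERROR.
    hr = S_OK if provider_hr == S_OK else VSS_E_UNEXPECTED_PROVIDER_ERROR
    # Poll the timeout event only when the generic provider error is seen.
    if hr == VSS_E_UNEXPECTED_PROVIDER_ERROR and timeout_event.is_set():
        return VSS_E_HOLD_WRITES_TIMEOUT
    return hr
```

The point of the separate event is that the requester no longer depends on which timeout fires first: the provider states explicitly that it timed out, and the requester maps that to the more informative error.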