On 11/06/2015 01:46 PM, John Snow wrote:
> 
> 
> On 11/06/2015 03:32 AM, Kevin Wolf wrote:
>> Am 05.11.2015 um 19:52 hat John Snow geschrieben:
>>>
>>>
>>> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
>>>> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
>>>>>
>>>>>
>>>>> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
>>>>>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>>>>>>> @@ -1732,6 +1757,10 @@ static void
>>>>>>> block_dirty_bitmap_add_prepare(BlkActionState *common,
>>>>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>>>>                                               common, common);
>>>>>>>  
>>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      action = common->action->block_dirty_bitmap_add;
>>>>>>>      /* AIO context taken and released within
>>>>>>> qmp_block_dirty_bitmap_add */
>>>>>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>>>>>>> @@ -1767,6 +1796,10 @@ static void
>>>>>>> block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>>>>>>                                               common, common);
>>>>>>>      BlockDirtyBitmap *action;
>>>>>>>  
>>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      action = common->action->block_dirty_bitmap_clear;
>>>>>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>>>>>>                                                action->name,
>>>>>>
>>>>>> Why do the bitmap add/clear actions not support err-cancel=all?
>>>>>>
>>>>>> I understand why other block jobs don't support it, but it's not
>>>>>> clear why these non-block-job actions cannot.
>>>>>>
>>>>>
>>>>> Because they don't have a callback to invoke if the rest of the job
>>>>> fails.
>>>>>
>>>>> I could create a BlockJob for them complete with a callback to
>>>>> invoke, but basically it's just because there's no interface to
>>>>> unwind them, or an interface to join them with the transaction.
>>>>>
>>>>> They're small, synchronous non-job actions. Which makes them weird.
>>>>
>>>> Funny, we've been looking at the same picture while seeing different
>>>> things:
>>>> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
>>>>
>>>> I think I understand your idea: the transaction should include both
>>>> immediate actions as well as block jobs.
>>>>
>>>> My mental model was different: immediate actions commit/abort along
>>>> with the 'transaction' command. Block jobs are separate and
>>>> complete/cancel together in a group.
>>>>
>>>> In practice I think the two end up being similar, because we won't
>>>> be able to implement immediate-action commit/abort together with
>>>> long-running block jobs: the immediate actions rely on
>>>> quiescing/pausing the guest for atomic commit/abort.
>>>>
>>>> So with your mental model the QMP client has to submit 2
>>>> 'transaction' commands: 1 for the immediate actions, 1 for the
>>>> block jobs.
>>>>
>>>> In my mental model the QMP client submits 1 command, but the
>>>> immediate actions and block jobs are two separate transaction
>>>> scopes. This means that if the block jobs fail, the client needs to
>>>> be aware of the immediate actions that have committed. Because of
>>>> this, it becomes just as much client effort as submitting two
>>>> separate 'transaction' commands in your model.
>>>>
>>>> Can anyone see a practical difference? I think I'm happy with
>>>> John's model.
>>>>
>>>> Stefan
>>>>
>>>
>>> We discussed this off-list, but for the sake of the archive:
>>>
>>> == How it is now ==
>>>
>>> Currently, transactions have two implicit phases. The first is the
>>> synchronous phase: if this phase completes successfully, we consider
>>> the transaction a success. The second phase is the asynchronous
>>> phase, where jobs launched by the synchronous phase run to
>>> completion.
>>>
>>> All synchronous commands must complete for the transaction to
>>> "succeed." There are currently (pre-patch) no guarantees about
>>> asynchronous command completion.
>>> As long as all synchronous actions complete, asynchronous actions
>>> are free to succeed or fail individually.
>>
>> You're overcomplicating this. All actions are currently synchronous,
>> and what you consider asynchronous transaction actions aren't
>> actually part of the transaction at all. The action is "start block
>> job X", not "run block job X".
>>
>
> "OK," except the entire goal of the series was to allow the "aren't
> actually part of the transaction at all" parts to live and die
> together based on the transaction they were launched from. This really
> implies they belong to a second, asynchronous phase of the
> transaction.
>
> Otherwise, why would totally unrelated jobs fail because a different
> one did?
>
> I realize this is an /extension/ of the existing semantics rather than
> a "fix," and part of the confusion is that I and everyone else were
> looking at it differently. How could this happen, though? Let's look
> at our transaction documentation:
>
> "Operations that are currently supported:" [...] "- drive-backup"
> [...] "If there is any failure performing any of the operations, all
> operations for the group are abandoned."
>
> Where do we say "Except drive-backup, though, because technically the
> drive-backup action only starts the job, and we don't really consider
> this to be part of the transaction"? We've never actually defined this
> behavior as part of our API, as far as I can see.
>
> Regardless: the net effect is still the same. We want jobs launched by
> transactions to (optionally!) cancel themselves on failure. The
> complications only arise because people want the exact semantic
> meaning to be precise for the QMP API.
>
> The concept is simple; the language to describe it has been muddy.
>
>>> == My Model ==
>>>
>>> The current behavior is my "err-cancel = none" scenario: we offer no
>>> guarantee about the success or failure of the transaction as a whole
>>> after the synchronous portion has completed.
>>>
>>> What I was proposing is "err-cancel = all," which to me means that
>>> _ALL_ commands in this transaction must succeed (synchronous or not)
>>> before _any_ actions are irrevocably committed. This means, for a
>>> hypothetical mixed synchronous-asynchronous transaction, that even
>>> after the transaction has succeeded (it passed the synchronous
>>> phase), if an asynchronous action later fails, all actions, both
>>> synchronous and not, are rolled back -- a kind of retroactive
>>> failure of the transaction. This is clearly not possible in all
>>> cases, so commands that cannot support these semantics will refuse
>>> "err-cancel = all" during the synchronous phase.
>>
>> Is this possible in any case? You're losing transaction semantics at
>> the latest when you drop the BQL that the monitor holds. At least
>> atomicity and isolation aren't given any more.
>>
>
> It might be possible in at least *some* cases. I currently don't even
> attempt it, though. This is why some transaction actions just refuse
> this parameter if it shows up.
>
> It'd be pretty easy to undo a bitmap-add or a bitmap-clear, as long as
> I gave these actions a conditional success callback.
>
> The "undo" semantics of the jobs are somewhat different. I am not
> suggesting we can teleport back in time to before we tried to do the
> backup, but we can at least abort the backup and make it as if you
> never issued the command -- which is important for incremental
> backups.
>
> The rollback behavior for each action needs to be spelled out in our
> documentation, as well as a categorization of which actions complete
> before qmp_transaction returns and which may have lingering effects
> (like drive-backup).
>
>> You can try to undo some parts of what you did later on, but if any
>> involved BDS was used in the meantime by anything other than the
>> block job, you don't have transactional behaviour any more.
>>
>> And isn't the management tool perfectly capable of cleaning up all
>> the block jobs without magic happening in qemu if one of them fails?
>> Do we actually need atomic failure later on? And if so, do we need
>> atomic failure only of block jobs started in the same transaction?
>> Why?
>>
>
> There's always a debate about who is responsible for cleaning things
> up on failure: QEMU or the management tool? Luckily, it's an optional
> parameter, so the tool can decide at run time what semantics it
> wants.
>
> IMO: It's a little late in this series to question whether we need
> this or not, but I'll oblige.
>
> The original objection from Stefan to the incremental backup
> https://soundcloud.com/tothejazz/gyms-flappy-the-happy-sealtransaction
> semantics was that if two incremental backup jobs launched by a

LOL. I mispasted a soundcloud link in here. Well... Enjoy this stupid
song that made me laugh that I was trying to link to someone else.
Sigh!!! (So embarrassed.)

> transaction had only partial success, the management tool would have
> to take on extra state and possibly issue some corrective actions to
> QEMU in order to undo the damage.
>
> QEMU, however, could just unravel the failure fairly trivially and
> allow the backup commands to maintain a simple binary success/failure
> state. This was agreed to be the better, less complicated management
> scenario.
>
> In the case that incremental backups partially complete, you'll have
> one bitmap deleted and a different bitmap merged back onto the BDS.
> The management tool at this point absolutely must not delete the one
> incremental that got made, and needs to leave it in place before
> re-issuing the incremental backup. It's then responsible for either
> squashing the two incremental backups it made, or otherwise managing
> the disparity in the number of actual backup files.
>
> For QEMU's trouble, we don't delete incremental backup data until
> after the backup operation, so it's trivial to hold onto the data
> until we know everything's OK.
>
> We decided it was nicest for QEMU to just roll back all of the jobs
> in this case. If the management tool disagrees, it can just roll with
> the original semantics.
>
>
> I still believe strongly that this is the right way to go. It's
> flexible, allows either party to manage the failure, the parameter is
> completely non-ambiguous, provides for early failure if the expected
> semantics are not possible, and the code is already here and mostly
> reviewed... except for the API names.
>
> --js
>