Am 26.05.2015 um 16:24 hat Max Reitz geschrieben: > On 26.05.2015 16:22, Kevin Wolf wrote: > >Am 21.05.2015 um 08:42 hat Fam Zheng geschrieben: > >>It blocks device IO. > >What I'm missing here is a description of the case that it protects > >against. Blocking device I/O doesn't sound like a valid thing to want > >per se, but it might be true that we need it as long as we don't have > >the "real" op blockers that Jeff is working on. However, as long as you > >don't say why you want that, I can't say whether it's true or not. > > > >And of course, it doesn't block anything yet after this patch, it's just > >set or cleared without any effect. In fact, it seems that even at the > >end of the series, there is no bdrv_op_is_blocked() call that checks for > >BLOCK_OP_TYPE_DEVICE_IO. Is this intentional? > > Probably yes, because patches 2 and 3 introduce a notifier for when > op blockers are set up and removed. This is used by virtio-blk and > virtio-scsi to pause and unpause device I/O, respectively.
Indeed, I missed that. Having thought a bit more about this, I think we need to carefully check whether op blockers are really the right tool here; and if they are, we need to document the reason in the commit message at least. The op blocker model as used until now has a few characteristics: * Old model: You have a period of time between blocking and unblocking during which starting the corresponding operation is prohibited. The operation checks whether the blocker is set when it starts, and it fails if the operation is blocked. New model that Jeff is working on: You still have block/unblock (for a category rather than a specific operation, though), but you also have a start/stop pair for using the functionality. If you start an operation that is blocked, it still errors out. However, blocking an operation that is already started fails as well now. This series: An already running operation is interrupted when the blocker is set, and it continues when it is removed. You had to introduce new infrastructure (notifiers) because this doesn't match the op blockers model. * Op blockers used to be set to protect high-level, long-running background operations, that are usually initiated by the user. bdrv_drain() is low-level and a blocking foreground operation. * Op blockers came into effect only after monitor interactions and used to be on a slow path. We're now in the I/O path. It's not a problem with the current code per se (as you introduced notifiers, so we don't have to iterate the list all the time), but it broadens the scope of op blockers significantly and we shouldn't be doing that carelessly. * Op blockers have an error message. It's unused for the new case. This is the first part of what's troubling me with this series, as it makes me doubtful if op blockers are the right tool to implement what the commit message says (block device I/O). This is "are we doing the thing right?" The second part should actually come first, though: "Are we doing the right thing?" I'm also doubtful whether blocking device I/O is even what we should do. Why is device I/O special compared to block job I/O or NBD server I/O? If the answer is "because block jobs are already paused while draining" (and probably nobody thought much about the NBD server), then chances are that it's not the right thing. In fact, using two different mechanisms for pausing block jobs and pausing device I/O seems inconsistent and wrong. I suspect that the real solution needs to be in the block layer, though I'm not quite sure yet what it would be like. Perhaps a function pair like blk_stop/resume_io() that is used by bdrv_drain() callers and works on the BlockBackend level. (The BDS level is problematic because we don't want to stop all I/O; many callers just want to drain I/O of _other_ callers so they can do their own I/O without having to care about concurrency). Kevin