Re: [Qemu-devel] [PATCH v2 7/8] block: Move some bdrv_*_all() functions to BB

Kevin Wolf Mon, 07 Dec 2015 03:18:14 -0800

Am 05.12.2015 um 00:15 hat Max Reitz geschrieben:
> On 01.12.2015 17:01, Kevin Wolf wrote:
> > Am 10.11.2015 um 04:27 hat Max Reitz geschrieben:
> >> Move bdrv_drain_all(), bdrv_commit_all(), bdrv_flush_all() and
> >> bdrv_invalidate_cache_all() to BB.
> >>
> >> The only operation left is bdrv_close_all(), which cannot be moved to
> >> the BB because it should not only close all BBs, but also all
> >> monitor-owned BDSs.
> >>
> >> Signed-off-by: Max Reitz <mre...@redhat.com>
> > 
> > bdrv_commit_all() and bdrv_flush_all() are relatively obvious. They are
> > meant to commit/flush changes made by a guest, so moving to the BB level
> > seems right.
> 
> OK, I wanted to just follow this comment, but since monitor_bdrv_states
> is local to blockdev.c (and I consider that correct) that would mean
> either making it global (which I wouldn't like) or keeping bdrv_states
> (which I wouldn't like either, not least because dropping bdrv_states
> allows us to drop bdrv_new_root(), which may or may not be crucial to
> the follow-up series to this one).
> 
> So, I'll be trying to defend what this patch does for now.
> 
> > I'm not so sure about bdrv_invalidate_cache_all(). Even if a BB isn't
> > attached to a BDS, we still need to invalidate the caches of any images
> > that have been opened before migration has completed, including those
> > images that are only owned by the monitor.
> 
> I consider BB-less BDSs to be in a temporary state between blockdev-add
> and binding them to a device, or between unbinding them from a device
> and blockdev-del. So I don't even know if we should allow them to be
> migrated.


If we don't want to, we should add a migration blocker while such a BDS
exists.

I don't think that would be right, though. Definitely not in the strict
way you phrased it ("binding them to a device"), but probably also not
if you interpret "device" as any kind of user, including block jobs or
NBD servers; actually, I think even the monitor would belong in this
list, but then you always have "something" attached, otherwise the
refcount becomes zero and the BDS is deleted.

If you don't include the monitor, though, you prevent entirely
reasonable use cases like opening upfront multiple images for a device
with removable media and then changing between them later on, which
leaves some BDSes unattached while another medium is inserted.

I think BDSes without a BB attached should still be first-class
citizens.

> Anyway, in my opinion, a BB-less BDS should be totally static.
> Invalidating its cache shouldn't do anything. Instead, its cache should
> be invalidated when it is detached from a BB (just like this patch adds
> a bdrv_drain() call to blk_remove_bs()).

Are you sure you remember what bdrv_invalidate_cache() is for? Its job
is dropping stale cache contents after a mere bdrv_open() when migrating
with shared storage. It is wrong to ever use this after actually using
(instead of just opening) an image.

The order is like this:

1. Start destination qemu -incoming. This opens the image and
   potentially reads in some metadata (qcow2 L1 table, etc.)
2. Source keeps writing to the image.
3. Migration completes. Source flushes the image and stops writing.
4. Destination must call bdrv_invalidate_cache() to make sure it gets
   all the updates from step 2 instead of using stale data from step 1.

Note how there is no detaching from a BB involved at all.

> > Similarly I'm concerned about bdrv_drain_all(). Anything outside the
> > block layer should certainly call blk_drain_all() because it can only be
> > interested in those images that it can see. The calls in block.c,
> > however, look like they should consider BDSes without a BB as well. (The
> > monitor, which can hold direct BDS references, may be another valid user
> > of a bdrv_drain_all that also covers BDSes without a BB.)
> 
> Similarly, in my opinion, bdrv_drain() shouldn't do anything on a
> BB-less BDS. That is why this patch adds a bdrv_drain() call to
> blk_remove_bs() (a bdrv_invalidate_cache() call is missing yet, though).
> 
> But I just remembered that block jobs don't use BBs... Did I mention
> before how wrong I think that is?
> 
> Now I'm torn between trying to make block jobs use BBs once more
> (because I really think BB-less BDSs should not have any active I/O) and
> doing some ugly things with bdrv_states and/or bdrv_monitor_states. I'll
> have to think about it.

I think we first need to allow multiple BBs per BDS, so that block jobs
can create their own BB.

And anyway, this seems to be a great start for discussing our whole BB
model in person on Friday. As I said, I think what we're currently doing
is wrong. I am concerned that introducing a model that actually makes
sense is hard to do in a compatible way, but I'm getting less and less
confident that doing it like we do now is the right conclusion from that.

> > And block.c calling blk_*() functions smells a bit fishy anyway.
> 
> I don't find it any more fishy than single-BDS functions calling
> bdrv_*_all() functions. And that is, not fishy at all. If some function
> wants to drain all I/O and we define I/O to always originate from a BB,
> then I find the only reasonable approach to call blk_drain_all().

bdrv_*_all() calls are always fishy, at least if done outside shutdown.

Kevin

pgpVKYVqh7avz.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 7/8] block: Move some bdrv_*_all() functions to BB

Reply via email to