On 04/19/2017 04:40 PM, John Snow wrote: > > > On 04/19/2017 05:12 PM, Eric Blake wrote: >> On 04/19/2017 03:32 PM, John Snow wrote: >> >>>> @@ -279,9 +280,9 @@ static int mig_save_device_bulk(QEMUFile *f, >>>> BlkMigDevState *bmds) >>>> /* Skip unallocated sectors; intentionally treats failure as >>>> * an allocated sector */ >>>> while (cur_sector < total_sectors && >>>> - !bdrv_is_allocated(blk_bs(bb), cur_sector, >>>> - MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) { >>>> - cur_sector += nr_sectors; >>>> + !bdrv_is_allocated(blk_bs(bb), cur_sector * >>>> BDRV_SECTOR_SIZE, >>>> + MAX_IS_ALLOCATED_SEARCH, &count)) { >>>> + cur_sector += DIV_ROUND_UP(count, BDRV_SECTOR_SIZE); >>>> } >>> >>> Let's try asking again with the right vocabulary: >>> >>> OK, so I guess the idea is that we definitely shouldn't ever have a >>> partially allocated sector; so I suppose this ROUND_UP is just a precaution? >> >> For now, yes - but the part 3 series (bdrv_get_block_status) makes >> returning a sub-sector value possible. For most drivers, it will only >> happen at end-of-file (as status doesn't change mid-sector unless your >> file is not a multiple of a sector size) - in fact, I found that I had >> to patch the block layer to round up and sub-sector return that >> coincides with end-of-file back to a round sector amount in order to >> avoid breaking other code that assumed everything is allocated in sectors. > > I'm sorry, I'm having difficulty parsing your last sentence.
We should NEVER have an allocation smaller than bds->bl.request_alignment (for example, if you can't write in chunks smaller than 512, then how can your allocation be smaller than that?). But when we eventually have byte-granularity support, and for a device where request_alignment is 1, it is feasible that we could track mid-sector alignment changes. In practice, we DO have an easy-to-observe mid-sector alignment change: on a regular file not aligned to sector boundaries, where request_alignment IS 1, we get a mid-sector transition from data to hole at EOF. So it turned out to be easier in my later series on bdrv_get_block_status() to intentionally round a partial sector at end-of-file up as if the entire sector was allocated, rather than reporting the mid-sector alignment (matching the fact that bdrv_getlength() rounds up to sector boundaries). If, someday, we fix bdrv_getlength() to report a non-sector-aligned value, we'll have to revisit the rounding (again, it feels like bds->bl.request_alignment is the obvious place to ensure that we never report a mid-alignment result, but where the alignment size is now driver-dependent instead of hard-coded to 512). In fact, even with non-sector-aligned bdrv_getlength(), we may STILL be able to enforce rules such as mid-sector transitions from hole->data are not possible (regardless of request_alignment), and mid-sector transitions from data->hole are possible only at the end of the file. > I agree with you that there is absolutely definitely no real problem > here right now, but I do secretly wonder if we'll invent a way to screw > ourselves over. > > Presumably it will get converted at SOME point like everything else, and > it won't be a concern anymore. > > Presumably Presumably that will even happen before we invent a way to > screw ourselves over. Here's hoping we are careful enough, and have enough asserts in the right place so that if we guessed wrong we get a nice assert (rather than risking inf-looping because we rounded a sub-sector down to 0, or causing data corruption that doesn't show up until much later where we rounded up and missed copying something); at the same time, that the asserts are accurate enough that we can't trip them from legitimate users. One thing I plan to add to my v2 posting (for the part 3 bdrv_get_block_status series) is to remove the 'qemu-io alloc' sector alignment restrictions, to have a tool that makes it easier to prove that I am properly handling rounding at the block layer out to request_alignment limits without triggering any asserts. And my recent blkdebug enhancements are probably going to be helpful in validating things, even if I have to add more iotests. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature