On 14/03/2012 13:37, Kevin Wolf wrote:
> On 14.03.2012 13:14, Paolo Bonzini wrote:
>>> Paolo mentioned a use case as a fast way for guests to write zeros, but
>>> is it really faster than a normal write when we have to emulate it by a
>>> bdrv_write with a temporary buffer of zeros?
>>
>> No, of course not.
>>
>>> On the other hand we have
>>> the cases where discard really means "I don't care about the data any
>>> more" and emulating it by writing zeros is just a waste of resources there.
>>>
>>> So I think we only want to advertise that discard zeroes data if we can
>>> do it efficiently. This means that the format does support it, and that
>>> the device is able to communicate the discard granularity (= cluster
>>> size) to the guest OS.
>>
>> Note that the discard granularity is only a hint, so it's really more a
>> maximum suggested value than a granularity. Outside of a cluster
>> boundary the format would still have to write zeros manually.
>
> You're talking about SCSI here, I guess? Would be one case where being
> able to define sane semantics for virtio-blk would have been an
> advantage... I had hoped that SCSI was already sane, but if it doesn't
> distinguish between "I don't care about this any more" and "I want to
> have zeros here", then I'm afraid I can't call it sane any more.

It does make the distinction. "I don't care" is UNMAP (or WRITE SAME(16)
with the UNMAP bit set); "I want to have zeroes" is WRITE SAME(10) or
WRITE SAME(16) with an all-zero payload.

> We can make the conditions even stricter, i.e. allow it only if protocol
> can pass through discards for unaligned requests. This wouldn't free
> clusters on an image format level, but at least on a file system level.
>
>> Also, Linux for example will only round the number of sectors down to
>> the granularity, not the start sector. Rereading the code, for SCSI we
>> want to advertise a zero granularity (aka do whatever you want),
>> otherwise we may get only misaligned discard requests and end up writing
>> zeroes inefficiently all the time.
>
> Does this make sense with real hardware or is it a Linux bug?

It's a bug: SCSI defines the "optimal unmap request starting LBA" to be
"(n × optimal unmap granularity) + unmap granularity alignment".

>> The problem is that advertising discard_zeroes_data based on the backend
>> calls for trouble as soon as you migrate between storage formats,
>> filesystems or disks.
>
> True. You would have to emulate if you migrate from a source that can
> discard to zeros efficiently to a destination that can't.
>
> In the end, I guess we'll just have to accept that we can't fix bad
> semantics of ATA and SCSI, and just need to decide whether "I don't
> care" or "I want to have zeros" is more common. My feeling is that "I
> don't care" is the more useful operation because it can't be expressed
> otherwise, but I haven't checked what guests really do.

Yeah, guests right now only use it for unused filesystem pieces, so the
"do not care" semantics are fine.

I also hoped to use discard to avoid blowing up thin-provisioned images
when streaming. Perhaps we can use bdrv_has_zero_init instead, and/or
pass down the copy-on-read flag to the block driver.

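To make the alignment point above concrete, here is a rough sketch of how a
request could be split into the part that lies on "(n × granularity) +
alignment" boundaries and the unaligned head and tail that would still have
to be zeroed by writing. This is not QEMU code: DiscardSplit and
split_discard are hypothetical names, units are sectors, and a granularity
of 0 is taken to mean "do whatever you want", as suggested for SCSI above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types and names for illustration; none of this is QEMU API. */
typedef struct {
    uint64_t head_start, head_len; /* unaligned prefix: zero by writing */
    uint64_t mid_start, mid_len;   /* aligned middle: can be discarded */
    uint64_t tail_start, tail_len; /* unaligned suffix: zero by writing */
} DiscardSplit;

/* Split [start, start + len) at boundaries n * granularity + alignment. */
static DiscardSplit split_discard(uint64_t start, uint64_t len,
                                  uint64_t granularity, uint64_t alignment)
{
    DiscardSplit s = {0};
    uint64_t end, first, last;

    assert(len <= UINT64_MAX - start); /* the range must not wrap around */
    end = start + len;

    s.head_start = start;
    s.mid_start = start;
    s.tail_start = end;

    if (granularity == 0) {
        /* No constraint advertised: the whole request can be discarded. */
        s.mid_len = len;
        return s;
    }
    alignment %= granularity;

    /* Smallest boundary that is >= start. */
    if (start <= alignment) {
        first = alignment;
    } else {
        uint64_t rem = (start - alignment) % granularity;
        first = rem ? start + (granularity - rem) : start;
    }

    /* No boundary at or below end: nothing can be discarded. */
    if (end < alignment) {
        s.head_len = len;
        return s;
    }

    /* Largest boundary that is <= end. */
    last = end - (end - alignment) % granularity;

    if (first >= last) {
        /* The request does not cover a full granule. */
        s.head_len = len;
        return s;
    }

    s.head_len = first - start;
    s.mid_start = first;
    s.mid_len = last - first;
    s.tail_start = last;
    s.tail_len = end - last;
    return s;
}
```

For example, with a granularity of 64 and an alignment of 0, a request for
200 sectors starting at sector 10 splits into a 54-sector head, a
128-sector discardable middle starting at sector 64, and an 18-sector tail.
A guest that rounds only the length and not the start sector can produce
requests with no discardable middle at all, which is the inefficiency
described above.
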
Anyhow, there are some patches from this series that are relatively
independent and ready for inclusion; I'll extract them and post them
separately.

Paolo