On Mon, Mar 11, 2013 at 03:00:38PM +0000, Dietmar Maurer wrote:
> > > I can see space reduction up to 4% using this feature. Considering the
> > > fact that it comes at no cost, it would be stupid to remove it.
> >
> > Okay, looks like it's useful but not a huge win.
> >
> > In the NBD approach pipelining writes (or discards) ought to make 4 KB zero
> > blocks usable without overhead.
>
> But you need to re-assemble 64KB block from that - I guess that is the
> difficult part here?

That's not hard.  It can be added to the RFC code I posted.  Here is an
outline.

Say there is a 64 KB block with 2 zero regions:

  0123456789abcdef
  XXXXzzXXXXXzXXXX

  X - populated 4 KB block
  z - zero 4 KB block

Send 3 writes:

  NBD_CMD_WRITE from=0 length=16 KB
  ...
  NBD_CMD_WRITE from=24 KB length=20 KB
  ...
  NBD_CMD_WRITE from=48 KB length=16 KB
  ...

The NBD server can pack this back into an extent's blockinfo:

  # State for the cluster currently being assembled
  cur_cluster = None
  cur_blockinfo = 0
  cur_blocks = []

  def add_cluster(cluster, blockinfo, blocks):
      ...  # the code to add a cluster to the current extent

  def prepare_for_blockinfo_update(cluster):
      global cur_cluster, cur_blockinfo, cur_blocks

      # We're still filling in the same cluster, all is fine
      if cur_cluster == cluster:
          return

      # Flush out last cluster, which is now complete
      # (there is nothing to flush before the first write)
      if cur_cluster is not None:
          add_cluster(cur_cluster, cur_blockinfo, cur_blocks)

      # Prepare to work on a new cluster
      cur_cluster = cluster
      cur_blockinfo = dev_id << 32 | cluster
      cur_blocks = []

  def write(offset, size, data):
      global cur_blockinfo

      # 'offset' is the NBD 'from' field ('from' is a reserved word in Python)
      cluster = offset // VMA_CLUSTER_SIZE
      prepare_for_blockinfo_update(cluster)
      cur_blockinfo |= build_blockinfo_mask(offset, size) # bit manipulation
      cur_blocks.append(data)

In other words, the NBD server assembles the writes into a blockinfo and
list of blocks.

Stefan
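
P.S. In case the "bit manipulation" part is unclear, here is a rough sketch of
what build_blockinfo_mask() could look like, run against the 64 KB example
above.  This is an illustration only, not the RFC code.  It assumes one mask
bit per 4 KB block with bit 0 standing for the first block of the cluster,
and it assumes a write never crosses a cluster boundary.  BLOCK_SIZE,
BLOCKS_PER_CLUSTER and KB are names made up for this sketch.  Where the 16
mask bits sit inside the 64-bit blockinfo is left out on purpose, so the
caller would still have to shift the result into place.

```python
BLOCK_SIZE = 4 * 1024             # 4 KB block, as in the diagram (assumed name)
VMA_CLUSTER_SIZE = 64 * 1024      # 64 KB cluster = 16 blocks
BLOCKS_PER_CLUSTER = VMA_CLUSTER_SIZE // BLOCK_SIZE


def build_blockinfo_mask(offset, size):
    """Return a 16-bit mask with one bit set per 4 KB block the write covers.

    Assumption: bit i stands for block i of the cluster (bit 0 = first block),
    and a single write never crosses a cluster boundary.
    """
    if size <= 0 or offset % BLOCK_SIZE or size % BLOCK_SIZE:
        raise ValueError("write must be a non-empty, 4 KB aligned range")
    first = (offset % VMA_CLUSTER_SIZE) // BLOCK_SIZE   # first block touched
    count = size // BLOCK_SIZE                          # number of blocks touched
    if first + count > BLOCKS_PER_CLUSTER:
        raise ValueError("write crosses a cluster boundary")
    # 'count' consecutive one-bits, moved up to start at bit 'first'
    return ((1 << count) - 1) << first


KB = 1024
mask = 0
# The three NBD_CMD_WRITE requests from the example (from, length)
for offset, size in [(0, 16 * KB), (24 * KB, 20 * KB), (48 * KB, 16 * KB)]:
    mask |= build_blockinfo_mask(offset, size)

# Bit 0 is the rightmost digit, so this reads right-to-left against the diagram
print(format(mask, "016b"))
```

The three writes OR together to 0xf7cf: bits 4, 5 and 11 stay clear, which
matches the three 'z' blocks in the diagram.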