On 27/12/14 18:42, Denis V. Lunev wrote:
> On 27/12/14 17:52, Peter Lieven wrote:
>> On 26/12/14 20:15, Denis V. Lunev wrote:
>>> On 26/12/14 16:32, Denis V. Lunev wrote:
>>>> On 26/12/14 16:13, Peter Lieven wrote:
>>>>> On 26/12/14 13:35, Denis V. Lunev wrote:
>>>>>> The check for the maximum length was added by
>>>>>>
>>>>>> commit c31cb70728d2c0c8900b35a66784baa446fd5147
>>>>>> Author: Peter Lieven <p...@kamp.de>
>>>>>> Date:   Thu Oct 24 12:06:58 2013 +0200
>>>>>>
>>>>>>     block: honour BlockLimits in bdrv_co_do_write_zeroes
>>>>>>
>>>>>> but actually, if the driver provides a .bdrv_co_write_zeroes callback,
>>>>>> there is no need to limit the size to 32 MB. The callback should
>>>>>> provide an efficient implementation which normally should not write
>>>>>> any comparable amount of zeroes.
>>>>>
>>>>> NACK.
>>>>>
>>>>> First, there is no guarantee that bdrv_co_do_write_zeroes is a fast
>>>>> operation. This heavily depends on circumstances that the block layer
>>>>> is not aware of. If a specific protocol knows it is very fast at
>>>>> writing zeroes under any circumstance, it should provide INT_MAX in
>>>>> bs->bl.max_write_zeroes. It is then still allowed to return -ENOTSUP
>>>>> if the request size or alignment doesn't fit.
>>>>
>>>> The idea is that (from my point of view) if .bdrv_co_do_write_zeroes
>>>> is specified, the cost is almost the same for any amount of zeroes
>>>> written. This is true for fallocate, in my view. The amount of
>>>> actually written data will be several orders of magnitude less than
>>>> specified, except on the slow path, which honours the 32 MB limit.
>>>>
>>>> If the operation is complex to realize, then it will be rate-limited
>>>> below, in the actual implementation.
>>>>
>>>>> There are known backends, e.g. anything that deals with SCSI, that
>>>>> have a known limit on the maximum number of zeroes they can write
>>>>> fast in a single request. This number MUST NOT be exceeded. The below
>>>>> patch would break all those backends.
>>>>
>>>> Could you please point me to these backends? Actually, from my point
>>>> of view, they should properly set up max_write_zeroes for themselves.
>>>> This is done at least in block/iscsi.c, and it would be the consistent
>>>> way of doing so.
>>>>
>>>>> What issue are you trying to fix with this patch? Maybe there is a
>>>>> better way to fix it at another point in the code.
>>>>
>>>> I am trying to minimize the amount of metadata updates for a file.
>>>> This provides some speedup even on ext4, and it will provide even
>>>> more speedup with a distributed filesystem like CEPH, where file size
>>>> updates and block allocation are costly.
>>>>
>>>> Regards,
>>>>     Den
>>>
>>> First of all, the patch is really wrong :) It was written using wrong
>>> assumptions.
>>>
>>> OK. I have spent some time reading your original patchset and did not
>>> find any useful justification for the default limit for either discard
>>> or write zeroes.
>>
>> 32768 is the largest power of two fitting into a uint16. And uint16 is
>> quite common for nb_sectors in backends.
>
> OK. This could be reasonable.
>
>>> Yes, there are drivers which require the block layer to call
>>> .bdrv_co_do_write_zeroes with alignment and with an upper limit. But in
>>> that case the driver sets up max_write_zeroes. All buggy drivers should
>>> do that, so as not to affect the non-buggy ones, in my opinion.
>>>
>>> This is the only purpose of the original patches for limits.
>>> I had wrongly interpreted BlockLimits as something connected with the
>>> duration of the operation. Sorry for that.
>>>
>>> Therefore there is no good reason to limit the amount of data sent to
>>> drv->bdrv_co_writev to any particular size. The only thing is that it
>>> would be good not to allocate too much memory at once. We could do
>>> something like
>>>
>>>     /* map the same small zeroed buffer into the iovec repeatedly */
>>>     base = qemu_try_blockalign(bs, MIN(2048, num) * BDRV_SECTOR_SIZE);
>>>     for (added = 0; added < num; added += MIN(2048, num - added)) {
>>>         qemu_iovec_add(qiov, base,
>>>                        MIN(2048, num - added) * BDRV_SECTOR_SIZE);
>>>     }
>>>
>>> to avoid really big allocations here even if .max_write_zeroes is very
>>> high. Do you think that this might be useful?
>>>
>>> As for .bdrv_co_do_write_zeroes itself, can we still drop the default
>>> 32 MB limit? If there are some buggy drivers, they should have
>>> .max_write_zeroes specified.
>>>
>>> The same applies to .max_discard.
>>
>> It's always risky to change default behaviour. In the original
>> discussion we agreed that there should be a limit for each request. I
>> think the 2^15 was Paolo's suggestion.
>>
>> You were talking about metadata updates for a file. So the operation
>> that is too slow for you is bdrv_write_zeroes inside a container file?
>> What is the underlying filesystem? What is the exact operation that you
>> are trying to optimize?
>>
>> I am wondering because, as far as I can see, write zeroes is only
>> supported for XFS and for block devices which support BLKZEROOUT. The
>> latter only works for cache=none. So it's not that easy to end up in an
>> optimized (fast) path anyway.
>>
>> Peter
>
> You have missed the 6 patches below ;) e.g. patch 2/7.
>
> OK. I'll redo the changes and fix this at the raw-posix level.
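For reference, the BLKZEROOUT fast path mentioned above boils down to a
single ioctl on an opened block device. The following is a minimal
standalone sketch, not QEMU's actual raw-posix code; the zero_range()
helper and the 1 MiB range are made up for illustration:

    #define _GNU_SOURCE        /* for O_DIRECT */
    #include <stdint.h>
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>      /* BLKZEROOUT */

    /* Ask the kernel to zero a byte range on a block device. Offset and
     * length must be aligned to the device's logical block size; if the
     * ioctl fails, the caller must fall back to writing zero buffers. */
    static int zero_range(int fd, uint64_t offset, uint64_t length)
    {
        uint64_t range[2] = { offset, length };
        return ioctl(fd, BLKZEROOUT, &range);
    }

    int main(int argc, char **argv)
    {
        /* O_DIRECT mirrors the cache=none requirement mentioned above */
        int fd = argc > 1 ? open(argv[1], O_RDWR | O_DIRECT) : -1;

        if (fd < 0 || zero_range(fd, 0, 1 << 20) < 0) {
            perror("BLKZEROOUT");
            return 1;
        }
        close(fd);
        return 0;
    }

On kernels or devices without BLKZEROOUT support the ioctl fails with
errno set (e.g. ENOTTY), which is why the block layer needs the plain
write-zero-buffers slow path at all.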
I was not CC'ed on the series. Can you please CC me on all patches when
you respin?

Thanks,
Peter
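As for the point raised earlier in the thread that drivers should set up
max_write_zeroes for themselves (as block/iscsi.c does), the shape of
that is roughly the following. This is a sketch against the QEMU 2.2-era
BlockDriver API with a hypothetical driver name (mydrv); the exact
callback signature and the values a real driver derives from its backend
may differ:

    /* Hypothetical driver advertising its own limits; the values are
     * made up, real drivers derive them from what the backend reports
     * (e.g. the SCSI block limits VPD page in block/iscsi.c). */
    static void mydrv_refresh_limits(BlockDriverState *bs, Error **errp)
    {
        bs->bl.max_write_zeroes = 65536;   /* in sectors; 0 = default */
        bs->bl.max_discard      = 65536;
    }

    static BlockDriver bdrv_mydrv = {
        .format_name         = "mydrv",
        .bdrv_refresh_limits = mydrv_refresh_limits,
        /* ... .bdrv_co_write_zeroes, .bdrv_co_discard, ... */
    };

With the limit advertised this way, the generic block layer splits
oversized requests before they reach the driver, so a backend with a real
hardware limit never sees one, while a driver that can zero arbitrary
ranges cheaply could advertise INT_MAX as Peter suggests.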