29.10.2019 15:11, Max Reitz wrote:
> On 29.10.19 13:05, Vladimir Sementsov-Ogievskiy wrote:
>> 29.10.2019 14:55, Max Reitz wrote:
>>> On 29.10.19 12:48, Vladimir Sementsov-Ogievskiy wrote:
>>>> 29.10.2019 11:50, Max Reitz wrote:
>>>>> On 28.10.19 12:25, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 28.10.2019 14:04, Kevin Wolf wrote:
>>>>>>> Am 27.10.2019 um 13:35 hat Stefan Hajnoczi geschrieben:
>>>>>>>> On Fri, Oct 25, 2019 at 11:58:46AM +0200, Max Reitz wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>>>> (3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
>>>>>>>>> To my knowledge I’m the only one who has provided any benchmarks
>>>>>>>>> for this commit, and even then I was a bit skeptical because it
>>>>>>>>> performs well in some cases and bad in others.  I concluded that
>>>>>>>>> it’s probably worth it because the “some cases” are more likely
>>>>>>>>> to occur.
>>>>>>>>>
>>>>>>>>> Now we have this problem of corruption here (granted due to a bug
>>>>>>>>> in the XFS driver), and another report of massively degraded
>>>>>>>>> performance on ppc64
>>>>>>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry, a
>>>>>>>>> private BZ; I hate that :-/  The report is about 40 % worse
>>>>>>>>> performance for an in-guest fio write benchmark.)
>>>>>>>>>
>>>>>>>>> So I have to ask the question about what the justification for
>>>>>>>>> keeping c8bb23cbdbe32f is.  How much does performance increase
>>>>>>>>> with it actually?  (On non-(ppc64+XFS) machines, obviously)
>>>>>>>>>
>>>>>>>>> Advantages:
>>>>>>>>> + Trivial
>>>>>>>>> + No layering violations
>>>>>>>>> + We wouldn’t need to keep track of whether the kernel bug has
>>>>>>>>>   been fixed or not
>>>>>>>>> + Fixes the ppc64+XFS performance problem
>>>>>>>>>
>>>>>>>>> Disadvantages:
>>>>>>>>> - Reverts cluster allocation performance to pre-c8bb23cbdbe32f
>>>>>>>>>   levels, whatever that means
>>>>>>>>
>>>>>>>> My favorite because it is clean and simple, but Vladimir has a
>>>>>>>> valid use-case for requiring this performance optimization, so
>>>>>>>> reverting isn't an option.
>>>>>>>
>>>>>>> Vladimir also said that qcow2 subclusters would probably also solve
>>>>>>> his problem, so maybe reverting and applying the subcluster patches
>>>>>>> instead is a possible solution, too?
>>>>>>
>>>>>> I'm not sure about the ssd case, it may need the write-zero
>>>>>> optimization anyway.
>>>>>
>>>>> What exactly do you need?  Do you actually need to write zeroes (e.g.
>>>>> because you’re storing images on block devices) or would it be
>>>>> sufficient to just drop the COW areas when bdrv_has_zero_init() and
>>>>> bdrv_has_zero_init_truncate() are true?
>>>>
>>>> Hmm, what do you mean?  We need to zero COW areas.  So, the original
>>>> way is to write real zeroes, and the optimized way is fallocate..
>>>> What do you mean by drop?  Mark subclusters as zeroes in the metadata?
>>>
>>> Why do you need to zero COW areas?  For normal files, any data will
>>> read as zero if you didn’t write anything there.
>>
>> Hmm, but when allocating a new cluster in qcow2, it's not guaranteed to
>> be zero, as it may be a reused, previously allocated cluster..
>
> Hm, right.  We could special-case something for beyond the EOF, but I
> don’t know whether that really makes it better.
>
> OTOH, maybe allocations at the EOF are the real bottleneck.  Reusing
> existing clusters should be rare enough that maybe the existing code
> which explicitly writes zeroes is sufficient.

But, as I understand it, pre-EOF fallocates are safe in XFS?  So, we may
just drop calling fallocate past EOF (it reads as zero anyway) and do
fallocate pre-EOF (it's safe)?  (The only special case is a request
intersecting EOF.. so it's better to keep the file length at a
cluster-size boundary, truncating on exit.  A rough sketch of this idea
is at the end of this mail.)

>
>>>
>>>> But still we'll have COW areas in subclusters, and we'll need to
>>>> directly zero them..  And fallocate will most probably be faster in
>>>> the ssd ext4 case..
>>>>
>>>>>
>>>>> I’m asking because Dave Chinner said
>>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1765547#c7) that
>>>>> fallocate() is always slow at least with aio=native because it needs
>>>>> to wait for all concurrent AIO writes to finish, and so it causes the
>>>>> AIO pipeline to stall.
>>>>>
>>>>> (He suggested using XFS extent size hints to get the same effect as
>>>>> write-zeroes for free, basically, but I don’t know whether that’s
>>>>> really useful to us; as unallocated areas on XFS read back as zero
>>>>> anyway.)
>>>>>
>>>>>>> We already have some cases where the existing handle_alloc_space()
>>>>>>> causes performance to actually become worse, and serialising
>>>>>>> requests as a workaround isn't going to make performance any
>>>>>>> better.  So even on these grounds, keeping commit c8bb23cbdbe32f
>>>>>>> is questionable.
>>>>>>>
>>>>>>
>>>>>> Can keeping handle_alloc_space under some config option be an option?
>>>>>
>>>>> Hm.  A config option is weird if you’re the only one who’s going to
>>>>> enable it.  But other than that I don’t have anything against it.
>>>>>
>>>>
>>>> It's just a bit easier for us to maintain a config option than an
>>>> out-of-tree patch.  On the other hand, it's not a real problem to
>>>> maintain this one patch separately.  It may return to the tree again
>>>> when the XFS bug is fixed.
>>>
>>> We’ll still have the problem that fallocate() must stall aio=native
>>> requests.
>>>
>>
>> Does it mean that fallocate is bad in general?  Practice shows the
>> opposite..  At least I have my examples with qemu-img bench.  Can that
>> thing be shown with qemu-img bench or something?
>
> I haven’t done benchmarks yet, but I don’t think Laurent will mind if I
> paste his fio benchmark results from the unfortunately private BZ here:
>
> QCOW2 ON          |         EXT4          |          XFS          |
>                   |                       |                       |
> (rw, bs, iodepth) |  2.12.0   |   4.1.0   |  2.12.0   |   4.1.0   |
> ------------------+-----------+-----------+-----------+-----------+
> (write, 16k, 1)   | 1868KiB/s | 1865KiB/s | 18.6MiB/s | 13.7MiB/s |
> ------------------+-----------+-----------+-----------+-----------+
> (write, 64k, 1)   | 7036KiB/s | 7413KiB/s | 27.0MiB/s | 7465KiB/s |
> ------------------+-----------+-----------+-----------+-----------+
> (write, 4k, 8)    |  535KiB/s |  524KiB/s | 13.9MiB/s | 1662KiB/s |
> ------------------+-----------+-----------+-----------+-----------+
>
> And that just looks like ext4 performs worse with aio=native in general.
> But I’ll have to do my own benchmarks first.
>
> Max

-- 
Best regards,
Vladimir
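
A minimal sketch of the pre-/post-EOF idea above (illustration only, not
the actual qcow2 code; the helper name, the enum and the file_length
parameter are all made up for this mail):

/*
 * Decide how to zero a COW area [offset, offset + bytes) of the
 * underlying image file, assuming the XFS issue only affects fallocate()
 * ranges that extend past the current end of the file.
 */
#include <stdint.h>

enum cow_zero_method {
    COW_ZERO_SKIP,       /* entirely past EOF: reads back as zero anyway  */
    COW_ZERO_FALLOCATE,  /* entirely before EOF: fallocate() assumed safe */
    COW_ZERO_WRITE,      /* straddles EOF: fall back to writing zeroes    */
};

static enum cow_zero_method choose_cow_zero_method(uint64_t offset,
                                                   uint64_t bytes,
                                                   uint64_t file_length)
{
    if (offset >= file_length) {
        return COW_ZERO_SKIP;
    }
    if (offset + bytes <= file_length) {
        return COW_ZERO_FALLOCATE;
    }
    return COW_ZERO_WRITE;
}

If the file length is always kept at a cluster-size boundary (truncating
on close), the COW_ZERO_WRITE case should never be hit in practice, so
only the first two branches would matter.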
