Hi again, I'm not sure the html mail made it to the lists -- resending in plain text. I've also opened https://tracker.ceph.com/issues/56488
Cheers, Dan

On Wed, Jul 6, 2022 at 11:43 PM Dan van der Ster <dvand...@gmail.com> wrote:
>
> Hi Igor and others,
>
> (apologies for html, but I want to share a plot ;) )
>
> We're upgrading clusters to v16.2.9 from v15.2.16, and our simple "rados
> bench -p test 10 write -b 4096 -t 1" latency probe showed that something is
> very wrong with deferred writes in pacific.
> Here is an example cluster, upgraded today:
>
> [latency plot omitted in this plain-text resend]
>
> The OSDs are 12TB HDDs, formatted in nautilus with the default
> bluestore_min_alloc_size_hdd = 64kB, and each has a large flash block.db.
>
> I found that the performance issue is because 4kB writes are no longer
> deferred from those pre-pacific hdds to flash in pacific with the default
> config!!!
> Here are example bench writes from both releases:
> https://pastebin.com/raw/m0yL1H9Z
>
> I worked out that the issue is fixed if I set
> bluestore_prefer_deferred_size_hdd = 128k (up from the 64k pacific default;
> note the default was 32k in octopus).
>
> I think this is related to the fixes in https://tracker.ceph.com/issues/52089,
> which landed in 16.2.6 -- _do_alloc_write compares the prealloc size
> 0x10000 with bluestore_prefer_deferred_size_hdd (0x10000), and the "strictly
> less than" condition prevents deferred writes from ever happening.
>
> So I think this would impact anyone upgrading clusters with hdd/ssd mixed
> osds ... surely we must not be the only clusters impacted by this?!
>
> Should we increase the default bluestore_prefer_deferred_size_hdd up to
> 128kB, or is there in fact a bug here?
>
> Best Regards,
>
> Dan
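For anyone who wants to try the latency probe described above, a minimal sketch, assuming a throwaway pool named "test" (the pool name and PG count here are illustrative, not from the original setup):

    # create a small test pool, then run single-threaded 4kB writes for 10 seconds
    ceph osd pool create test 32
    rados bench -p test 10 write -b 4096 -t 1

Single-threaded 4kB writes make the deferred vs. non-deferred difference show up directly in the per-op latency that rados bench reports.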
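A sketch of one way to apply the 128k workaround mentioned above cluster-wide through the config database (the option name is from the mail; 131072 is just 128kB expressed in bytes):

    # raise the deferred-write threshold for the OSDs and read it back
    ceph config set osd bluestore_prefer_deferred_size_hdd 131072
    ceph config get osd bluestore_prefer_deferred_size_hdd

Whether the new value takes effect without restarting the OSDs may depend on the release, so re-running the bench probe afterwards is a cheap way to confirm.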
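To make the "strictly less than" point concrete, a sketch of the arithmetic only -- not the actual _do_alloc_write code: on OSDs formatted with a 64kB min_alloc_size, the prealloc size equals the pacific default threshold, so a strict "<" comparison can never hold and nothing is deferred; a 128kB threshold makes it hold again.

    # 64kB prealloc vs 64kB threshold: strict "<" is false -> write is not deferred
    [ $((0x10000)) -lt $((0x10000)) ] && echo deferred || echo "not deferred"
    # 64kB prealloc vs 128kB threshold: strict "<" is true -> write is deferred
    [ $((0x10000)) -lt $((0x20000)) ] && echo deferred || echo "not deferred"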