On Mon, Aug 1, 2016 at 11:03 PM, Vladislav Bolkhovitin <v...@vlnb.net> wrote:
> Alex Gorbachev wrote on 08/01/2016 04:05 PM:
>> Hi Ilya,
>>
>> On Mon, Aug 1, 2016 at 3:07 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
>>> On Mon, Aug 1, 2016 at 7:55 PM, Alex Gorbachev <a...@iss-integration.com> wrote:
>>>> RBD illustration showing RBD ignoring discard until a certain
>>>> threshold - why is that? This behavior is unfortunately incompatible
>>>> with ESXi discard (UNMAP) behavior.
>>>>
>>>> Is there a way to lower the discard sensitivity on RBD devices?
>>>>
>> <snip>
>>>>
>>>> root@e1:/var/log# blkdiscard -o 0 -l 4096000 /dev/rbd28
>>>> root@e1:/var/log# rbd diff spin1/testdis | awk '{ SUM += $2 } END { print SUM/1024 " KB" }'
>>>> 819200 KB
>>>>
>>>> root@e1:/var/log# blkdiscard -o 0 -l 40960000 /dev/rbd28
>>>> root@e1:/var/log# rbd diff spin1/testdis | awk '{ SUM += $2 } END { print SUM/1024 " KB" }'
>>>> 782336 KB
>>>
>>> Think about it in terms of underlying RADOS objects (4M by default).
>>> There are three cases:
>>>
>>> discard range   | command
>>> ----------------+----------
>>> whole object    | delete
>>> object's tail   | truncate
>>> object's head   | zero
>>>
>>> Obviously, only delete and truncate free up space. In all of your
>>> examples except the last one, you are attempting to discard the head
>>> of the (first) object.
>>>
>>> You can free up as little as a sector, as long as it's the tail:
>>>
>>> Offset  Length   Type
>>> 0       4194304  data
>>>
>>> # blkdiscard -o $(((4 << 20) - 512)) -l 512 /dev/rbd28
>>>
>>> Offset  Length   Type
>>> 0       4193792  data
>>
>> Looks like ESXi is sending each discard/unmap with a fixed
>> granularity of 8192 sectors, which is passed verbatim by SCST. There
>> is a slight reduction in size per the rbd diff method, but now I
>> understand that an actual truncate only takes effect when the discard
>> happens to clip the tail of an object.
>>
>> So far, looking at
>> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2057513
>>
>> ...the only variable we can control is the count of 8192-sector chunks,
>> not their size. Which means that most of the ESXi discard
>> commands will be disregarded by Ceph.
>>
>> Vlad, are the 8192 sectors coming from ESXi, as in this debug line:
>>
>> Aug 1 19:01:36 e1 kernel: [168220.570332] Discarding (start_sector
>> 1342099456, nr_sects 8192)
>
> Yes, correct. However, to make sure that VMware is not (erroneously) being forced
> to do this, you need to perform one more check.
>
> 1. Run cat /sys/block/rbd28/queue/discard*. Ceph should report the correct
> granularity and alignment here (4M, I guess?)
This seems to reflect the granularity (4194304), which matches the 8192
sectors (8192 x 512 = 4194304). However, there is no alignment value.
Can discard_alignment be specified with RBD? (I have put a couple of quick
command sketches at the bottom of this message.)

> 2. Connect to this iSCSI device from a Linux box and run sg_inq -p 0xB0
> /dev/<device>
>
> SCST should correctly report those values for the unmap parameters (in blocks).
>
> If in both cases you see the same correct values, then this is a VMware issue,
> because it is ignoring what it is told to do (generate appropriately sized
> and aligned UNMAP requests). If either Ceph or SCST doesn't show the correct
> numbers, then the broken party should be fixed.
>
> Vlad
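To convince myself of the delete/truncate/zero split Ilya describes above, this is
the kind of quick check I have in mind - just a sketch, assuming the default 4M
object size and reusing /dev/rbd28 and the spin1/testdis image from the earlier runs:

OBJ=$((4 << 20))                       # default RADOS object size, 4 MiB

# Head of object 0: RADOS only zeroes it, so no space is returned.
blkdiscard -o 0 -l $((OBJ / 2)) /dev/rbd28

# Tail of object 0: the object is truncated, 2 MiB comes back.
blkdiscard -o $((OBJ / 2)) -l $((OBJ / 2)) /dev/rbd28

# Exactly object 1: the object is deleted outright, the full 4 MiB comes back.
blkdiscard -o $OBJ -l $OBJ /dev/rbd28

# Same accounting as in the earlier runs.
rbd diff spin1/testdis | awk '{ SUM += $2 } END { print SUM/1024 " KB" }'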
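And for the record, these are the checks I plan to run on both ends (again only a
sketch; /dev/<device> is still a placeholder for the iSCSI-attached disk on the
initiator, and I am assuming the alignment would be reported by the per-device
discard_alignment attribute, which sits one level above queue/ rather than under it):

# On the host exporting the rbd device:
grep . /sys/block/rbd28/queue/discard_granularity \
       /sys/block/rbd28/queue/discard_max_bytes
cat /sys/block/rbd28/discard_alignment   # alignment lives here, not under queue/

# On a Linux iSCSI initiator - Block Limits VPD page (0xB0) as exported by SCST;
# the fields to compare are OPTIMAL UNMAP GRANULARITY and UNMAP GRANULARITY ALIGNMENT.
sg_inq -p 0xB0 /dev/<device>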