Have you used strace on the du command to see what it's spending its time doing?
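For example, something along these lines (an untested sketch; the pool/image spec is a placeholder) should show whether the time goes into syscalls and network round trips to the OSDs, or into a single CPU-bound thread in librbd:

-------------------
# Per-syscall time summary for one slow image (follow threads with -f)
strace -f -c rbd du rbd_pool/large-image

# If syscall time is negligible, profile the CPU of the running query instead
perf top -p "$(pgrep -n -f 'rbd du')"
-------------------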
On Thu, Feb 28, 2019, 8:45 PM Glen Baars <g...@onsitecomputers.com.au> wrote:

> Hello Wido,
>
> The cluster layout is as follows:
>
> 3 x Monitor hosts ( 2 x 10Gbit bonded )
> 9 x OSD hosts (
>   2 x 10Gbit bonded,
>   LSI CacheCade and write cache drives set to single,
>   all HDD in this pool,
>   no separate DB / WAL. With the write cache and the SSD read cache on the
>   LSI card it seems to perform well. )
> 168 OSD disks
>
> No major increase in OSD disk usage or CPU usage. The rbd du process uses
> 100% of a single 2.4 GHz core while running - I think that is the limiting
> factor.
>
> I have just tried removing most of the snapshots for that volume ( from 14
> snapshots down to 1 snapshot ) and the rbd du command now takes around
> 2-3 minutes.
>
> Kind regards,
> Glen Baars
>
> -----Original Message-----
> From: Wido den Hollander <w...@42on.com>
> Sent: Thursday, 28 February 2019 5:05 PM
> To: Glen Baars <g...@onsitecomputers.com.au>; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness
>
> On 2/28/19 9:41 AM, Glen Baars wrote:
> > Hello Wido,
> >
> > I have looked at the libvirt code and there is a check to ensure that
> > fast-diff is enabled on the image, and only then does it try to get the
> > real disk usage. The issue for me is that even with fast-diff enabled it
> > takes 25 min to get the space usage for a 50 TB image.
> >
> > I had considered turning off fast-diff on the large images to get around
> > the issue, but I think that will hurt my snapshot removal times
> > ( untested ).
>
> Can you tell a bit more about the Ceph cluster? HDD? SSD? DB and WAL on
> SSD?
>
> Do you see OSDs spike in CPU or Disk I/O when you do a 'rbd du' on these
> images?
>
> Wido
>
> > I can't see in the code any other way of bypassing the disk usage check,
> > but I am not that familiar with the code.
> >
> > -------------------
> > if (volStorageBackendRBDUseFastDiff(features)) {
> >     VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. "
> >               "Querying for actual allocation",
> >               def->source.name, vol->name);
> >
> >     if (virStorageBackendRBDSetAllocation(vol, image, &info) < 0)
> >         goto cleanup;
> > } else {
> >     vol->target.allocation = info.obj_size * info.num_objs;
> > }
> > ------------------------------
> >
> > Kind regards,
> > Glen Baars
> >
> > -----Original Message-----
> > From: Wido den Hollander <w...@42on.com>
> > Sent: Thursday, 28 February 2019 3:49 PM
> > To: Glen Baars <g...@onsitecomputers.com.au>; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness
> >
> > On 2/28/19 2:59 AM, Glen Baars wrote:
> >> Hello Ceph Users,
> >>
> >> Has anyone found a way to improve the speed of the rbd du command on
> >> large rbd images? I have object map and fast-diff enabled - no invalid
> >> flags on the image or its snapshots.
> >>
> >> We recently upgraded our Ubuntu 16.04 KVM servers for Cloudstack to
> >> Ubuntu 18.04. This upgrades libvirt to version 4. When libvirt 4 adds
> >> an rbd pool, it discovers all images in the pool and tries to get their
> >> disk usage. We are seeing a 50 TB image take 25 min. The pool has over
> >> 300 TB of images in it, and it takes hours for libvirt to start.
> >
> > This is actually a pretty bad thing imho. A lot of the images people
> > will be using do not have fast-diff enabled (images from the past), and
> > that will kill their performance.
> >
> > Isn't there a way to turn this off in libvirt?
> >
> > Wido
> >
> >> We can replicate the issue without libvirt by just running rbd du on
> >> the large images. The limiting factor is the CPU on the rbd du command;
> >> it uses 100% of a single core.
> >>
> >> Our cluster is completely bluestore/mimic 13.2.4. 168 OSDs, 12 Ubuntu
> >> 16.04 hosts.
> >>
> >> Kind regards,
> >> Glen Baars
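As a side note, the fast-diff/object-map state the thread keeps coming back to can be checked and repaired from the command line. An untested sketch follows; the pool/image names are placeholders:

-------------------
# Confirm object-map and fast-diff are enabled and not flagged invalid
rbd info rbd_pool/large-image      # check the "features:" and "flags:" lines
rbd snap ls rbd_pool/large-image   # the snapshot count also affects rbd du time

# An object map flagged invalid forces a full object scan; rebuilding it
# restores the fast path for rbd du
rbd object-map rebuild rbd_pool/large-image

# Re-test the usage query afterwards
rbd du rbd_pool/large-image
-------------------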
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com