Have you run strace on the rbd du command to see what it's spending its time
doing?
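
Something like the following should show where the time goes (the pool and
image names below are just placeholders):

-------------------
# syscall summary for a fresh run
strace -c -f rbd du <pool>/<image>

# or attach to an rbd du that is already running
strace -c -f -p <pid-of-rbd-du>
-------------------

If it really is CPU-bound in user space you would expect the total syscall
time in that summary to be small compared to the wall-clock time.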

On Thu, Feb 28, 2019, 8:45 PM Glen Baars <g...@onsitecomputers.com.au>
wrote:

> Hello Wido,
>
> The cluster layout is as follows:
>
> 3 x Monitor hosts (2 x 10Gbit bonded)
> 9 x OSD hosts (
>   2 x 10Gbit bonded,
>   LSI CacheCade and write cache drives set to single,
>   all HDD in this pool,
>   no separate DB / WAL - with the write cache and the SSD read cache on
>   the LSI card it seems to perform well
> )
> 168 OSD disks
>
> No major increase in OSD disk usage or CPU usage. The rbd du process uses
> 100% of a single 2.4 GHz core while running - I think that is the limiting
> factor.
>
> I have just tried removing most of the snapshots for that volume ( from 14
> snapshots down to 1 snapshot ) and the rbd du command now takes around 2-3
> minutes.
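>
> Roughly what that looked like (pool/image names are placeholders):
>
> -------------------
> rbd snap ls <pool>/<image>     # was 14 snapshots, now 1
> time rbd du <pool>/<image>     # ~25 min before, ~2-3 min now
> -------------------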
>
> Kind regards,
> Glen Baars
>
> -----Original Message-----
> From: Wido den Hollander <w...@42on.com>
> Sent: Thursday, 28 February 2019 5:05 PM
> To: Glen Baars <g...@onsitecomputers.com.au>; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness
>
>
>
> On 2/28/19 9:41 AM, Glen Baars wrote:
> > Hello Wido,
> >
> > I have looked at the libvirt code and there is a check to ensure that
> > fast-diff is enabled on the image and only then does it try to get the
> > real disk usage. The issue for me is that even with fast-diff enabled it
> > takes 25 min to get the space usage for a 50TB image.
> >
> > I had considered turning off fast-diff on the large images to get
> > around the issue, but I think that will hurt my snapshot removal times
> > (untested).
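> >
> > If I do go that route it would be something like this (image name is a
> > placeholder, untested on my side) - and I believe re-enabling fast-diff
> > later needs an object-map rebuild before its data is valid again:
> >
> > -------------------
> > rbd feature disable <pool>/<image> fast-diff
> >
> > # later, to turn it back on:
> > rbd feature enable <pool>/<image> fast-diff
> > rbd object-map rebuild <pool>/<image>
> > -------------------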
> >
>
> Can you tell a bit more about the Ceph cluster? HDD? SSD? DB and WAL on
> SSD?
>
> Do you see OSDs spike in CPU or Disk I/O when you do a 'rbd du' on these
> images?
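>
> For example with something like this, run on an OSD host while the rbd du
> is going (just generic checks, nothing specific to your setup):
>
> -------------------
> ceph osd perf        # commit/apply latencies across the OSDs
> iostat -x 1          # per-disk utilisation on the OSD host
> top                  # CPU usage of the ceph-osd processes
> -------------------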
>
> Wido
>
> > I can't see any other way in the code of bypassing the disk usage check,
> > but I am not that familiar with the code.
> >
> > -------------------
> >     /* Only when fast-diff is enabled does libvirt walk the image to
> >      * compute the real allocation - this is the slow path for me. */
> >     if (volStorageBackendRBDUseFastDiff(features)) {
> >         VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. "
> >                   "Querying for actual allocation",
> >                   def->source.name, vol->name);
> >
> >         if (virStorageBackendRBDSetAllocation(vol, image, &info) < 0)
> >             goto cleanup;
> >     } else {
> >         /* Without fast-diff it just reports the fully provisioned size. */
> >         vol->target.allocation = info.obj_size * info.num_objs;
> >     }
> > ------------------------------
> >
> > Kind regards,
> > Glen Baars
> >
> > -----Original Message-----
> > From: Wido den Hollander <w...@42on.com>
> > Sent: Thursday, 28 February 2019 3:49 PM
> > To: Glen Baars <g...@onsitecomputers.com.au>;
> > ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness
> >
> >
> >
> > On 2/28/19 2:59 AM, Glen Baars wrote:
> >> Hello Ceph Users,
> >>
> >> Has anyone found a way to improve the speed of the rbd du command on
> >> large rbd images? I have object-map and fast-diff enabled - no invalid
> >> flags on the image or its snapshots.
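> >>
> >> Checked with something like the following (image and snapshot names are
> >> placeholders), looking at the "features" and "flags" lines:
> >>
> >> -------------------
> >> rbd info <pool>/<image>
> >> rbd info <pool>/<image>@<snapshot>
> >> -------------------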
> >>
> >> We recently upgraded our Ubuntu 16.04 KVM servers for CloudStack to
> >> Ubuntu 18.04. This upgrades libvirt to version 4. When libvirt 4 adds an
> >> rbd pool it discovers all images in the pool and tries to get their disk
> >> usage. We are seeing a 50TB image take 25 min. The pool has over 300TB of
> >> images in it, and it takes hours for libvirt to start.
> >>
> >
> > This is actually a pretty bad thing imho, as a lot of the images people
> > will be using do not have fast-diff enabled (images from the past), and
> > that will kill their performance.
> >
> > Isn't there a way to turn this off in libvirt?
> >
> > Wido
> >
> >> We can replicate the issue without libvirt by just running rbd du on the
> >> large images. The limiting factor is the CPU of the rbd du command; it
> >> uses 100% of a single core.
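> >>
> >> Something along these lines reproduces what libvirt does at pool start
> >> (pool name is a placeholder):
> >>
> >> -------------------
> >> for img in $(rbd ls <pool>); do
> >>     time rbd du <pool>/$img
> >> done
> >> -------------------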
> >>
> >> Our cluster is completely bluestore, Mimic 13.2.4: 168 OSDs, 12 Ubuntu
> >> 16.04 hosts.
> >>
> >> Kind regards,
> >> Glen Baars