Hi,

Also make sure that you optimize the debug log config. There is a lot on the ML about how to set the debug subsystems all to low values (0/0).
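For reference, a minimal sketch of what that can look like in ceph.conf. This is only a representative subset (the lists posted on the ML cover many more subsystems), so treat it as an example rather than a complete tuning:

"""
[global]
# 0/0 = no file logging and no in-memory logging for that subsystem.
# Subset only -- the full ML lists include many more "debug ..." options.
debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug osd = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug auth = 0/0
debug perfcounter = 0/0
"""

The same values can also be injected into running OSDs without a restart, e.g.:

ceph tell osd.* injectargs '--debug-osd 0/0 --debug-ms 0/0 --debug-filestore 0/0'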
I am not sure how it is in infernalis, but it made a big difference in previous versions.

Regards,
Josef

On 6 Jan 2016 18:16, "Robert LeBlanc" <rob...@leblancnet.us> wrote:
> There has been a lot of "discussion" about osd_backfill_scan[min,max]
> lately. My experience with hammer has been the opposite of what people
> have said before. Increasing those values has reduced the load of
> recovery for us and has prevented a lot of the disruption in our cluster
> caused by backfilling. It does increase the amount of time the recovery
> takes (adding a new node to the cluster took about 3-4 hours before; now
> it takes about 24 hours).
>
> We are currently using these values and they seem to work well for us:
> osd_max_backfills = 1
> osd_backfill_scan_min = 16
> osd_recovery_max_active = 1
> osd_backfill_scan_max = 32
>
> I would be interested in your results if you try these values.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
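If you want to try Robert's values on a running cluster, they can also be applied at runtime without restarting the OSDs (a sketch, assuming the hammer/infernalis-era CLI; to survive a restart they still need to go into ceph.conf):

ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_backfill_scan_min 16 --osd_backfill_scan_max 32'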
>
> On Wed, Jan 6, 2016 at 7:13 AM, nick <n...@nine.ch> wrote:
> > Heya,
> > we are using a ceph cluster (6 nodes, each with 10x 4 TB HDDs + 2x
> > SSDs for the journals) in combination with KVM virtualization. All
> > our virtual machine hard disks are stored on the ceph cluster. The
> > ceph cluster was recently updated to the 'infernalis' release.
> >
> > We are experiencing problems during cluster maintenance. A normal
> > workflow for us looks like this:
> >
> > - set the noout flag for the cluster
> > - stop all OSDs on one node
> > - update the node
> > - reboot the node
> > - start all OSDs
> > - wait for the backfilling to finish
> > - unset the noout flag
> >
> > After we start all OSDs on the node again, the cluster backfills and
> > tries to get all the OSDs back in sync. At the beginning of this
> > process we experience 'stalls' in our running virtual machines. On
> > some, the load rises to a very high value; on others, a running
> > webserver responds only with 5xx HTTP codes. It takes around 5-6
> > minutes until everything is ok again. After those 5-6 minutes the
> > cluster is still backfilling, but the virtual machines behave
> > normally again.
> >
> > I already set the following parameters in ceph.conf on the nodes to
> > get a better rebalance traffic/user traffic ratio:
> >
> > """
> > [osd]
> > osd max backfills = 1
> > osd backfill scan max = 8
> > osd backfill scan min = 4
> > osd recovery max active = 1
> > osd recovery op priority = 1
> > osd op threads = 8
> > """
> >
> > It helped a bit, but we are still experiencing the problems described
> > above. It feels like some virtual hard disks are locked for a short
> > time. Our ceph nodes use bonded 10G network interfaces for the 'OSD
> > network', so I do not think the network is a bottleneck.
> >
> > After reading this blog post:
> > http://dachary.org/?p=2182
> > I wonder if there really is a 'read lock' during the object push.
> >
> > Does anyone know more about this, or do others have the same problems
> > and were able to fix them?
> >
> > Best Regards
> > Nick
> >
> > --
> > Sebastian Nickel
> > Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> > Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
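For what it's worth, Nick's maintenance workflow written out as a shell sketch (the service names are an assumption -- infernalis installs are usually systemd-based, so adjust the unit names to your distribution and init system):

#!/bin/sh
# Per-node maintenance, following the workflow quoted above.
ceph osd set noout                 # keep CRUSH from rebalancing while OSDs are down
systemctl stop ceph-osd.target     # stop all OSDs on this node (assumed systemd unit)
# ... update and reboot the node, then after it is back up:
systemctl start ceph-osd.target
# Wait until no PGs are backfilling or recovering any more. (Do not wait
# for HEALTH_OK here: the noout flag itself keeps the cluster in
# HEALTH_WARN.)
while ceph -s | egrep -q 'backfill|recover'; do
    sleep 60
done
ceph osd unset noout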
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com