Hi,

Also make sure that you optimize the debug log config. There is a lot on the ML about how to set the debug subsystems all to low values (0/0).
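For reference, a minimal sketch of what that can look like in ceph.conf. This is only a representative subset (the lists posted on the ML cover many more subsystems), so treat it as an example rather than a complete tuning:

"""
[global]
# 0/0 = no file logging and no in-memory logging for that subsystem.
# Subset only -- the full ML lists include many more "debug ..." options.
debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug osd = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug auth = 0/0
debug perfcounter = 0/0
"""

The same values can also be injected into running OSDs without a restart, e.g.:

ceph tell osd.* injectargs '--debug-osd 0/0 --debug-ms 0/0 --debug-filestore 0/0'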
I am not sure how it is in infernalis, but it made a big difference in previous versions.

Regards,
Josef

On 6 Jan 2016 18:16, "Robert LeBlanc" <rob...@leblancnet.us> wrote:
> There has been a lot of "discussion" about osd_backfill_scan[min,max]
> lately. My experience with hammer has been the opposite of what people
> have said before. Increasing those values has reduced the load of
> recovery for us and has prevented a lot of the disruption in our cluster
> caused by backfilling. It does increase the amount of time the recovery
> takes (adding a new node to the cluster took about 3-4 hours before; now
> it takes about 24 hours).
>
> We are currently using these values and they seem to work well for us:
> osd_max_backfills = 1
> osd_backfill_scan_min = 16
> osd_recovery_max_active = 1
> osd_backfill_scan_max = 32
>
> I would be interested in your results if you try these values.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
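If you want to try Robert's values on a running cluster, they can also be applied at runtime without restarting the OSDs (a sketch, assuming the hammer/infernalis-era CLI; to survive a restart they still need to go into ceph.conf):

ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_backfill_scan_min 16 --osd_backfill_scan_max 32'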
>
> On Wed, Jan 6, 2016 at 7:13 AM, nick <n...@nine.ch> wrote:
> > Heya,
> > we are using a ceph cluster (6 nodes, each with 10x 4 TB HDDs + 2x
> > SSDs for the journals) in combination with KVM virtualization. All
> > our virtual machine hard disks are stored on the ceph cluster. The
> > ceph cluster was recently updated to the 'infernalis' release.
> >
> > We are experiencing problems during cluster maintenance. A normal
> > workflow for us looks like this:
> >
> > - set the noout flag for the cluster
> > - stop all OSDs on one node
> > - update the node
> > - reboot the node
> > - start all OSDs
> > - wait for the backfilling to finish
> > - unset the noout flag
> >
> > After we start all OSDs on the node again, the cluster backfills and
> > tries to get all the OSDs back in sync. At the beginning of this
> > process we experience 'stalls' in our running virtual machines. On
> > some, the load rises to a very high value; on others, a running
> > webserver responds only with 5xx HTTP codes. It takes around 5-6
> > minutes until everything is ok again. After those 5-6 minutes the
> > cluster is still backfilling, but the virtual machines behave
> > normally again.
> >
> > I already set the following parameters in ceph.conf on the nodes to
> > get a better rebalance traffic/user traffic ratio:
> >
> > """
> > [osd]
> > osd max backfills = 1
> > osd backfill scan max = 8
> > osd backfill scan min = 4
> > osd recovery max active = 1
> > osd recovery op priority = 1
> > osd op threads = 8
> > """
> >
> > It helped a bit, but we are still experiencing the problems described
> > above. It feels like some virtual hard disks are locked for a short
> > time. Our ceph nodes use bonded 10G network interfaces for the 'OSD
> > network', so I do not think the network is a bottleneck.
> >
> > After reading this blog post:
> > http://dachary.org/?p=2182
> > I wonder if there really is a 'read lock' during the object push.
> >
> > Does anyone know more about this, or do others have the same problems
> > and were able to fix them?
> >
> > Best Regards
> > Nick
> >
> > --
> > Sebastian Nickel
> > Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> > Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
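For what it's worth, Nick's maintenance workflow written out as a shell sketch (the service names are an assumption -- infernalis installs are usually systemd-based, so adjust the unit names to your distribution and init system):

#!/bin/sh
# Per-node maintenance, following the workflow quoted above.
ceph osd set noout                 # keep CRUSH from rebalancing while OSDs are down
systemctl stop ceph-osd.target     # stop all OSDs on this node (assumed systemd unit)
# ... update and reboot the node, then after it is back up:
systemctl start ceph-osd.target
# Wait until no PGs are backfilling or recovering any more. (Do not wait
# for HEALTH_OK here: the noout flag itself keeps the cluster in
# HEALTH_WARN.)
while ceph -s | egrep -q 'backfill|recover'; do
    sleep 60
done
ceph osd unset noout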
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com