Hi,

How did you benchmark?

I would recommend running several MySQL instances with heavily utilised InnoDB tables; during a recovery you should at least see the latency rise. One of the tools here might help: https://dev.mysql.com/downloads/benchmarks.html

Regards,
Josef

On 7 Jan 2016 16:36, "Robert LeBlanc" <rob...@leblancnet.us> wrote:
> With these min/max settings, we didn't have any problem going to more
> backfills.
>
> Robert LeBlanc
>
> Sent from a mobile device; please excuse any typos.
>
> On Jan 7, 2016 8:30 AM, "nick" <n...@nine.ch> wrote:
>> Heya,
>> thank you for your answers. We will try 16/32 as values for
>> osd_backfill_scan_[min|max]. I also set the debug logging config. Here
>> is an excerpt of our new ceph.conf:
>>
>> """
>> [osd]
>> osd max backfills = 1
>> osd backfill scan max = 32
>> osd backfill scan min = 16
>> osd recovery max active = 1
>> osd recovery op priority = 1
>> osd op threads = 8
>>
>> [global]
>> debug optracker = 0/0
>> debug asok = 0/0
>> debug hadoop = 0/0
>> debug mds migrator = 0/0
>> debug objclass = 0/0
>> debug paxos = 0/0
>> debug context = 0/0
>> debug objecter = 0/0
>> debug mds balancer = 0/0
>> debug finisher = 0/0
>> debug auth = 0/0
>> debug buffer = 0/0
>> debug lockdep = 0/0
>> debug mds log = 0/0
>> debug heartbeatmap = 0/0
>> debug journaler = 0/0
>> debug mon = 0/0
>> debug client = 0/0
>> debug mds = 0/0
>> debug throttle = 0/0
>> debug journal = 0/0
>> debug crush = 0/0
>> debug objectcacher = 0/0
>> debug filer = 0/0
>> debug perfcounter = 0/0
>> debug filestore = 0/0
>> debug rgw = 0/0
>> debug monc = 0/0
>> debug rbd = 0/0
>> debug tp = 0/0
>> debug osd = 0/0
>> debug ms = 0/0
>> debug mds locker = 0/0
>> debug timer = 0/0
>> debug mds log expire = 0/0
>> debug rados = 0/0
>> debug striper = 0/0
>> debug rbd replay = 0/0
>> debug none = 0/0
>> debug keyvaluestore = 0/0
>> debug compressor = 0/0
>> debug crypto = 0/0
>> debug xio = 0/0
>> debug civetweb = 0/0
>> debug newstore = 0/0
>> """
>>
>> I already ran a benchmark with fio on our staging setup with the new
>> config, but did not really get different results than before.
>>
>> For us it is hardly possible to reproduce the 'stalling' problems on
>> the staging cluster, so I will have to wait and test this in
>> production.
>>
>> Does anyone know if 'osd max backfills' > 1 could have an impact as
>> well? The default seems to be 10...
>>
>> Cheers
>> Nick
>>
>> On Wednesday, January 06, 2016 09:17:43 PM Josef Johansson wrote:
>>> Hi,
>>>
>>> Also make sure that you optimise the debug log config. There is a lot
>>> on the ML on how to set them all to low values (0/0).
>>>
>>> Not sure how it is in infernalis, but it made a big difference in
>>> previous versions.
>>>
>>> Regards,
>>> Josef
>>>
>>> On 6 Jan 2016 18:16, "Robert LeBlanc" <rob...@leblancnet.us> wrote:
>>>> There has been a lot of "discussion" about osd_backfill_scan_[min,max]
>>>> lately. My experience with hammer has been the opposite of what people
>>>> have said before: increasing those values has reduced the load of
>>>> recovery for us and has prevented a lot of the disruption that
>>>> backfilling caused in our cluster. It does increase the time recovery
>>>> takes (a new node added to the cluster took about 3-4 hours before,
>>>> now takes about 24 hours).
>>>>
>>>> We are currently using these values and they seem to work well for us:
>>>> osd_max_backfills = 1
>>>> osd_backfill_scan_min = 16
>>>> osd_backfill_scan_max = 32
>>>> osd_recovery_max_active = 1
>>>>
>>>> I would be interested in your results if you try these values.
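[For reference, the values above can also be tried on a running cluster without restarting the OSDs. A sketch using injectargs; this assumes an admin keyring on the host and the option names as spelled in hammer/infernalis-era releases, and note that injected values do not survive an OSD restart, so they belong in ceph.conf as well:]

```shell
# Apply the backfill/recovery values to all OSDs at runtime (sketch; needs a live cluster).
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_backfill_scan_min 16 --osd_backfill_scan_max 32'

# Verify on one OSD via its admin socket (run on the host carrying osd.0):
ceph daemon osd.0 config show | grep -E 'backfill_scan|max_backfills|recovery_max_active'
```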
>>>> Robert LeBlanc
>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>>>>
>>>> On Wed, Jan 6, 2016 at 7:13 AM, nick <n...@nine.ch> wrote:
>>>>> Heya,
>>>>> we are using a ceph cluster (6 nodes, each with 10x 4 TB HDD + 2x
>>>>> SSD for journals) in combination with KVM virtualization. All our
>>>>> virtual machine hard disks are stored on the ceph cluster. The ceph
>>>>> cluster was recently updated to the 'infernalis' release.
>>>>>
>>>>> We are experiencing problems during cluster maintenance. A normal
>>>>> workflow for us looks like this:
>>>>>
>>>>> - set the noout flag for the cluster
>>>>> - stop all OSDs on one node
>>>>> - update the node
>>>>> - reboot the node
>>>>> - start all OSDs
>>>>> - wait for the backfilling to finish
>>>>> - unset the noout flag
>>>>>
>>>>> After we start all OSDs on the node again, the cluster backfills and
>>>>> tries to get all the OSDs in sync. During the beginning of this
>>>>> process we experience 'stalls' in our running virtual machines. On
>>>>> some the load rises to a very high value; on others a running
>>>>> webserver responds only with 5xx HTTP codes. It takes around 5-6
>>>>> minutes until all is OK again. After those 5-6 minutes the cluster
>>>>> is still backfilling, but the virtual machines behave normally
>>>>> again.
>>>>>
>>>>> I already set the following parameters in ceph.conf on the nodes to
>>>>> get a better rebalance traffic/user traffic ratio:
>>>>>
>>>>> """
>>>>> [osd]
>>>>> osd max backfills = 1
>>>>> osd backfill scan max = 8
>>>>> osd backfill scan min = 4
>>>>> osd recovery max active = 1
>>>>> osd recovery op priority = 1
>>>>> osd op threads = 8
>>>>> """
>>>>>
>>>>> It helped a bit, but we are still experiencing the problems
>>>>> described above. It feels like some virtual hard disks are locked
>>>>> for a short time. Our ceph nodes use bonded 10G network interfaces
>>>>> for the 'OSD network', so I do not think the network is a
>>>>> bottleneck.
>>>>>
>>>>> After reading this blog post: http://dachary.org/?p=2182
>>>>> I wonder if there is really a 'read lock' during the object push.
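[The maintenance workflow quoted above can be sketched as shell commands. This assumes systemd-managed OSDs, as in infernalis; unit names vary by release and distro, so treat it as a sketch rather than a recipe:]

```shell
# One-node maintenance: keep CRUSH from re-placing data while the node is down.
ceph osd set noout

systemctl stop ceph-osd.target    # stop all OSDs on this node
# ... update and reboot the node ...
systemctl start ceph-osd.target   # start all OSDs again

# Wait until no PG is still backfilling/recovering, then drop the flag.
# (noout itself keeps the cluster in HEALTH_WARN, so don't wait for HEALTH_OK.)
while ceph pg stat | grep -qE 'backfill|recover'; do
    sleep 30
done
ceph osd unset noout
```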
>>>>> Does anyone know more about this, or do others have the same
>>>>> problems and were able to fix them?
>>>>>
>>>>> Best Regards
>>>>> Nick
>>>>>
>>>>> --
>>>>> Sebastian Nickel
>>>>> Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
>>>>> Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com