Another question - I mentioned here 37% of objects being moved around - these are MISPLACED objects (degraded objects were only 0.001%), after I removed 1 OSD from the CRUSH map (out of 44 OSDs or so).
Can anybody confirm this is normal behaviour, and are there any workarounds? I understand this comes from CEPH's object placement algorithm, but 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map makes me wonder why the percentage is so large - naively I would have expected only a few percent of the data to move. It doesn't seem good to me, and I have to remove another 7 OSDs (we are demoting some old hardware nodes), which means I could potentially end up with 7 x the same number of misplaced objects...? Any thoughts? (See the P.S. below for the gradual-reweight approach I'm considering for those.)

Thanks

On 3 March 2015 at 12:14, Andrija Panic <andrija.pa...@gmail.com> wrote:

> Thanks Irek.
>
> Does this mean that after peering for each PG, there will be a delay of
> 10 sec, meaning that every once in a while I will have 10 sec of the
> cluster NOT being stressed/overloaded, then the recovery takes place for
> that PG, then for another 10 sec the cluster is fine, and then it is
> stressed again?
>
> I'm trying to understand the process before actually doing stuff (the
> config reference is there on ceph.com, but I don't fully understand the
> process).
>
> Thanks,
> Andrija
>
> On 3 March 2015 at 11:32, Irek Fasikhov <malm...@gmail.com> wrote:
>
>> Hi.
>>
>> Use the value "osd_recovery_delay_start", for example:
>>
>> [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start
>>   "osd_recovery_delay_start": "10"
>>
>> 2015-03-03 13:13 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
>>
>>> Hi guys,
>>>
>>> Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it
>>> caused over 37% of the data to rebalance - let's say this is fine (this
>>> happened when I removed it from the CRUSH map).
>>>
>>> I'm wondering - I had previously set some throttling, but during the
>>> first 1 h of rebalancing the recovery rate was going up to 1500 MB/s and
>>> the VMs were completely unusable, while during the last 4 h of the
>>> recovery the rate went down to, say, 100-200 MB/s and VM performance was
>>> still pretty impacted, but at least I could more or less work.
>>>
>>> So my question: is this behaviour expected, and is the throttling
>>> working as expected? During the first 1 h almost no throttling seemed to
>>> be applied, judging by the 1500 MB/s recovery rate and the impact on the
>>> VMs, while the last 4 h seemed pretty fine (although there was still a
>>> lot of impact in general).
>>>
>>> I changed the throttling on the fly with:
>>>
>>> ceph tell osd.* injectargs '--osd_recovery_max_active 1'
>>> ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
>>> ceph tell osd.* injectargs '--osd_max_backfills 1'
>>>
>>> My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD
>>> and 6 journals on another SSD) - I have 3 of these hosts.
>>>
>>> Any thoughts are welcome.
>>> --
>>>
>>> Andrija Panić
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>> --
>> Best regards, Irek Fasikhov Nurgayazovich
>> Mob.: +79229045757
>
>
> --
>
> Andrija Panić

--

Andrija Panić
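P.S. On the workaround question: for the remaining 7 OSDs I'm considering draining each OSD gradually instead of removing it from the CRUSH map in one go - lower its CRUSH weight in a few steps, let recovery finish and the cluster return to HEALTH_OK between steps, and only then remove it. Just a rough sketch of what I have in mind (not tested yet; osd.12 is only an example id, and the weight steps assume the OSD currently has a CRUSH weight of around 1.0 - scale them to the real weight):

# lower the CRUSH weight in steps, waiting for recovery to finish after each one
ceph osd crush reweight osd.12 0.5
ceph osd crush reweight osd.12 0.0
# once osd.12 holds no more data and the cluster is healthy, mark it out
ceph osd out 12
# stop the ceph-osd daemon on that node, then remove the OSD for good
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12

As far as I understand, this does not reduce the total amount of data that has to move, but it spreads the movement over several smaller recovery rounds, so the backfill/recovery throttles have a better chance of keeping the VMs usable.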
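P.P.S. On the throttling side: next time, before kicking off a rebalance, I will double-check on the admin socket of a few OSDs that the injected values actually took effect, along the lines of Irek's command (ceph-osd.12.asok is again just an example path):

ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok config show | grep -E 'osd_max_backfills|osd_recovery_max_active|osd_recovery_op_priority|osd_recovery_delay_start'

If those still show the defaults on some OSDs, that would at least explain the unthrottled first hour.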