Also, I would set

osd_crush_initial_weight = 0

in ceph.conf and then increase the CRUSH weight step by step via

ceph osd crush reweight osd.36 0.05000
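
A rough sketch of the ramp-up (the step sizes and the final weight are just
examples; pick a target that matches the disk size, roughly 5.4 for a 6TB
disk, and wait for HEALTH_OK between steps):

    # ceph.conf on the new OSD host, set before creating the OSD
    [osd]
    osd_crush_initial_weight = 0

    # then raise the weight of the new OSD in small increments
    for w in 0.05 0.5 1.0 2.0 3.5 5.4; do
        ceph osd crush reweight osd.36 $w
        while ! ceph health | grep -q HEALTH_OK; do sleep 60; done
    done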

On 25 April 2017 23:19:08 CEST, Reed Dier <reed.d...@focusvq.com> wrote:
>Others will likely be able to provide some better responses, but I’ll
>take a shot to see if anything makes sense.
>
>With 10.2.6 you should be able to set 'osd scrub during recovery’ to
>false to prevent any new scrubs from occurring during a recovery event.
>Current scrubs will complete, but future scrubs will not begin until
>recovery has completed.
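>
>Something like this should do it (just a sketch; set it at runtime and
>persist it in ceph.conf so it survives restarts):
>
>    # runtime, all OSDs
>    ceph tell osd.* injectargs '--osd_scrub_during_recovery=false'
>
>    # ceph.conf, [osd] section
>    osd scrub during recovery = false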
>
>Also, adding just one OSD on the new server, assuming all 6 are
>ready(?), will cause a good deal of unnecessary data reshuffling as you
>add more OSDs.
>And on top of that, assuming the pool’s crush ruleset is ‘chooseleaf
>firstn 0 type host’, that should create a bit of an unbalanced
>weighting. Any reason you aren’t bringing in all 6 OSDs at once?
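>
>If you're not sure what the ruleset looks like, a quick way to check
>(sketch):
>
>    ceph osd crush rule dump
>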
>You should be able to set the noscrub, nodeep-scrub, norebalance,
>nobackfill, and norecover flags (also probably want noout to prevent
>rebalance if OSDs flap), wait for scrubs to complete (especially deep),
>add your 6 OSDs, unset your flags for recovery/rebalance/backfill, and
>it will then move data only once, and hopefully not have the scrub
>load. After recovery, unset the scrub flags, and be back to normal.
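>
>Roughly, as a sketch (the flags are set and unset one at a time):
>
>    for f in noscrub nodeep-scrub norebalance nobackfill norecover noout; do
>        ceph osd set $f
>    done
>    # wait for running scrubs to finish, then add the 6 OSDs ...
>    ceph osd unset norebalance
>    ceph osd unset nobackfill
>    ceph osd unset norecover
>    # ... and after recovery completes:
>    ceph osd unset noscrub
>    ceph osd unset nodeep-scrub
>    ceph osd unset noout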
>
>Caveat: no VMs running on my cluster, but those seem like low-hanging
>fruit for possible load lightening during a rebalance.
>
>Reed
>
>> On Apr 25, 2017, at 3:47 PM, Ramazan Terzi <ramazante...@gmail.com> wrote:
>> 
>> Hello,
>> 
>> I have a Ceph Cluster with specifications below:
>> 3 x Monitor node
>> 6 x Storage Node (6 disks per Storage Node, 6TB SATA disks, all disks
>have SSD journals)
>> Distributed public and private networks. All NICs are 10Gbit/s
>> osd pool default size = 3
>> osd pool default min size = 2
>> 
>> Ceph version is Jewel 10.2.6.
>> 
>> Current health status:
>>    cluster ****************
>>     health HEALTH_OK
>>     monmap e9: 3 mons at
>{ceph-mon01=xxx:6789/0,ceph-mon02=xxx:6789/0,ceph-mon03=xxx:6789/0}
>>            election epoch 84, quorum 0,1,2
>ceph-mon01,ceph-mon02,ceph-mon03
>>     osdmap e1512: 36 osds: 36 up, 36 in
>>            flags sortbitwise,require_jewel_osds
>>      pgmap v7698673: 1408 pgs, 5 pools, 37365 GB data, 9436 kobjects
>>            83871 GB used, 114 TB / 196 TB avail
>>                1408 active+clean
>> 
>> My cluster is active and a lot of virtual machines are running on it
>(Linux and Windows VMs, database clusters, web servers, etc.).
>> 
>> When I want to add a new storage node with 1 disk, I run into huge
>problems. With the new OSD, the crushmap is updated and the Ceph cluster
>goes into recovery mode. Everything is OK at first, but after a while some
>running VMs become unmanageable and servers become unresponsive one by one.
>The recovery process would take an average of 20 hours. For this reason, I
>removed the new OSD. Recovery completed and everything went back to normal.
>> 
>> When new osd added, health status:
>>    cluster ****************
>>     health HEALTH_WARN
>>                91 pgs backfill_wait
>>                1 pgs backfilling
>>                28 pgs degraded
>>                28 pgs recovery_wait
>>                28 pgs stuck degraded
>>                recovery 2195/18486602 objects degraded (0.012%)
>>                recovery 1279784/18486602 objects misplaced (6.923%)
>>     monmap e9: 3 mons at
>{ceph-mon01=xxx:6789/0,ceph-mon02=xxx:6789/0,ceph-mon03=xxx:6789/0}
>>            election epoch 84, quorum 0,1,2
>ceph-mon01,ceph-mon02,ceph-mon03
>>     osdmap e1512: 37 osds: 37 up, 37 in
>>            flags sortbitwise,require_jewel_osds
>>      pgmap v7698673: 1408 pgs, 5 pools, 37365 GB data, 9436 kobjects
>>            83871 GB used, 114 TB / 201 TB avail
>>            2195/18486602 objects degraded (0.012%)
>>            1279784/18486602 objects misplaced (6.923%)
>>                1286 active+clean
>>                  91 active+remapped+wait_backfill
>>                  28 active+recovery_wait+degraded
>>                   2 active+clean+scrubbing+deep
>>                   1 active+remapped+backfilling
>>   recovery io 430 MB/s, 119 objects/s
>>     client io 36174 B/s rd, 5567 kB/s wr, 5 op/s rd, 700 op/s wr
>> 
>> Some Ceph config parameters:
>> osd_max_backfills = 1
>> osd_backfill_full_ratio = 0.85
>> osd_recovery_max_active = 3
>> osd_recovery_threads = 1
>> 
>> How can I add new OSDs safely?
>> 
>> Best regards,
>> Ramazan
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
