These SSDs are definitely up to the task, rated at 3-5 DWPD over 5 years; I'm mostly exercising an abundance of caution and trying to minimize unnecessary data movement so as not to exacerbate wear.
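For a rough sense of scale, a back-of-the-envelope write budget; the 1.92 TB capacity below is a made-up example, only the 3 DWPD / 5-year rating comes from the drives discussed here:

    # hypothetical 1.92 TB SSD rated at 3 drive-writes-per-day for 5 years
    echo "1.92 * 3 * 365 * 5" | bc
    # => 10512.00  (roughly 10.5 PB of total writes over the warranty period)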
I definitely could, I just err on the side of conservative wear.

Reed

> On Aug 6, 2018, at 11:19 AM, Richard Hesketh <richard.hesk...@rd.bbc.co.uk> wrote:
>
> I would have thought that with the write endurance on modern SSDs, additional write wear from the occasional rebalance would honestly be negligible? If you're hitting them hard enough that you're actually worried about your write endurance, a rebalance or two is peanuts compared to your normal I/O. If you're not, then there's more than enough write endurance in an SSD to handle daily rebalances for years.
>
> On 06/08/18 17:05, Reed Dier wrote:
>> This has been my modus operandi when replacing drives.
>>
>> With only ~50 OSDs for each drive type/pool, rebalancing can be a lengthy process, and in the case of SSDs, shuffling data adds unnecessary write wear to the disks.
>>
>> When migrating from filestore to bluestore, I would actually forklift an entire failure domain using the script below, together with the noout, norebalance and norecover flags.
>>
>> This would keep CRUSH from pushing data around until I had all of the drives replaced, and would then keep it from trying to recover until I was ready.
>>
>>> # $1: numeric OSD id (e.g. 3 for osd.3)
>>> # $2: data device name, e.g. sdx (the script prepends /dev/)
>>> # $3: NVMe DB device/partition name, e.g. nvmeXnXpX (the script prepends /dev/)
>>>
>>> sudo systemctl stop ceph-osd@$1.service
>>> sudo ceph-osd -i $1 --flush-journal
>>> sudo umount /var/lib/ceph/osd/ceph-$1
>>> sudo ceph-volume lvm zap /dev/$2
>>> ceph osd crush remove osd.$1
>>> ceph auth del osd.$1
>>> ceph osd rm osd.$1
>>> sudo ceph-volume lvm create --bluestore --data /dev/$2 --block.db /dev/$3
>>
>> For a single drive, this stops it, removes it from CRUSH, and creates a new one (letting it retake the old/existing osd.id); then, after I unset the norebalance/norecover flags, it backfills from the other copies onto the replaced drive and doesn't move any other data around.
>>
>> The script is somewhat specific to the filestore-to-bluestore migration, as the flush-journal command is no longer used with bluestore.
>>
>> Hope that's helpful.
>>
>> Reed
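As a rough illustration of the flag handling described above (the flag names come from the mail; the exact ordering and comments are just a sketch):

    # set the flags before touching the first OSD in the failure domain
    ceph osd set noout         # stopped OSDs are not marked out
    ceph osd set norebalance   # no PG shuffling while drives are swapped
    ceph osd set norecover     # hold recovery until every drive is replaced

    # ... run the per-OSD script above for each drive in the failure domain ...

    # once all OSDs are recreated and up, let backfill repopulate them
    ceph osd unset norecover
    ceph osd unset norebalance
    ceph osd unset noout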
>>> On Aug 6, 2018, at 9:30 AM, Richard Hesketh <richard.hesk...@rd.bbc.co.uk> wrote:
>>>
>>> Waiting for rebalancing is considered the safest way, since it ensures you retain your normal full number of replicas at all times. If you take the disk out before rebalancing is complete, you will be causing some PGs to lose a replica. That is a risk to your data redundancy, but it might be an acceptable one if you prefer to just get the disk replaced quickly.
>>>
>>> Personally, if running at 3+ replicas, briefly losing one isn't the end of the world; you'd still need two more simultaneous disk failures to actually lose data, though one failure would cause inactive PGs (because you are running with min_size >= 2, right?). If running pools with only two replicas at size = 2, I absolutely would not remove a disk without waiting for rebalancing unless that disk was actively failing so badly that it was making rebalancing impossible.
>>>
>>> Rich
>>>
>>> On 06/08/18 15:20, Josef Zelenka wrote:
>>>> Hi, our procedure is usually as follows (assuming the cluster was healthy before the failure, with 2 replicas as the crush rule):
>>>>
>>>> 1. Stop the OSD process (to keep it from coming up and down and putting load on the cluster).
>>>>
>>>> 2. Wait for the "reweight" to come to 0 (happens after about 5 minutes, I think - it can be set manually, but I let it happen by itself).
>>>>
>>>> 3. Remove the OSD from the cluster (ceph auth del, ceph osd crush remove, ceph osd rm).
>>>>
>>>> 4. Note down the journal partitions if needed.
>>>>
>>>> 5. Unmount the drive and replace the disk with the new one.
>>>>
>>>> 6. Ensure permissions are set to ceph:ceph in /dev.
>>>>
>>>> 7. mklabel gpt on the new drive.
>>>>
>>>> 8. Create the new OSD with ceph-disk prepare (this automatically adds it to the crushmap).
>>>>
>>>> Your procedure sounds reasonable to me; as far as I'm concerned, you shouldn't have to wait for rebalancing after you remove the OSD. All this might not be 100% by the ceph books, but it works for us :)
>>>>
>>>> Josef
>>>>
>>>> On 06/08/18 16:15, Iztok Gregori wrote:
>>>>> Hi Everyone,
>>>>>
>>>>> Which is the best way to replace a failing (SMART Health Status: HARDWARE IMPENDING FAILURE) OSD hard disk?
>>>>>
>>>>> Normally I will:
>>>>>
>>>>> 1. Set the OSD as out.
>>>>> 2. Wait for rebalancing.
>>>>> 3. Stop the OSD on the osd-server (unmount if needed).
>>>>> 4. Purge the OSD from Ceph.
>>>>> 5. Physically replace the disk with the new one.
>>>>> 6. With ceph-deploy:
>>>>>    6a. Zap the new disk (just in case).
>>>>>    6b. Create the new OSD.
>>>>> 7. Add the new OSD to the crush map.
>>>>> 8. Wait for rebalancing.
>>>>>
>>>>> My questions are:
>>>>>
>>>>> - Is my procedure reasonable?
>>>>> - What if I skip #2 and, instead of waiting for rebalancing, I directly purge the OSD?
>>>>> - Is it better to reweight the OSD before taking it out?
>>>>>
>>>>> I'm running a Luminous (12.2.2) cluster with 332 OSDs; the failure domain is host.
>>>>>
>>>>> Thanks,
>>>>> Iztok
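As a rough sketch of the single-disk flow Iztok and Josef describe above, using the same ceph-volume tooling as Reed's script; the OSD id 12 and device sdx are placeholders, and ceph osd purge (available since Luminous) collapses the crush remove / auth del / osd rm steps into one:

    ceph osd out 12                            # step 1: stop new data landing on the failing OSD
    # step 2: wait for rebalancing - watch "ceph -s" until all PGs are active+clean
    sudo systemctl stop ceph-osd@12.service    # step 3: stop the OSD daemon
    sudo umount /var/lib/ceph/osd/ceph-12
    ceph osd purge 12 --yes-i-really-mean-it   # step 4: removes crush entry, auth key and osd id
    # step 5: physically swap the disk, then:
    sudo ceph-volume lvm zap /dev/sdx          # step 6a: wipe any leftover metadata
    sudo ceph-volume lvm create --bluestore --data /dev/sdx   # step 6b: new OSD joins crush and backfills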