Thank you for documenting your progress and peril on the ML. Luckily I only have 24x 8TB HDDs and 50x 1.92TB SSDs to migrate over to bluestore.
8 nodes, 4 chassis (failure domain), 3 drives per node for the HDDs, so I'm able to do about 3 at a time (1 node) for rip/replace. Definitely taking it slow and steady, and the SSDs will move quickly for backfills as well.

Seeing about 1TB/6hr on backfills without much of a performance hit on everything else. With about 5TB average utilization on each 8TB disk, that works out to roughly 30 hours per host; times 8 hosts, that's about 10 days, so a couple of weeks is a safe amount of headroom. Write performance certainly seems better on bluestore than on filestore, so that likely helps as well.

I expect I can probably refill an SSD OSD in about an hour or two, and will likely stagger those out. But with such a small number of OSDs currently, I'm taking the by-hand approach rather than scripting it, so as to avoid similar pitfalls.

Reed

> On Jan 11, 2018, at 12:38 PM, Brady Deetz <bde...@gmail.com> wrote:
>
> I hear you on time. I have 350 x 6TB drives to convert. I recently posted about a disaster I created automating my migration. Good luck
>
> On Jan 11, 2018 12:22 PM, "Reed Dier" <reed.d...@focusvq.com> wrote:
> I am in the process of migrating my OSDs to bluestore, finally, and thought I would give you some input on how I am approaching it.
> Some of the saga you can find in another ML thread here:
> https://www.spinics.net/lists/ceph-users/msg41802.html
>
> For my first OSD I was cautious: I outed the OSD without downing it, allowing it to move its data off first.
> Some background on my cluster: this OSD is an 8TB spinner with an NVMe partition previously used for journaling in filestore, intended to be used for block.db in bluestore.
>
> Then I downed it, flushed the journal, destroyed it, zapped it with ceph-volume, set the norecover and norebalance flags, did ceph osd crush remove osd.$ID, ceph auth del osd.$ID, and ceph osd rm osd.$ID, and used ceph-volume locally to create the new LVM target. Then I unset the norecover and norebalance flags and it backfilled like normal.
>
> I initially ran into issues with specifying --osd-id causing my OSDs to fail to start, but after removing that I was able to get it to fill in the gap of the OSD I had just removed.
>
> I'm now doing quicker, more destructive migrations in an attempt to reduce data movement.
> This way I don't read from the OSD I'm replacing, write to other OSDs temporarily, read back from the temp OSDs, and write back to the 'new' OSD.
> I'm just reading from the replicas and writing to the 'new' OSD.
>
> So I'm setting the norecover and norebalance flags, downing the OSD (but not out - it stays in, and I also have the noout flag set), destroying/zapping it, recreating it using ceph-volume, and unsetting the flags, and it starts backfilling.
> For 8TB disks, with 23 other 8TB disks in the pool, it takes a long time to offload one and then backfill it back from them. I trust my disks enough to backfill from the other disks, and it's going well. Also seeing very good write performance backfilling compared to previous drive replacements in filestore, so that's very promising.
>
> Reed
>
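The destructive in-place flow described above boils down to roughly the following sequence on the OSD host. This is only a sketch: the OSD id 999, /dev/sdzz and the NVMe block.db partition are placeholders rather than Reed's actual devices, and whether the recreated OSD comes back with the same id is exactly the sticking point discussed further down-thread.

    # keep the cluster from reacting while the OSD is rebuilt in place
    ceph osd set noout
    ceph osd set norecover
    ceph osd set norebalance

    # stop the OSD, but leave it "in" so its PGs stay mapped to it
    systemctl stop ceph-osd@999

    # mark the id destroyed (rather than removing it) and wipe the old device
    ceph osd destroy 999 --yes-i-really-mean-it
    ceph-volume lvm zap /dev/sdzz

    # recreate as bluestore, putting block.db on the old NVMe journal partition
    ceph-volume lvm create --bluestore --data /dev/sdzz --block.db /dev/nvme0n1p1

    # let backfill repopulate the new OSD from the replicas
    ceph osd unset norebalance
    ceph osd unset norecover
    ceph osd unset noout

Keeping the OSD "in" with noout/norecover/norebalance set is what avoids the double data movement: the PGs stay mapped to osd.999 and are backfilled onto the fresh bluestore device straight from their replicas once the flags are unset.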
>> On Jan 10, 2018, at 8:29 AM, Jens-U. Mozdzen <jmozd...@nde.ag> wrote:
>>
>> Hi Alfredo,
>>
>> thank you for your comments:
>>
>> Zitat von Alfredo Deza <ad...@redhat.com>:
>>> On Wed, Jan 10, 2018 at 8:57 AM, Jens-U. Mozdzen <jmozd...@nde.ag> wrote:
>>>> Dear *,
>>>>
>>>> Has anybody been successful migrating Filestore OSDs to Bluestore OSDs while keeping the OSD number? There have been a number of messages on the list reporting problems, and my experience is the same. (Removing the existing OSD and creating a new one does work for me.)
>>>>
>>>> I'm working on a Ceph 12.2.2 cluster and tried following
>>>> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
>>>> which basically says:
>>>>
>>>> 1. destroy the old OSD
>>>> 2. zap the disk
>>>> 3. prepare the new OSD
>>>> 4. activate the new OSD
>>>>
>>>> I never got step 4 to complete. The closest I got was by doing the following steps (assuming OSD ID "999" on /dev/sdzz):
>>>>
>>>> 1. stop the old OSD via systemd (osd-node # systemctl stop ceph-osd@999.service)
>>>>
>>>> 2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)
>>>>
>>>> 3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's volume group
>>>>
>>>> 3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)
>>>>
>>>> 4. destroy the old OSD (osd-node # ceph osd destroy 999 --yes-i-really-mean-it)
>>>>
>>>> 5. create a new OSD entry (osd-node # ceph osd new $(cat /var/lib/ceph/osd/ceph-999/fsid) 999)
>>>
>>> Steps 5 and 6 are problematic if you are going to be trying ceph-volume later on, which takes care of doing this for you.
>>>
>>>> 6. add the OSD secret to Ceph authentication (osd-node # ceph auth add osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-999/keyring)
>>
>> At first I tried to follow the documented steps (without my steps 5 and 6), which did not work for me. The documented approach failed with "init authentication failed: (1) Operation not permitted", because ceph-volume did not actually add the auth entry for me.
>>
>> But even after manually adding the authentication, the "ceph-volume" approach failed, as the OSD was still marked "destroyed" in the osdmap epoch used by ceph-osd (see the commented messages from ceph-osd.999.log below).
>>
>>>> 7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore --osd-id 999 --data /dev/sdzz)
>>>
>>> You are going to hit a bug in ceph-volume that prevents you from specifying the OSD id directly if that ID has been destroyed.
>>>
>>> See http://tracker.ceph.com/issues/22642
>>
>> If I read that bug description correctly, you're confirming why I needed step #6 above (manually adding the OSD auth entry). But even if ceph-volume had added it, the ceph-osd.log entries suggest that starting the OSD would still have failed, because it was accessing the wrong osdmap epoch.
>>
>> To me it seems like I'm hitting a bug outside of ceph-volume - unless it's ceph-volume that somehow determines which osdmap epoch is used by ceph-osd.
>>
>>> In order for this to work, you would need to make sure that the ID has really been destroyed and avoid passing --osd-id in ceph-volume. The caveat is that you will get whatever ID is available next in the cluster.
>>
>> Yes, that's the work-around I then used - purge the old OSD and create a new one.
>>
>> Thanks & regards,
>> Jens
>>
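The work-around Alfredo suggests and Jens ended up using - give up on keeping the OSD number, remove the old OSD completely and let ceph-volume take the next free id - looks roughly like this. Again a sketch with placeholder id and device; "ceph osd purge" (available since Luminous) is shorthand for the crush remove / auth del / osd rm sequence.

    systemctl stop ceph-osd@999
    ceph-volume lvm zap /dev/sdzz

    # remove all traces of the old OSD: crush entry, auth key and the id itself
    ceph osd purge 999 --yes-i-really-mean-it

    # no --osd-id here: the new OSD gets whatever id is free next
    ceph-volume lvm create --bluestore --data /dev/sdzz

The price is a new OSD id, which is what the keep-the-number attempts above were trying to avoid.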
>>>> [...]
>>>> --- cut here ---
>>>> # first of multiple attempts, before "ceph auth add ..."
>>>> # no actual epoch referenced, as login failed due to missing auth
>>>> 2018-01-10 00:00:02.173983 7f5cf1c89d00 0 osd.999 0 crush map has features 288232575208783872, adjusting msgr requires for clients
>>>> 2018-01-10 00:00:02.173990 7f5cf1c89d00 0 osd.999 0 crush map has features 288232575208783872 was 8705, adjusting msgr requires for mons
>>>> 2018-01-10 00:00:02.173994 7f5cf1c89d00 0 osd.999 0 crush map has features 288232575208783872, adjusting msgr requires for osds
>>>> 2018-01-10 00:00:02.174046 7f5cf1c89d00 0 osd.999 0 load_pgs
>>>> 2018-01-10 00:00:02.174051 7f5cf1c89d00 0 osd.999 0 load_pgs opened 0 pgs
>>>> 2018-01-10 00:00:02.174055 7f5cf1c89d00 0 osd.999 0 using weightedpriority op queue with priority op cut off at 64.
>>>> 2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors {default=true}
>>>> 2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication failed: (1) Operation not permitted
>>>>
>>>> # after "ceph auth ..."
>>>> # note the different epochs below? BTW, 110587 is the current epoch at that time and osd.999 is marked destroyed there
>>>> # 109892: much too old to offer any details
>>>> # 110587: modified 2018-01-09 23:43:13.202381
>>>>
>>>> 2018-01-10 00:08:00.945507 7fc55905bd00 0 osd.999 0 crush map has features 288232575208783872, adjusting msgr requires for clients
>>>> 2018-01-10 00:08:00.945514 7fc55905bd00 0 osd.999 0 crush map has features 288232575208783872 was 8705, adjusting msgr requires for mons
>>>> 2018-01-10 00:08:00.945521 7fc55905bd00 0 osd.999 0 crush map has features 288232575208783872, adjusting msgr requires for osds
>>>> 2018-01-10 00:08:00.945588 7fc55905bd00 0 osd.999 0 load_pgs
>>>> 2018-01-10 00:08:00.945594 7fc55905bd00 0 osd.999 0 load_pgs opened 0 pgs
>>>> 2018-01-10 00:08:00.945599 7fc55905bd00 0 osd.999 0 using weightedpriority op queue with priority op cut off at 64.
>>>> 2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors {default=true}
>>>> 2018-01-10 00:08:00.951720 7fc55905bd00 0 osd.999 0 done with init, starting boot process
>>>> 2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial osdmap
>>>> 2018-01-10 00:08:00.970644 7fc546614700 0 osd.999 109892 crush map has features 288232610642264064, adjusting msgr requires for clients
>>>> 2018-01-10 00:08:00.970653 7fc546614700 0 osd.999 109892 crush map has features 288232610642264064 was 288232575208792577, adjusting msgr requires for mons
>>>> 2018-01-10 00:08:00.970660 7fc546614700 0 osd.999 109892 crush map has features 1008808551021559808, adjusting msgr requires for osds
>>>> 2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am destroyed, exiting
>>>>
>>>> # another try
>>>> # it is now using epoch 110587 for everything. But that one is off by one at that time already:
>>>> # 110587: modified 2018-01-09 23:43:13.202381
>>>> # 110588: modified 2018-01-10 00:12:55.271913
>>>>
>>>> # but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
>>>> 2018-01-10 00:13:04.332026 7f408d5a4d00 0 osd.999 110587 crush map has features 288232610642264064, adjusting msgr requires for clients
>>>> 2018-01-10 00:13:04.332037 7f408d5a4d00 0 osd.999 110587 crush map has features 288232610642264064 was 8705, adjusting msgr requires for mons
>>>> 2018-01-10 00:13:04.332043 7f408d5a4d00 0 osd.999 110587 crush map has features 1008808551021559808, adjusting msgr requires for osds
>>>> 2018-01-10 00:13:04.332092 7f408d5a4d00 0 osd.999 110587 load_pgs
>>>> 2018-01-10 00:13:04.332096 7f408d5a4d00 0 osd.999 110587 load_pgs opened 0 pgs
>>>> 2018-01-10 00:13:04.332100 7f408d5a4d00 0 osd.999 110587 using weightedpriority op queue with priority op cut off at 64.
>>>> 2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors {default=true}
>>>> 2018-01-10 00:13:06.026628 7f408d5a4d00 0 osd.999 110587 done with init, starting boot process
>>>> 2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am destroyed, exiting
>>>>
>>>> # the attempt after using "ceph osd new", which created epoch 110591 as the first with osd.999 as autoout,exists,new
>>>> # But ceph-osd still uses 110587.
>>>> # 110587: modified 2018-01-09 23:43:13.202381
>>>> # 110591: modified 2018-01-10 00:30:44.850078
>>>>
>>>> 2018-01-10 00:31:15.453871 7f1c57c58d00 0 osd.999 110587 crush map has features 288232610642264064, adjusting msgr requires for clients
>>>> 2018-01-10 00:31:15.453882 7f1c57c58d00 0 osd.999 110587 crush map has features 288232610642264064 was 8705, adjusting msgr requires for mons
>>>> 2018-01-10 00:31:15.453887 7f1c57c58d00 0 osd.999 110587 crush map has features 1008808551021559808, adjusting msgr requires for osds
>>>> 2018-01-10 00:31:15.453940 7f1c57c58d00 0 osd.999 110587 load_pgs
>>>> 2018-01-10 00:31:15.453945 7f1c57c58d00 0 osd.999 110587 load_pgs opened 0 pgs
>>>> 2018-01-10 00:31:15.453952 7f1c57c58d00 0 osd.999 110587 using weightedpriority op queue with priority op cut off at 64.
>>>> 2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors {default=true}
>>>> 2018-01-10 00:31:15.520533 7f1c57c58d00 0 osd.999 110587 done with init, starting boot process
>>>> 2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am destroyed, exiting
>>>> --- cut here ---
>>>> [...]
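The epoch comparison in the comments above (the "modified" timestamps and the state flags of osd.999 in a given osdmap epoch) can be reproduced along these lines - a sketch, using the epoch numbers from the log excerpt:

    # state of osd.999 in the current osdmap (look for exists/new/destroyed flags)
    ceph osd dump | grep -e '^epoch' -e '^modified' -e '^osd\.999 '

    # the same for a specific historical epoch, e.g. the one ceph-osd kept using
    ceph osd dump 110587 | grep -e '^epoch' -e '^modified' -e '^osd\.999 '

    # or fetch the binary map for that epoch and decode it offline
    ceph osd getmap 110587 -o /tmp/osdmap.110587
    osdmaptool --print /tmp/osdmap.110587 | grep '^osd\.999 '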
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com