2018-02-26 18:02 GMT+01:00 David Turner <drakonst...@gmail.com>:

> I'm glad that I was able to help out. I wanted to point out that the
> reason those steps worked for you as quickly as they did is likely that
> you configured your block.db to use the /dev/disk/by-partuuid/{guid} path
> instead of /dev/sdx#. Had you configured your osds with /dev/sdx#, then
> you would have needed to either modify them to point to the partuuid path
> or change them to the new device's name (which is a bad idea, as it will
> likely change on reboot). Changing your path for block.db is as simple as
> `ln -sf /dev/disk/by-partuuid/{uuid} /var/lib/ceph/osd/ceph-#/block.db`
> and then restarting the osd to make sure that it can read from the new
> symlink location.

Yes, I (proxmox) used /dev/disk/by-partuuid/{guid} style links.
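For anyone who did end up with /dev/sdX# paths, repointing the symlink looks
roughly like this (a sketch only; the osd id, device and partuuid below are
placeholders, check your own with ls/blkid before touching anything):

  # see where block.db currently points
  ls -l /var/lib/ceph/osd/ceph-0/block.db

  # find the partuuid of the DB partition on the (new) SSD
  blkid /dev/sdb1

  # repoint the symlink to the stable by-partuuid path and restart the osd
  ln -sf /dev/disk/by-partuuid/<partuuid-of-db-partition> /var/lib/ceph/osd/ceph-0/block.db
  systemctl restart ceph-osd@0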
> I'm curious about your OSDs starting automatically after doing those
> steps as well. I would guess you deployed them with ceph-disk instead of
> ceph-volume, is that right? ceph-volume no longer uses udev rules and
> shouldn't have picked up these changes here.

Yes, ceph-disk based, so udev kicked in on the partprobe.

Caspar

> On Mon, Feb 26, 2018 at 6:23 AM Caspar Smit <caspars...@supernas.eu> wrote:
>
>> 2018-02-24 7:10 GMT+01:00 David Turner <drakonst...@gmail.com>:
>>
>>> Caspar, it looks like your idea should work. Worst case scenario seems
>>> like the osd wouldn't start, you'd put the old SSD back in and go back
>>> to the idea to weight them to 0, backfill, then recreate the osds.
>>> Definitely worth a try in my opinion, and I'd love to hear your
>>> experience after.
>>
>> Hi David,
>>
>> First of all, thank you for ALL your answers on this ML. You're really
>> putting a lot of effort into answering the many questions asked here,
>> and very often they contain invaluable information.
>>
>> To follow up on this post I went out and built a very small (proxmox)
>> cluster (3 OSDs per host) to test my suggestion of cloning the DB/WAL
>> SSD. And it worked!
>> Note: this was on Luminous v12.2.2 (all bluestore, ceph-disk based OSDs).
>>
>> Here's what I did on 1 node:
>>
>> 1) ceph osd set noout
>> 2) systemctl stop ceph-osd@0; systemctl stop ceph-osd@1; systemctl stop ceph-osd@2
>> 3) ddrescue -f -n -vv <old SSD dev> <new SSD dev> /root/clone-db.log
>> 4) removed the old SSD physically from the node
>> 5) checked with "ceph -s" and already saw HEALTH_OK and all OSDs up/in
>> 6) ceph osd unset noout
>>
>> I assume that once the ddrescue step finished, a 'partprobe' or
>> something similar was triggered and udev found the DB partitions on the
>> new SSD and started the OSDs again (kind of what happens during
>> hotplug). So it is probably better to clone the SSD in another
>> (non-ceph) system to avoid triggering any udev events.
>>
>> I also tested a reboot after this and everything still worked.
>>
>> The old SSD was 120GB and the new one is 256GB (cloning took around 4
>> minutes). The delta of data was very low because it was a test cluster.
>>
>> All in all, the OSDs in question were 'down' for only 5 minutes, so I
>> stayed within the default mon_osd_down_out_interval of 10 minutes and
>> didn't actually need to set noout :)
>>
>> Kind regards,
>> Caspar
>>
>>> Nico, it is not possible to change the WAL or DB size, location, etc.
>>> after osd creation. If you want to change the configuration of the osd
>>> after creation, you have to remove it from the cluster and recreate it.
>>> There is no functionality similar to how you could move, recreate, etc.
>>> filestore osd journals. I think this might be on the radar as a
>>> feature, but I don't know for certain. I definitely consider it to be a
>>> regression in bluestore.
>>>
>>> On Fri, Feb 23, 2018, 9:13 AM Nico Schottelius <nico.schottel...@ungleich.ch> wrote:
>>>>
>>>> A very interesting question, and I would add the follow-up question:
>>>>
>>>> Is there an easy way to add an external DB/WAL device to an existing
>>>> OSD?
>>>>
>>>> I suspect that it might be something along the lines of:
>>>>
>>>> - stop osd
>>>> - create a link in ...ceph/osd/ceph-XX/block.db to the target device
>>>> - (maybe run some kind of osd mkfs ?)
>>>> - start osd
>>>>
>>>> Has anyone done this so far, or any recommendations on how to do it?
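Since, as David notes above, bluestore has no supported way to attach a
DB/WAL device to an already-created OSD, the realistic route really is
destroy and recreate. Roughly, with ceph-disk on Luminous (a sketch only;
the osd id and device names are made-up placeholders, and option spellings
may differ between versions):

  # example: rebuild osd.5 with its DB on the new SSD
  ceph osd out 5                        # drain it first, watch ceph -s
  systemctl stop ceph-osd@5
  ceph osd purge 5 --yes-i-really-mean-it
  ceph-disk zap /dev/sdf                # the HDD that backed osd.5
  ceph-disk prepare --bluestore /dev/sdf --block.db /dev/sdb
  # udev / ceph-disk activation brings the new osd up and backfilling starts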
>>>>
>>>> Which also makes me wonder: what is actually the format of the WAL and
>>>> BlockDB in bluestore? Is there any documentation available about it?
>>>>
>>>> Best,
>>>>
>>>> Nico
>>>>
>>>> Caspar Smit <caspars...@supernas.eu> writes:
>>>>
>>>> > Hi All,
>>>> >
>>>> > What would be the proper way to preventively replace a DB/WAL SSD
>>>> > (when it is nearing its DWPD/TBW limit and has not failed yet)?
>>>> >
>>>> > It hosts DB partitions for 5 OSDs.
>>>> >
>>>> > Maybe something like:
>>>> >
>>>> > 1) ceph osd reweight the 5 OSDs to 0
>>>> > 2) let backfilling complete
>>>> > 3) destroy/remove the 5 OSDs
>>>> > 4) replace SSD
>>>> > 5) create 5 new OSDs with a separate DB partition on the new SSD
>>>> >
>>>> > When these 5 OSDs are big HDDs (8TB), a LOT of data has to be moved,
>>>> > so I thought maybe the following would work:
>>>> >
>>>> > 1) ceph osd set noout
>>>> > 2) stop the 5 OSDs (systemctl stop)
>>>> > 3) 'dd' the old SSD to a new SSD of the same or bigger size
>>>> > 4) remove the old SSD
>>>> > 5) start the 5 OSDs (systemctl start)
>>>> > 6) let backfilling/recovery complete (only the delta of data between
>>>> > OSD stop and now)
>>>> > 7) ceph osd unset noout
>>>> >
>>>> > Would this be a viable method to replace a DB SSD? Any udev/serial
>>>> > nr/uuid stuff preventing this from working?
>>>> >
>>>> > Or is there another 'less hacky' way to replace a DB SSD without
>>>> > moving too much data?
>>>> >
>>>> > Kind regards,
>>>> > Caspar
>>>>
>>>> --
>>>> Modern, affordable, Swiss Virtual Machines. Visit
>>>> www.datacenterlight.ch
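To answer my own "udev/serial nr/uuid stuff" question from the original
mail: since dd/ddrescue copies the GPT along with the partition GUIDs, the
by-partuuid links simply resolve to the partitions on the new SSD. A quick
way to verify after the swap (the osd id below is just a placeholder):

  # the osd's block.db symlink should resolve to a partition on the new SSD
  ls -l /var/lib/ceph/osd/ceph-0/block.db
  readlink -f /var/lib/ceph/osd/ceph-0/block.db

  # cross-check what the osd itself reports as its db device
  ceph osd metadata 0 | grep -i db

  # and confirm everything is up and in
  ceph osd tree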
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com