On Mon, Feb 26, 2018 at 3:23 AM Caspar Smit <caspars...@supernas.eu> wrote:
> 2018-02-24 7:10 GMT+01:00 David Turner <drakonst...@gmail.com>:
>
>> Caspar, it looks like your idea should work. Worst case scenario seems
>> like the OSDs wouldn't start, you'd put the old SSD back in and go back
>> to the idea of weighting them to 0, backfilling, then recreating the
>> OSDs. Definitely worth a try in my opinion, and I'd love to hear your
>> experience after.
>>
>
> Hi David,
>
> First of all, thank you for ALL your answers on this ML. You're really
> putting a lot of effort into answering the many questions asked here, and
> very often they contain invaluable information.
>
> To follow up on this post I went out and built a very small (Proxmox)
> cluster (3 OSDs per host) to test my suggestion of cloning the DB/WAL SSD.
> And it worked!
> Note: this was on Luminous v12.2.2 (all BlueStore, ceph-disk based OSDs).
>
> Here's what I did on one node:
>
> 1) ceph osd set noout
> 2) systemctl stop osd.0; systemctl stop osd.1; systemctl stop osd.2
> 3) ddrescue -f -n -vv <old SSD dev> <new SSD dev> /root/clone-db.log
> 4) removed the old SSD physically from the node
> 5) checked with "ceph -s" and already saw HEALTH_OK and all OSDs up/in
> 6) ceph osd unset noout
>
> I assume that once the ddrescue step is finished a 'partprobe' or
> something similar is triggered, and udev finds the DB partitions on the
> new SSD and starts the OSDs again (kind of what happens during hotplug).
> So it is probably better to clone the SSD in another (non-Ceph) system to
> avoid triggering any udev events.
>
> I also tested a reboot after this and everything still worked.
>
> The old SSD was 120GB and the new one is 256GB (cloning took around 4
> minutes). The delta of data was very low because it was a test cluster.
>
> All in all, the OSDs in question were 'down' for only 5 minutes, so I
> stayed within the default mon_osd_down_out_interval of 10 minutes and
> didn't actually need to set noout :)

I kicked off a brief discussion about this with some of the BlueStore guys
and they're aware of the problem with migrating across SSDs, but so far
it's just a Trello card:
https://trello.com/c/9cxTgG50/324-bluestore-add-remove-resize-wal-db

They do confirm you should be okay with dd'ing things across, assuming the
symlinks get set up correctly, as David noted.

I've got some other bad news, though: BlueStore has internal metadata about
the size of the block device it's using, so if you copy it onto a larger
block device, it will not actually make use of the additional space. :(
-Greg
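A rough way to see what Greg is describing is ceph-bluestore-tool show-label:
BlueStore writes a small label at the start of each of its devices that
records, among other things, the size it expects the device to be. This is a
sketch only, not from the thread itself; the OSD id and device paths are
examples, and whether a ceph-disk DB partition carries its own label may vary
by release.

  # Sketch: inspect the labels of one of the cloned OSDs (id 0 is an example)
  systemctl stop ceph-osd@0                      # safest to inspect with the OSD down
  ceph-bluestore-tool show-label \
      --dev /var/lib/ceph/osd/ceph-0/block       # JSON label, includes a "size" field
  ceph-bluestore-tool show-label \
      --dev /var/lib/ceph/osd/ceph-0/block.db    # label of the DB partition, if present
  blockdev --getsize64 /dev/sdX2                 # compare with the real partition size
  systemctl start ceph-osd@0

The Trello card above is where add/remove/resize support is tracked; newer
ceph-bluestore-tool builds have grown a bluefs-bdev-expand subcommand, but I
would not count on it being present or complete in 12.2.2.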
>
> Kind regards,
> Caspar
>
>
>> Nico, it is not possible to change the WAL or DB size, location, etc.
>> after OSD creation. If you want to change the configuration of the OSD
>> after creation, you have to remove it from the cluster and recreate it.
>> There is no functionality similar to how you could move, recreate, etc.
>> FileStore OSD journals. I think this might be on the radar as a feature,
>> but I don't know for certain. I definitely consider it to be a
>> regression of BlueStore.
>>
>>
>> On Fri, Feb 23, 2018, 9:13 AM Nico Schottelius <
>> nico.schottel...@ungleich.ch> wrote:
>>
>>>
>>> A very interesting question, and I would add the follow-up question:
>>>
>>> Is there an easy way to add an external DB/WAL device to an existing
>>> OSD?
>>>
>>> I suspect that it might be something along the lines of:
>>>
>>> - stop the OSD
>>> - create a link in .../ceph/osd/ceph-XX/block.db to the target device
>>> - (maybe run some kind of osd mkfs?)
>>> - start the OSD
>>>
>>> Has anyone done this so far, or recommendations on how to do it?
>>>
>>> Which also makes me wonder: what is actually the format of the WAL and
>>> BlockDB in BlueStore? Is there any documentation available about it?
>>>
>>> Best,
>>>
>>> Nico
>>>
>>>
>>> Caspar Smit <caspars...@supernas.eu> writes:
>>>
>>> > Hi All,
>>> >
>>> > What would be the proper way to preventively replace a DB/WAL SSD
>>> > (when it is nearing its DWPD/TBW limit and has not failed yet)?
>>> >
>>> > It hosts DB partitions for 5 OSDs.
>>> >
>>> > Maybe something like:
>>> >
>>> > 1) 'ceph osd reweight' the 5 OSDs to 0
>>> > 2) let backfilling complete
>>> > 3) destroy/remove the 5 OSDs
>>> > 4) replace the SSD
>>> > 5) create 5 new OSDs with separate DB partitions on the new SSD
>>> >
>>> > When these 5 OSDs are big HDDs (8TB), a LOT of data has to be moved,
>>> > so I thought maybe the following would work:
>>> >
>>> > 1) ceph osd set noout
>>> > 2) stop the 5 OSDs (systemctl stop)
>>> > 3) 'dd' the old SSD to a new SSD of the same or bigger size
>>> > 4) remove the old SSD
>>> > 5) start the 5 OSDs (systemctl start)
>>> > 6) let backfilling/recovery complete (only the delta of data between
>>> > the OSD stop and now)
>>> > 7) ceph osd unset noout
>>> >
>>> > Would this be a viable method to replace a DB SSD? Any udev/serial
>>> > nr/uuid stuff preventing this from working?
>>> >
>>> > Or is there another 'less hacky' way to replace a DB SSD without
>>> > moving too much data?
>>> >
>>> > Kind regards,
>>> > Caspar
>>>
>>>
>>> --
>>> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
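As for the "udev/serial nr/uuid stuff" in Caspar's original question: a
bit-for-bit clone (dd or ddrescue) copies the GPT as well, so the partition
unique GUIDs, and therefore the /dev/disk/by-partuuid links that ceph-disk's
block.db symlinks point at, simply move to the new SSD. Just don't leave the
old SSD attached at the same time, or those GUIDs become ambiguous. A quick
sanity check before starting the OSDs, as a sketch only (a ceph-disk layout
and OSD ids 0-2 are assumptions):

  # Sketch: verify the DB symlinks resolve to partitions on the new SSD
  for id in 0 1 2; do
      readlink -f /var/lib/ceph/osd/ceph-$id/block.db   # should resolve to the new SSD
  done
  ls -l /dev/disk/by-partuuid/      # by-partuuid links are recreated by udev and
                                    # now point at the cloned partitions
  lsblk -o NAME,SIZE,PARTUUID       # confirm which physical disk owns those GUIDs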