2018-02-26 18:02 GMT+01:00 David Turner <drakonst...@gmail.com>:

> I'm glad that I was able to help out.  I wanted to point out that the
> reason those steps worked for you as quickly as they did is likely that you
> configured your block.db to use the /dev/disk/by-partuuid/{guid} path instead
> of /dev/sdx#.  Had you configured your OSDs with /dev/sdx#, you would have
> needed to either modify them to point to the partuuid path or change them to
> the new device's name (which is a bad choice, as it will likely change on
> reboot).  Changing the path for block.db is as simple as `ln -sf
> /dev/disk/by-partuuid/{uuid} /var/lib/ceph/osd/ceph-#/block.db` and then
> restarting the OSD to make sure that it can read from the new symlink
> location.
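>
> A minimal sketch of that repointing (OSD id 3 and the partition uuid below
> are just placeholders, substitute your own):
>
>   # stop the OSD before touching its metadata
>   systemctl stop ceph-osd@3
>   # repoint the symlink at the stable by-partuuid path of the new DB partition
>   ln -sf /dev/disk/by-partuuid/{uuid} /var/lib/ceph/osd/ceph-3/block.db
>   # ownership may need fixing as well, e.g. chown -h ceph:ceph on the symlink
>   # verify the symlink resolves to the intended device
>   readlink -f /var/lib/ceph/osd/ceph-3/block.db
>   # restart and confirm the OSD comes back up
>   systemctl start ceph-osd@3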
>
>
Yes, I (Proxmox) used /dev/disk/by-partuuid/{guid}-style links.


> I'm also curious about your OSDs starting automatically after those steps.
> I would guess you deployed them with ceph-disk instead of ceph-volume, is
> that right?  ceph-volume no longer uses udev rules and shouldn't have picked
> up these changes.
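>
> If you ever need to check which tool a host's OSDs came from, a rough check
> (both commands should be available on Luminous) could be:
>
>   ceph-volume lvm list   # lists OSDs created/managed by ceph-volume
>   ceph-disk list         # lists disks/partitions prepared by ceph-disk (GPT/udev)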
>
>
Yes, ceph-disk based, so udev kicked in on the partprobe.

Caspar


> On Mon, Feb 26, 2018 at 6:23 AM Caspar Smit <caspars...@supernas.eu>
> wrote:
>
>> 2018-02-24 7:10 GMT+01:00 David Turner <drakonst...@gmail.com>:
>>
>>> Caspar, it looks like your idea should work. Worst case scenario, the OSDs
>>> wouldn't start; you'd put the old SSD back in and fall back to the idea of
>>> weighting them to 0, backfilling, then recreating the OSDs. Definitely
>>> worth a try in my opinion, and I'd love to hear about your experience
>>> afterwards.
>>>
>>>
>> Hi David,
>>
>> First of all, thank you for ALL your answers on this ML. You're really
>> putting a lot of effort into answering the many questions asked here, and
>> very often your replies contain invaluable information.
>>
>>
>> To follow up on this post, I went out and built a very small (Proxmox)
>> cluster (3 OSDs per host) to test my suggestion of cloning the DB/WAL SSD.
>> And it worked!
>> Note: this was on Luminous v12.2.2 (all BlueStore, ceph-disk based OSDs).
>>
>> Here's what I did on one node:
>>
>> 1) ceph osd set noout
>> 2) systemctl stop ceph-osd@0; systemctl stop ceph-osd@1; systemctl stop ceph-osd@2
>> 3) ddrescue -f -n -vv <old SSD dev> <new SSD dev> /root/clone-db.log
>> 4) removed the old SSD physically from the node
>> 5) checked with "ceph -s" and already saw HEALTH_OK and all OSDs up/in
>> 6) ceph osd unset noout
>>
>> I assume that once the ddrescue step finished, a 'partprobe' or something
>> similar was triggered, udev found the DB partitions on the new SSD, and the
>> OSDs were started again (much like what happens during hotplug).
>> So it is probably better to clone the SSD in another (non-Ceph) system to
>> avoid triggering any udev events.
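>>
>> If you want to double-check the result before trusting it, a quick
>> verification sketch (OSD ids 0-2 are just my test ids) could be:
>>
>>   # confirm the cluster sees all OSDs up/in
>>   ceph -s
>>   ceph osd tree
>>   # confirm each block.db symlink now resolves to a partition on the new SSD
>>   readlink -f /var/lib/ceph/osd/ceph-0/block.db
>>   readlink -f /var/lib/ceph/osd/ceph-1/block.db
>>   readlink -f /var/lib/ceph/osd/ceph-2/block.db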
>>
>> I also tested a reboot after this and everything still worked.
>>
>>
>> The old SSD was 120GB and the new one is 256GB (cloning took around 4
>> minutes). The delta of data was very low because it was a test cluster.
>>
>> All in all, the OSDs in question were 'down' for only 5 minutes, so I
>> stayed within the default mon_osd_down_out_interval of 10 minutes and
>> didn't actually need to set noout :)
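>>
>> If you want to confirm that interval on your own cluster before relying on
>> it, you can query a running mon over its admin socket, something like the
>> following (assuming, as on Proxmox, that the mon id is the short hostname):
>>
>>   ceph daemon mon.$(hostname -s) config get mon_osd_down_out_interval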
>>
>> Kind regards,
>> Caspar
>>
>>
>>
>>> Nico, it is not possible to change the WAL or DB size, location, etc. after
>>> OSD creation. If you want to change the configuration of the OSD after
>>> creation, you have to remove it from the cluster and recreate it. There is
>>> no functionality similar to how you could move or recreate filestore OSD
>>> journals. I think this might be on the radar as a feature, but I don't know
>>> for certain. I definitely consider it to be a regression of bluestore.
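>>>
>>> For reference, a rough sketch of that remove-and-recreate cycle on a
>>> ceph-disk based Luminous cluster (the OSD id, data disk, and DB device
>>> below are placeholders, and your deployment tooling may differ):
>>>
>>>   # drain the old OSD and wait for backfilling to finish
>>>   ceph osd out 5
>>>   # then tear it down
>>>   systemctl stop ceph-osd@5
>>>   ceph osd purge 5 --yes-i-really-mean-it
>>>   # recreate it with its DB on the new SSD
>>>   ceph-disk prepare --bluestore /dev/sdf --block.db /dev/nvme0n1
>>>   ceph-disk activate /dev/sdf1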
>>>
>>>
>>>
>>>
>>> On Fri, Feb 23, 2018, 9:13 AM Nico Schottelius <
>>> nico.schottel...@ungleich.ch> wrote:
>>>
>>>>
>>>> A very interesting question, and I would add the follow-up question:
>>>>
>>>> Is there an easy way to add an external DB/WAL device to an existing
>>>> OSD?
>>>>
>>>> I suspect that it might be something along the lines of:
>>>>
>>>> - stop osd
>>>> - create a link in ...ceph/osd/ceph-XX/block.db to the target device
>>>> - (maybe run some kind of osd mkfs ?)
>>>> - start osd
>>>>
>>>> Has anyone done this so far or recommendations on how to do it?
>>>>
>>>> Which also makes me wonder: what is actually the format of WAL and
>>>> BlockDB in bluestore? Is there any documentation available about it?
>>>>
>>>> Best,
>>>>
>>>> Nico
>>>>
>>>>
>>>> Caspar Smit <caspars...@supernas.eu> writes:
>>>>
>>>> > Hi All,
>>>> >
>>>> > What would be the proper way to preventively replace a DB/WAL SSD (when
>>>> > it is nearing its DWPD/TBW limit and has not failed yet)?
>>>> >
>>>> > It hosts DB partitions for 5 OSDs.
>>>> >
>>>> > Maybe something like the following (the first two steps are sketched as
>>>> > commands right after the list):
>>>> >
>>>> > 1) 'ceph osd reweight' the 5 OSDs to 0
>>>> > 2) let backfilling complete
>>>> > 3) destroy/remove the 5 OSDs
>>>> > 4) replace the SSD
>>>> > 5) create 5 new OSDs with a separate DB partition on the new SSD
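>>>> >
>>>> > As a rough illustration of steps 1 and 2 (OSD ids 10-14 are made-up
>>>> > placeholders for the 5 OSDs backed by that SSD):
>>>> >
>>>> >   for id in 10 11 12 13 14; do ceph osd reweight $id 0; done
>>>> >   # watch until all PGs are active+clean again
>>>> >   watch ceph pg stat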
>>>> >
>>>> > When these 5 OSDs are big HDDs (8TB), a LOT of data has to be moved, so
>>>> > I thought maybe the following would work:
>>>> >
>>>> > 1) ceph osd set noout
>>>> > 2) stop the 5 OSDs (systemctl stop)
>>>> > 3) 'dd' the old SSD to a new SSD of the same or bigger size
>>>> > 4) remove the old SSD
>>>> > 5) start the 5 OSDs (systemctl start)
>>>> > 6) let backfilling/recovery complete (only the delta of data between OSD
>>>> >    stop and now)
>>>> > 7) ceph osd unset noout
>>>> >
>>>> > Would this be a viable method to replace a DB SSD? Is there any
>>>> > udev / serial number / uuid related issue that would prevent this from
>>>> > working?
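>>>> >
>>>> > One quick sanity check for that last concern (whether anything still
>>>> > references the raw device name instead of a stable partuuid) could be:
>>>> >
>>>> >   # the block.db symlinks should point at /dev/disk/by-partuuid/... paths
>>>> >   ls -l /var/lib/ceph/osd/ceph-*/block.db
>>>> >   # blkid shows the PARTUUIDs that the new (cloned) partitions carry
>>>> >   blkid | grep -i partuuid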
>>>> >
>>>> > Or is there another 'less hacky' way to replace a DB SSD without moving
>>>> > too much data?
>>>> >
>>>> > Kind regards,
>>>> > Caspar
>>>>
>>>>
>>>> --
>>>> Modern, affordable, Swiss Virtual Machines. Visit
>>>> www.datacenterlight.ch
>>>>
>>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
