On Mon, 5 Nov 2018, 21:13 Hayashida, Mami, <mami.hayash...@uky.edu> wrote:

> Additional info -- I know that /var/lib/ceph/osd/ceph-{60..69} are not
> mounted at this point (i.e.  mount | grep ceph-60, and 61-69, returns
> nothing.).  They don't show up when I run "df", either.
>
The ceph-volume command automatically mounts each OSD's ceph-{id} directory on
tmpfs when the OSD is activated, so those directories won't show up in mount or
df until activation has happened.
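
For example, once osd.60 is activated the mount should appear, something like
this (osd.60 used only as an example):

ceph-volume lvm activate --all
mount | grep ceph-60
# should print something along the lines of:
# tmpfs on /var/lib/ceph/osd/ceph-60 type tmpfs (rw,relatime)

Until activation succeeds there is simply nothing mounted there, which is why
mount and df currently show nothing for ceph-{60..69}.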

>
> On Mon, Nov 5, 2018 at 10:15 AM, Hayashida, Mami <mami.hayash...@uky.edu>
> wrote:
>
>> Well, over the weekend the whole server went down and is now in the
>> emergency mode. (I am running Ubuntu 16.04).  When I run "journalctl  -p
>> err -xb"   I see that
>>
>> systemd[1]: Timed out waiting for device dev-sdh1.device.
>> -- Subject: Unit dev-sdh1.device has failed
>> -- Defined-By: systemd
>> -- Support: http://lists.freedesktop.org/....
>> --
>> -- Unit dev-sdh1.device has failed.
>>
>>
>> I see this for every single one of the newly-converted Bluestore OSD
>> disks (/dev/sd{h..q}1).
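>>
>> Could systemd be waiting on the old Filestore mounts, if those partitions
>> are still listed in /etc/fstab?  Just a guess on my part, but something
>> like this would show it:
>>
>> grep -E 'ceph-6[0-9]|sd[h-q]1' /etc/fstab
>>
>> If they are listed there, commenting those lines out (or adding the nofail
>> option) should let the node boot past the missing dev-sd*1 devices.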
>>
>>
>> --
>>
>> On Mon, Nov 5, 2018 at 9:57 AM, Alfredo Deza <ad...@redhat.com> wrote:
>>
>>> On Fri, Nov 2, 2018 at 5:04 PM Hayashida, Mami <mami.hayash...@uky.edu>
>>> wrote:
>>> >
>>> > I followed all the steps Hector suggested, and almost everything seems
to have worked fine.  I say "almost" because one out of the 10 OSDs I was
migrating could not be activated, even though everything up to that point
worked just as well for that OSD as for the others. Here is the output for
>>> that particular failure:
>>> >
>>> > *****
>>> > ceph-volume lvm activate --all
>>> > ...
>>> > --> Activating OSD ID 67 FSID 17cd6755-76f9-4160-906c-XXXXXX
>>> > Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-67
>>> > --> Absolute path not found for executable: restorecon
>>> > --> Ensure $PATH environment variable contains common executable
>>> locations
>>> > Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir
>>> --dev /dev/hdd67/data67 --path /var/lib/ceph/osd/ceph-67
>>> >  stderr: failed to read label for /dev/hdd67/data67: (2) No such file
>>> or directory
>>> > -->  RuntimeError: command returned non-zero exit status:
>>>
>>> I wonder if the /dev/sdo device where hdd67/data67 is located is
>>> available, or if something else is missing. You could try poking
>>> around with `lvs` and see if that LV shows up; `ceph-volume lvm
>>> list hdd67/data67` can also help here because it
>>> groups OSDs to LVs. If you run `ceph-volume lvm list --format=json
>>> hdd67/data67` you will also see all the metadata stored in it.
>>>
>>> Would be interesting to see that output to verify things exist and are
>>> usable for OSD activation.
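>>>
>>> Concretely, something along these lines (hdd67/data67 taken straight from
>>> your output above):
>>>
>>> lvs | grep data67
>>> ceph-volume lvm list hdd67/data67
>>> ceph-volume lvm list --format=json hdd67/data67
>>>
>>> If the LV doesn't even show up in `lvs`, the problem is at the LVM layer
>>> (VG/LV missing or the underlying /dev/sdo not visible) rather than in
>>> ceph-volume itself.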
>>>
>>> >
>>> > *******
>>> > I then checked to see if the rest of the migrated OSDs were back in by
>>> calling the ceph osd tree command from the admin node.  Since they were
>>> not, I tried to restart the first of the 10 newly migrated Bluestore osds
>>> by calling
>>> >
>>> > *******
>>> > systemctl start ceph-osd@60
>>> >
>>> > At that point, not only could this particular service not be started,
>>> but ALL the OSD daemons on the entire node shut down!!!!!
>>> >
>>> > ******
>>> > root@osd1:~# systemctl status ceph-osd@60
>>> > ● ceph-osd@60.service - Ceph object storage daemon osd.60
>>> >    Loaded: loaded (/lib/systemd/system/ceph-osd@.service;
>>> enabled-runtime; vendor preset: enabled)
>>> >    Active: inactive (dead) since Fri 2018-11-02 15:47:20 EDT; 1h 9min
>>> ago
>>> >   Process: 3473621 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER}
>>> --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
>>> >   Process: 3473147 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh
>>> --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>>> >  Main PID: 3473621 (code=exited, status=0/SUCCESS)
>>> >
>>> > Oct 29 15:57:53 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-29
>>> 15:57:53.868856 7f68adaece00 -1 osd.60 48106 log_to_monitors {default=true}
>>> > Oct 29 15:57:53 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-29
>>> 15:57:53.874373 7f68adaece00 -1 osd.60 48106 mon_cmd_maybe_osd_create fail:
>>> 'you must complete the upgrade and 'ceph osd require-osd-release luminous'
>>> before using crush device classes': (1) Operation not permitted
>>> > Oct 30 06:25:01 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-30
>>> 06:25:01.961720 7f687feb3700 -1 received  signal: Hangup from  PID: 3485955
>>> task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
>>> radosgw  UID: 0
>>> > Oct 31 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-31
>>> 06:25:02.110898 7f687feb3700 -1 received  signal: Hangup from  PID: 3500945
>>> task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
>>> radosgw  UID: 0
>>> > Nov 01 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-01
>>> 06:25:02.101548 7f687feb3700 -1 received  signal: Hangup from  PID: 3514774
>>> task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
>>> radosgw  UID: 0
>>> > Nov 02 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
>>> 06:25:01.997557 7f687feb3700 -1 received  signal: Hangup from  PID: 3528128
>>> task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
>>> radosgw  UID: 0
>>> > Nov 02 15:47:16 osd1.oxxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
>>> 15:47:16.322229 7f687feb3700 -1 received  signal: Terminated from  PID: 1
>>> task name: /lib/systemd/systemd --system --deserialize 20  UID: 0
>>> > Nov 02 15:47:16 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
>>> 15:47:16.322253 7f687feb3700 -1 osd.60 48504 *** Got signal Terminated ***
>>> > Nov 02 15:47:16 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
>>> 15:47:16.676625 7f687feb3700 -1 osd.60 48504 shutdown
>>> > Nov 02 16:34:05 osd1.oxxxxx.uky.edu systemd[1]: Stopped Ceph object
>>> storage daemon osd.60.
>>> >
>>> > **********
>>> > And here is the output for one of the OSDs (osd.70, still using
>>> Filestore) that shut down right when I tried to start osd.60
>>> >
>>> > ********
>>> >
>>> > root@osd1:~# systemctl status ceph-osd@70
>>> > ● ceph-osd@70.service - Ceph object storage daemon osd.70
>>> >    Loaded: loaded (/lib/systemd/system/ceph-osd@.service;
>>> enabled-runtime; vendor preset: enabled)
>>> >    Active: inactive (dead) since Fri 2018-11-02 16:34:08 EDT; 2min 6s
>>> ago
>>> >   Process: 3473629 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER}
>>> --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
>>> >   Process: 3473153 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh
>>> --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>>> >  Main PID: 3473629 (code=exited, status=0/SUCCESS)
>>> >
>>> > Oct 29 15:57:51 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-29
>>> 15:57:51.300563 7f530eec2e00 -1 osd.70 pg_epoch: 48095 pg[68.ces1( empty
>>> local-lis/les=47489/47489 n=0 ec=6030/6030 lis/c 47488/47488 les/c/f
>>> 47489/47489/0 47485/47488/47488) [138,70,203]p138(0) r=1 lpr=0 crt=0'0
>>> unknown NO
>>> > Oct 30 06:25:01 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-30
>>> 06:25:01.961743 7f52d8e44700 -1 received  signal: Hangup from  PID: 3485955
>>> task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
>>> radosgw  UID: 0
>>> > Oct 31 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-31
>>> 06:25:02.110920 7f52d8e44700 -1 received  signal: Hangup from  PID: 3500945
>>> task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
>>> radosgw  UID: 0
>>> > Nov 01 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-01
>>> 06:25:02.101568 7f52d8e44700 -1 received  signal: Hangup from  PID: 3514774
>>> task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
>>> radosgw  UID: 0
>>> > Nov 02 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
>>> 06:25:01.997633 7f52d8e44700 -1 received  signal: Hangup from  PID: 3528128
>>> task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse
>>> radosgw  UID: 0
>>> > Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
>>> 16:34:05.607714 7f52d8e44700 -1 received  signal: Terminated from  PID: 1
>>> task name: /lib/systemd/systemd --system --deserialize 20  UID: 0
>>> > Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
>>> 16:34:05.607738 7f52d8e44700 -1 osd.70 48535 *** Got signal Terminated ***
>>> > Nov 02 16:34:05 osd1.xxxx.uky.edu systemd[1]: Stopping Ceph object
>>> storage daemon osd.70...
>>> > Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
>>> 16:34:05.677348 7f52d8e44700 -1 osd.70 48535 shutdown
>>> > Nov 02 16:34:08 osd1.xxxx.uky.edu systemd[1]: Stopped Ceph object
>>> storage daemon osd.70.
>>> >
>>> > **************
>>> >
>>> > So, at this point, ALL the OSDs on that node have been shut down.
>>> >
>>> > For your information this is the output of lsblk command (selection)
>>> > *****
>>> > root@osd1:~# lsblk
>>> > NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>>> > sda              8:0    0 447.1G  0 disk
>>> > ├─ssd0-db60    252:0    0    40G  0 lvm
>>> > ├─ssd0-db61    252:1    0    40G  0 lvm
>>> > ├─ssd0-db62    252:2    0    40G  0 lvm
>>> > ├─ssd0-db63    252:3    0    40G  0 lvm
>>> > ├─ssd0-db64    252:4    0    40G  0 lvm
>>> > ├─ssd0-db65    252:5    0    40G  0 lvm
>>> > ├─ssd0-db66    252:6    0    40G  0 lvm
>>> > ├─ssd0-db67    252:7    0    40G  0 lvm
>>> > ├─ssd0-db68    252:8    0    40G  0 lvm
>>> > └─ssd0-db69    252:9    0    40G  0 lvm
>>> > sdb              8:16   0 447.1G  0 disk
>>> > ├─sdb1           8:17   0    40G  0 part
>>> > ├─sdb2           8:18   0    40G  0 part
>>> >
>>> > .....
>>> >
>>> > sdh              8:112  0   3.7T  0 disk
>>> > └─hdd60-data60 252:10   0   3.7T  0 lvm
>>> > sdi              8:128  0   3.7T  0 disk
>>> > └─hdd61-data61 252:11   0   3.7T  0 lvm
>>> > sdj              8:144  0   3.7T  0 disk
>>> > └─hdd62-data62 252:12   0   3.7T  0 lvm
>>> > sdk              8:160  0   3.7T  0 disk
>>> > └─hdd63-data63 252:13   0   3.7T  0 lvm
>>> > sdl              8:176  0   3.7T  0 disk
>>> > └─hdd64-data64 252:14   0   3.7T  0 lvm
>>> > sdm              8:192  0   3.7T  0 disk
>>> > └─hdd65-data65 252:15   0   3.7T  0 lvm
>>> > sdn              8:208  0   3.7T  0 disk
>>> > └─hdd66-data66 252:16   0   3.7T  0 lvm
>>> > sdo              8:224  0   3.7T  0 disk
>>> > └─hdd67-data67 252:17   0   3.7T  0 lvm
>>> > sdp              8:240  0   3.7T  0 disk
>>> > └─hdd68-data68 252:18   0   3.7T  0 lvm
>>> > sdq             65:0    0   3.7T  0 disk
>>> > └─hdd69-data69 252:19   0   3.7T  0 lvm
>>> > sdr             65:16   0   3.7T  0 disk
>>> > └─sdr1          65:17   0   3.7T  0 part /var/lib/ceph/osd/ceph-70
>>> > .....
>>> >
>>> > As a Ceph novice, I am totally clueless about the next step at this
>>> point.  Any help would be appreciated.
>>> >
>>> > On Thu, Nov 1, 2018 at 3:16 PM, Hayashida, Mami <
>>> mami.hayash...@uky.edu> wrote:
>>> >>
>>> >> Thank you, both of you.  I will try this out very soon.
>>> >>
>>> >> On Wed, Oct 31, 2018 at 8:48 AM, Alfredo Deza <ad...@redhat.com>
>>> wrote:
>>> >>>
>>> >>> On Wed, Oct 31, 2018 at 8:28 AM Hayashida, Mami <
>>> mami.hayash...@uky.edu> wrote:
>>> >>> >
>>> >>> > Thank you for your replies. So, if I use the method Hector
>>> suggested (by creating PVs, VGs.... etc. first), can I add the --osd-id
>>> parameter to the command as in
>>> >>> >
>>> >>> > ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db
>>> ssd/db0  --osd-id 0
>>> >>> > ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db
>>> ssd/db1  --osd-id 1
>>> >>> >
>>> >>> > so that Filestore -> Bluestore migration will not change the osd
>>> ID on each disk?
>>> >>>
>>> >>> That looks correct.
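>>> >>>
>>> >>> One thing to keep in mind: reusing an ID that way assumes the old OSD
>>> >>> has been destroyed first. Going from memory of the migration doc you
>>> >>> linked, the per-OSD sequence would be roughly as follows (osd.0,
>>> >>> /dev/sdh and the hdd0/ssd LVs are only an example, double-check
>>> >>> against the doc):
>>> >>>
>>> >>> ceph osd out 0
>>> >>> # wait until the cluster reports it is safe:
>>> >>> ceph osd safe-to-destroy 0
>>> >>> systemctl stop ceph-osd@0
>>> >>> umount /var/lib/ceph/osd/ceph-0
>>> >>> ceph-volume lvm zap /dev/sdh
>>> >>> ceph osd destroy 0 --yes-i-really-mean-it
>>> >>> # (re)create the PV/VG/LV on /dev/sdh as Hector described, then:
>>> >>> ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0 --osd-id 0
>>> >>> ceph-volume lvm activate --all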
>>> >>>
>>> >>> >
>>> >>> > And one more question.  Are there any changes I need to make to
>>> the ceph.conf file?  I did comment out this line that was probably used for
>>> creating Filestore (using ceph-deploy):  osd journal size = 40960
>>> >>>
>>> >>> Since you've pre-created the LVs, the commented-out line will not
>>> >>> affect anything.
>>> >>>
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > On Wed, Oct 31, 2018 at 7:03 AM, Alfredo Deza <ad...@redhat.com>
>>> wrote:
>>> >>> >>
>>> >>> >> On Wed, Oct 31, 2018 at 5:22 AM Hector Martin <
>>> hec...@marcansoft.com> wrote:
>>> >>> >> >
>>> >>> >> > On 31/10/2018 05:55, Hayashida, Mami wrote:
>>> >>> >> > > I am relatively new to Ceph and need some advice on Bluestore
>>> migration.
>>> >>> >> > > I tried migrating a few of our test cluster nodes from
>>> Filestore to
>>> >>> >> > > Bluestore by following this
>>> >>> >> > > (
>>> http://docs.ceph.com/docs/luminous/rados/operations/bluestore-migration/
>>> )
>>> >>> >> > > as the cluster is currently running 12.2.9. The cluster,
>>> originally set
>>> >>> >> > > up by my predecessors, was running Jewel until I upgraded it
>>> recently to
>>> >>> >> > > Luminous.
>>> >>> >> > >
>>> >>> >> > > OSDs on each OSD host are set up in such a way that for every
>>> 10 data HDD
>>> >>> >> > > disks, there is one SSD drive that is holding their
>>> journals.  For
>>> >>> >> > > example, osd.0 data is on /dev/sdh and its Filestore journal
>>> is on a
>>> >>> >> > > partitioned part of /dev/sda. So, lsblk shows something like
>>> >>> >> > >
>>> >>> >> > > sda       8:0    0 447.1G  0 disk
>>> >>> >> > > ├─sda1    8:1    0    40G  0 part # journal for osd.0
>>> >>> >> > >
>>> >>> >> > > sdh       8:112  0   3.7T  0 disk
>>> >>> >> > > └─sdh1    8:113  0   3.7T  0 part /var/lib/ceph/osd/ceph-0
>>> >>> >> > >
>>> >>> >> >
>>> >>> >> > The BlueStore documentation states that the wal will
>>> automatically use
>>> >>> >> > the db volume if it fits, so if you're using a single SSD I
>>> think
>>> >>> >> > there's no good reason to split out the wal, if I'm
>>> understanding it
>>> >>> >> > correctly.
>>> >>> >>
>>> >>> >> This is correct, no need for wal in this case.
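>>> >>> >>
>>> >>> >> In other words, just pointing --block.db at the SSD LV is enough,
>>> >>> >> e.g.:
>>> >>> >>
>>> >>> >> ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0
>>> >>> >>
>>> >>> >> and BlueStore will keep the wal on that same db device; no
>>> >>> >> --block.wal needed.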
>>> >>> >>
>>> >>> >> >
>>> >>> >> > You should be using ceph-volume, since ceph-disk is deprecated.
>>> If
>>> >>> >> > you're sharing the SSD as wal/db for a bunch of OSDs, I think
>>> you're
>>> >>> >> > going to have to create the LVs yourself first. The data HDDs
>>> should be
>>> >>> >> > PVs (I don't think it matters if they're partitions or whole
>>> disk PVs as
>>> >>> >> > long as LVM discovers them) each part of a separate VG (e.g.
>>> hdd0-hdd9)
>>> >>> >> > containing a single LV. Then the SSD should itself be an LV for
>>> a
>>> >>> >> > separate shared SSD VG (e.g. ssd).
>>> >>> >> >
>>> >>> >> > So something like (assuming sda is your wal SSD and sdb and
>>> onwards are
>>> >>> >> > your OSD HDDs):
>>> >>> >> > pvcreate /dev/sda
>>> >>> >> > pvcreate /dev/sdb
>>> >>> >> > pvcreate /dev/sdc
>>> >>> >> > ...
>>> >>> >> >
>>> >>> >> > vgcreate ssd /dev/sda
>>> >>> >> > vgcreate hdd0 /dev/sdb
>>> >>> >> > vgcreate hdd1 /dev/sdc
>>> >>> >> > ...
>>> >>> >> >
>>> >>> >> > lvcreate -L 40G -n db0 ssd
>>> >>> >> > lvcreate -L 40G -n db1 ssd
>>> >>> >> > ...
>>> >>> >> >
>>> >>> >> > lvcreate -l 100%VG -n data0 hdd0
>>> >>> >> > lvcreate -l 100%VG -n data1 hdd1
>>> >>> >> > ...
>>> >>> >> >
>>> >>> >> > ceph-volume lvm prepare --bluestore --data hdd0/data0
>>> --block.db ssd/db0
>>> >>> >> > ceph-volume lvm prepare --bluestore --data hdd1/data1
>>> --block.db ssd/db1
>>> >>> >> > ...
>>> >>> >> >
>>> >>> >> > ceph-volume lvm activate --all
>>> >>> >> >
>>> >>> >> > I think it might be possible to just let ceph-volume create the
>>> PV/VG/LV
>>> >>> >> > for the data disks and only manually create the DB LVs, but it
>>> shouldn't
>>> >>> >> > hurt to do it on your own and just give ready-made LVs to
>>> ceph-volume
>>> >>> >> > for everything.
>>> >>> >>
>>> >>> >> Another alternative here is to use the new `lvm batch` subcommand
>>> to
>>> >>> >> do all of this in one go:
>>> >>> >>
>>> >>> >> ceph-volume lvm batch /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
>>> >>> >> /dev/sdf /dev/sdg /dev/sdh
>>> >>> >>
>>> >>> >> This will detect that sda is an SSD and will create the block.db
>>> >>> >> LVs for you (one for each spinning disk), and will place the data
>>> >>> >> on each spinning disk.
>>> >>> >>
>>> >>> >> The one caveat is that you no longer control OSD IDs, and they are
>>> >>> >> created with whatever the monitors are giving out.
>>> >>> >>
>>> >>> >> This operation is not supported from ceph-deploy either.
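>>> >>> >>
>>> >>> >> If you want to see what batch would do before it touches anything,
>>> >>> >> you can add --report for a dry run, e.g.:
>>> >>> >>
>>> >>> >> ceph-volume lvm batch --report /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
>>> >>> >>
>>> >>> >> which prints the planned LVs and OSDs without creating them.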
>>> >>> >> >
>>> >>> >> > --
>>> >>> >> > Hector Martin (hec...@marcansoft.com)
>>> >>> >> > Public Key: https://marcan.st/marcan.asc
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> *Mami Hayashida*
>>
>> *Research Computing Associate*
>> Research Computing Infrastructure
>> University of Kentucky Information Technology Services
>> 301 Rose Street | 102 James F. Hardymon Building
>> Lexington, KY 40506-0495
>> mami.hayash...@uky.edu
>> (859)323-7521
>>
>
>
>
> --
> *Mami Hayashida*
>
> *Research Computing Associate*
> Research Computing Infrastructure
> University of Kentucky Information Technology Services
> 301 Rose Street | 102 James F. Hardymon Building
> Lexington, KY 40506-0495
> mami.hayash...@uky.edu
> (859)323-7521
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
