On Mon, Nov 5, 2018 at 10:43 AM Hayashida, Mami <mami.hayash...@uky.edu> wrote:
>
> Additional info -- I know that /var/lib/ceph/osd/ceph-{60..69} are not
> mounted at this point (i.e. mount | grep ceph-60, and 61-69, returns
> nothing). They don't show up when I run "df", either.
>
> On Mon, Nov 5, 2018 at 10:15 AM, Hayashida, Mami <mami.hayash...@uky.edu>
> wrote:
>>
>> Well, over the weekend the whole server went down and is now in
>> emergency mode. (I am running Ubuntu 16.04.) When I run "journalctl -p err
>> -xb" I see that
>>
>> systemd[1]: Timed out waiting for device dev-sdh1.device.
>> -- Subject: Unit dev-sdh1.device has failed
>> -- Defined-By: systemd
>> -- Support: http://lists.freedesktop.org/....
>> --
>> -- Unit dev-sdh1.device has failed.
>>
>>
>> I see this for every single one of the newly-converted Bluestore OSD disks
>> (/dev/sd{h..q}1).

This will happen with stale ceph-disk systemd units. You can disable
(mask) those with:

    ln -sf /dev/null /etc/systemd/system/ceph-disk@.service

>>
>>
>> --
>>
>> On Mon, Nov 5, 2018 at 9:57 AM, Alfredo Deza <ad...@redhat.com> wrote:
>>>
>>> On Fri, Nov 2, 2018 at 5:04 PM Hayashida, Mami <mami.hayash...@uky.edu>
>>> wrote:
>>> >
>>> > I followed all the steps Hector suggested, and almost everything seems to
>>> > have worked fine. I say "almost" because one out of the 10 osds I was
>>> > migrating could not be activated even though everything up to that point
>>> > worked just as well for that osd as the other ones. Here is the output
>>> > for that particular failure:
>>> >
>>> > *****
>>> > ceph-volume lvm activate --all
>>> > ...
>>> > --> Activating OSD ID 67 FSID 17cd6755-76f9-4160-906c-XXXXXX
>>> > Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-67
>>> > --> Absolute path not found for executable: restorecon
>>> > --> Ensure $PATH environment variable contains common executable locations
>>> > Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev
>>> > /dev/hdd67/data67 --path /var/lib/ceph/osd/ceph-67
>>> > stderr: failed to read label for /dev/hdd67/data67: (2) No such file or
>>> > directory
>>> > --> RuntimeError: command returned non-zero exit status:
>>>
>>> I wonder if the /dev/sdo device where hdd67/data67 is located is
>>> available, or if something else is missing. You could try poking
>>> around with `lvs` and see if that LV shows up, also `ceph-volume lvm
>>> list hdd67/data67` can help here because it
>>> groups OSDs to LVs. If you run `ceph-volume lvm list --format=json
>>> hdd67/data67` you will also see all the metadata stored in it.
>>>
>>> Would be interesting to see that output to verify things exist and are
>>> usable for OSD activation.
>>>
>>> >
>>> > *******
>>> > I then checked to see if the rest of the migrated OSDs were back in by
>>> > calling the ceph osd tree command from the admin node.
>>> > Since they were not, I tried to restart the first of the 10 newly
>>> > migrated Bluestore osds by calling
>>> >
>>> > *******
>>> > systemctl start ceph-osd@60
>>> >
>>> > At that point, not only this particular service could not be started, but
>>> > ALL the OSDs (daemons) on the entire node shut down!!!!!
>>> >
>>> > ******
>>> > root@osd1:~# systemctl status ceph-osd@60
>>> > ● ceph-osd@60.service - Ceph object storage daemon osd.60
>>> > Loaded: loaded (/lib/systemd/system/ceph-osd@.service;
>>> > enabled-runtime; vendor preset: enabled)
>>> > Active: inactive (dead) since Fri 2018-11-02 15:47:20 EDT; 1h 9min ago
>>> > Process: 3473621 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER}
>>> > --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
>>> > Process: 3473147 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh
>>> > --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>>> > Main PID: 3473621 (code=exited, status=0/SUCCESS)
>>> >
>>> > Oct 29 15:57:53 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-29
>>> > 15:57:53.868856 7f68adaece00 -1 osd.60 48106 log_to_monitors
>>> > {default=true}
>>> > Oct 29 15:57:53 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-29
>>> > 15:57:53.874373 7f68adaece00 -1 osd.60 48106 mon_cmd_maybe_osd_create
>>> > fail: 'you must complete the upgrade and 'ceph osd require-osd-release
>>> > luminous' before using crush device classes': (1) Operation not permitted
>>> > Oct 30 06:25:01 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-30
>>> > 06:25:01.961720 7f687feb3700 -1 received signal: Hangup from PID:
>>> > 3485955 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd
>>> > ceph-fuse radosgw UID: 0
>>> > Oct 31 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-31
>>> > 06:25:02.110898 7f687feb3700 -1 received signal: Hangup from PID:
>>> > 3500945 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd
>>> > ceph-fuse radosgw UID: 0
>>> > Nov 01 06:25:02 osd1.xxxxx.uky.edu
>>> > ceph-osd[3473621]: 2018-11-01
>>> > 06:25:02.101548 7f687feb3700 -1 received signal: Hangup from PID:
>>> > 3514774 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd
>>> > ceph-fuse radosgw UID: 0
>>> > Nov 02 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
>>> > 06:25:01.997557 7f687feb3700 -1 received signal: Hangup from PID:
>>> > 3528128 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd
>>> > ceph-fuse radosgw UID: 0
>>> > Nov 02 15:47:16 osd1.oxxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
>>> > 15:47:16.322229 7f687feb3700 -1 received signal: Terminated from PID: 1
>>> > task name: /lib/systemd/systemd --system --deserialize 20 UID: 0
>>> > Nov 02 15:47:16 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
>>> > 15:47:16.322253 7f687feb3700 -1 osd.60 48504 *** Got signal Terminated ***
>>> > Nov 02 15:47:16 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02
>>> > 15:47:16.676625 7f687feb3700 -1 osd.60 48504 shutdown
>>> > Nov 02 16:34:05 osd1.oxxxxx.uky.edu systemd[1]: Stopped Ceph object
>>> > storage daemon osd.60.
>>> >
>>> > **********
>>> > And here is the output for one of the OSDs (osd.70 still using Filestore)
>>> > that shut down right when I tried to start osd.60
>>> >
>>> > ********
>>> >
>>> > root@osd1:~# systemctl status ceph-osd@70
>>> > ● ceph-osd@70.service - Ceph object storage daemon osd.70
>>> > Loaded: loaded (/lib/systemd/system/ceph-osd@.service;
>>> > enabled-runtime; vendor preset: enabled)
>>> > Active: inactive (dead) since Fri 2018-11-02 16:34:08 EDT; 2min 6s ago
>>> > Process: 3473629 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER}
>>> > --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
>>> > Process: 3473153 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh
>>> > --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>>> > Main PID: 3473629 (code=exited, status=0/SUCCESS)
>>> >
>>> > Oct 29 15:57:51 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-29
>>> > 15:57:51.300563 7f530eec2e00 -1 osd.70 pg_epoch: 48095 pg[68.ces1( empty
>>> > local-lis/les=47489/47489 n=0 ec=6030/6030 lis/c 47488/47488 les/c/f
>>> > 47489/47489/0 47485/47488/47488) [138,70,203]p138(0) r=1 lpr=0 crt=0'0
>>> > unknown NO
>>> > Oct 30 06:25:01 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-30
>>> > 06:25:01.961743 7f52d8e44700 -1 received signal: Hangup from PID:
>>> > 3485955 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd
>>> > ceph-fuse radosgw UID: 0
>>> > Oct 31 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-31
>>> > 06:25:02.110920 7f52d8e44700 -1 received signal: Hangup from PID:
>>> > 3500945 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd
>>> > ceph-fuse radosgw UID: 0
>>> > Nov 01 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-01
>>> > 06:25:02.101568 7f52d8e44700 -1 received signal: Hangup from PID:
>>> > 3514774 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd
>>> > ceph-fuse radosgw UID: 0
>>> > Nov 02 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
>>> > 06:25:01.997633
>>> > 7f52d8e44700 -1 received signal: Hangup from PID:
>>> > 3528128 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd
>>> > ceph-fuse radosgw UID: 0
>>> > Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
>>> > 16:34:05.607714 7f52d8e44700 -1 received signal: Terminated from PID: 1
>>> > task name: /lib/systemd/systemd --system --deserialize 20 UID: 0
>>> > Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
>>> > 16:34:05.607738 7f52d8e44700 -1 osd.70 48535 *** Got signal Terminated ***
>>> > Nov 02 16:34:05 osd1.xxxx.uky.edu systemd[1]: Stopping Ceph object
>>> > storage daemon osd.70...
>>> > Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02
>>> > 16:34:05.677348 7f52d8e44700 -1 osd.70 48535 shutdown
>>> > Nov 02 16:34:08 osd1.xxxx.uky.edu systemd[1]: Stopped Ceph object storage
>>> > daemon osd.70.
>>> >
>>> > **************
>>> >
>>> > So, at this point, ALL the OSDs on that node have been shut down.
>>> >
>>> > For your information this is the output of lsblk command (selection)
>>> > *****
>>> > root@osd1:~# lsblk
>>> > NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>>> > sda                 8:0  0 447.1G  0 disk
>>> > ├─ssd0-db60       252:0  0    40G  0 lvm
>>> > ├─ssd0-db61       252:1  0    40G  0 lvm
>>> > ├─ssd0-db62       252:2  0    40G  0 lvm
>>> > ├─ssd0-db63       252:3  0    40G  0 lvm
>>> > ├─ssd0-db64       252:4  0    40G  0 lvm
>>> > ├─ssd0-db65       252:5  0    40G  0 lvm
>>> > ├─ssd0-db66       252:6  0    40G  0 lvm
>>> > ├─ssd0-db67       252:7  0    40G  0 lvm
>>> > ├─ssd0-db68       252:8  0    40G  0 lvm
>>> > └─ssd0-db69       252:9  0    40G  0 lvm
>>> > sdb                8:16  0 447.1G  0 disk
>>> > ├─sdb1             8:17  0    40G  0 part
>>> > ├─sdb2             8:18  0    40G  0 part
>>> >
>>> > .....
>>> >
>>> > sdh               8:112  0   3.7T  0 disk
>>> > └─hdd60-data60   252:10  0   3.7T  0 lvm
>>> > sdi               8:128  0   3.7T  0 disk
>>> > └─hdd61-data61   252:11  0   3.7T  0 lvm
>>> > sdj               8:144  0   3.7T  0 disk
>>> > └─hdd62-data62   252:12  0   3.7T  0 lvm
>>> > sdk               8:160  0   3.7T  0 disk
>>> > └─hdd63-data63   252:13  0   3.7T  0 lvm
>>> > sdl               8:176  0   3.7T  0 disk
>>> > └─hdd64-data64   252:14  0   3.7T  0 lvm
>>> > sdm               8:192  0   3.7T  0 disk
>>> > └─hdd65-data65   252:15  0   3.7T  0 lvm
>>> > sdn               8:208  0   3.7T  0 disk
>>> > └─hdd66-data66   252:16  0   3.7T  0 lvm
>>> > sdo               8:224  0   3.7T  0 disk
>>> > └─hdd67-data67   252:17  0   3.7T  0 lvm
>>> > sdp               8:240  0   3.7T  0 disk
>>> > └─hdd68-data68   252:18  0   3.7T  0 lvm
>>> > sdq                65:0  0   3.7T  0 disk
>>> > └─hdd69-data69   252:19  0   3.7T  0 lvm
>>> > sdr               65:16  0   3.7T  0 disk
>>> > └─sdr1            65:17  0   3.7T  0 part /var/lib/ceph/osd/ceph-70
>>> > .....
>>> >
>>> > As a Ceph novice, I am totally clueless about the next step at this
>>> > point. Any help would be appreciated.
>>> >
>>> > On Thu, Nov 1, 2018 at 3:16 PM, Hayashida, Mami <mami.hayash...@uky.edu>
>>> > wrote:
>>> >>
>>> >> Thank you, both of you. I will try this out very soon.
>>> >>
>>> >> On Wed, Oct 31, 2018 at 8:48 AM, Alfredo Deza <ad...@redhat.com> wrote:
>>> >>>
>>> >>> On Wed, Oct 31, 2018 at 8:28 AM Hayashida, Mami
>>> >>> <mami.hayash...@uky.edu> wrote:
>>> >>> >
>>> >>> > Thank you for your replies. So, if I use the method Hector suggested
>>> >>> > (by creating PVs, VGs.... etc. first), can I add the --osd-id
>>> >>> > parameter to the command as in
>>> >>> >
>>> >>> > ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0 --osd-id 0
>>> >>> > ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db ssd/db1 --osd-id 1
>>> >>> >
>>> >>> > so that Filestore -> Bluestore migration will not change the osd ID
>>> >>> > on each disk?
>>> >>>
>>> >>> That looks correct.
>>> >>>
>>> >>> >
>>> >>> > And one more question. Are there any changes I need to make to the
>>> >>> > ceph.conf file?
>>> >>> > I did comment out this line that was probably used
>>> >>> > for creating Filestore (using ceph-deploy): osd journal size = 40960
>>> >>>
>>> >>> Since you've pre-created the LVs the commented out line will not
>>> >>> affect anything.
>>> >>>
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > On Wed, Oct 31, 2018 at 7:03 AM, Alfredo Deza <ad...@redhat.com>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> On Wed, Oct 31, 2018 at 5:22 AM Hector Martin
>>> >>> >> <hec...@marcansoft.com> wrote:
>>> >>> >> >
>>> >>> >> > On 31/10/2018 05:55, Hayashida, Mami wrote:
>>> >>> >> > > I am relatively new to Ceph and need some advice on Bluestore
>>> >>> >> > > migration. I tried migrating a few of our test cluster nodes from
>>> >>> >> > > Filestore to Bluestore by following this
>>> >>> >> > > (http://docs.ceph.com/docs/luminous/rados/operations/bluestore-migration/)
>>> >>> >> > > as the cluster is currently running 12.2.9. The cluster, originally
>>> >>> >> > > set up by my predecessors, was running Jewel until I upgraded it
>>> >>> >> > > recently to Luminous.
>>> >>> >> > >
>>> >>> >> > > OSDs in each OSD host are set up in such a way that for every 10
>>> >>> >> > > data HDD disks, there is one SSD drive that is holding their
>>> >>> >> > > journals. For example, osd.0 data is on /dev/sdh and its Filestore
>>> >>> >> > > journal is on a partitioned part of /dev/sda.
>>> >>> >> > > So, lsblk shows something like
>>> >>> >> > >
>>> >>> >> > > sda                 8:0  0 447.1G  0 disk
>>> >>> >> > > ├─sda1              8:1  0    40G  0 part # journal for osd.0
>>> >>> >> > >
>>> >>> >> > > sdh               8:112  0   3.7T  0 disk
>>> >>> >> > > └─sdh1            8:113  0   3.7T  0 part /var/lib/ceph/osd/ceph-0
>>> >>> >> > >
>>> >>> >> >
>>> >>> >> > The BlueStore documentation states that the wal will automatically use
>>> >>> >> > the db volume if it fits, so if you're using a single SSD I think
>>> >>> >> > there's no good reason to split out the wal, if I'm understanding it
>>> >>> >> > correctly.
>>> >>> >>
>>> >>> >> This is correct, no need for wal in this case.
>>> >>> >>
>>> >>> >> >
>>> >>> >> > You should be using ceph-volume, since ceph-disk is deprecated. If
>>> >>> >> > you're sharing the SSD as wal/db for a bunch of OSDs, I think you're
>>> >>> >> > going to have to create the LVs yourself first. The data HDDs should be
>>> >>> >> > PVs (I don't think it matters if they're partitions or whole disk PVs as
>>> >>> >> > long as LVM discovers them) each part of a separate VG (e.g. hdd0-hdd9)
>>> >>> >> > containing a single LV. Then the SSD should itself be a PV for a
>>> >>> >> > separate shared SSD VG (e.g. ssd).
>>> >>> >> >
>>> >>> >> > So something like (assuming sda is your wal SSD and sdb and onwards are
>>> >>> >> > your OSD HDDs):
>>> >>> >> > pvcreate /dev/sda
>>> >>> >> > pvcreate /dev/sdb
>>> >>> >> > pvcreate /dev/sdc
>>> >>> >> > ...
>>> >>> >> >
>>> >>> >> > vgcreate ssd /dev/sda
>>> >>> >> > vgcreate hdd0 /dev/sdb
>>> >>> >> > vgcreate hdd1 /dev/sdc
>>> >>> >> > ...
>>> >>> >> >
>>> >>> >> > lvcreate -L 40G -n db0 ssd
>>> >>> >> > lvcreate -L 40G -n db1 ssd
>>> >>> >> > ...
>>> >>> >> >
>>> >>> >> > lvcreate -l 100%VG -n data0 hdd0
>>> >>> >> > lvcreate -l 100%VG -n data1 hdd1
>>> >>> >> > ...
>>> >>> >> >
>>> >>> >> > ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0
>>> >>> >> > ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db ssd/db1
>>> >>> >> > ...
>>> >>> >> >
>>> >>> >> > ceph-volume lvm activate --all
>>> >>> >> >
>>> >>> >> > I think it might be possible to just let ceph-volume create the PV/VG/LV
>>> >>> >> > for the data disks and only manually create the DB LVs, but it shouldn't
>>> >>> >> > hurt to do it on your own and just give ready-made LVs to ceph-volume
>>> >>> >> > for everything.
>>> >>> >>
>>> >>> >> Another alternative here is to use the new `lvm batch` subcommand to
>>> >>> >> do all of this in one go:
>>> >>> >>
>>> >>> >> ceph-volume lvm batch /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
>>> >>> >>
>>> >>> >> Will detect that sda is an SSD and will create the LVs for you for
>>> >>> >> block.db (one for each spinning disk). For each spinning disk, it will
>>> >>> >> place data on them.
>>> >>> >>
>>> >>> >> The one caveat is that you no longer control OSD IDs, and they are
>>> >>> >> created with whatever the monitors are giving out.
>>> >>> >>
>>> >>> >> This operation is not supported from ceph-deploy either.
>>> >>> >> >
>>> >>> >> > --
>>> >>> >> > Hector Martin (hec...@marcansoft.com)
>>> >>> >> > Public Key: https://marcan.st/marcan.asc
>>> >>> >> > _______________________________________________
>>> >>> >> > ceph-users mailing list
>>> >>> >> > ceph-users@lists.ceph.com
>>> >>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > Mami Hayashida
>>> >>> > Research Computing Associate
>>> >>> >
>>> >>> > Research Computing Infrastructure
>>> >>> > University of Kentucky Information Technology Services
>>> >>> > 301 Rose Street | 102 James F.
>>> >>> > Hardymon Building
>>> >>> > Lexington, KY 40506-0495
>>> >>> > mami.hayash...@uky.edu
>>> >>> > (859)323-7521
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
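
For reference, the per-OSD steps quoted in this thread can be sketched as a dry run. This is only a sketch under stated assumptions, not a tested migration procedure: `plan_osd` is a hypothetical helper that prints the commands instead of running them, the VG/LV names (hddNN/dataNN, ssd0/dbNN) follow the lsblk output quoted above, and the shared `ssd0` VG is assumed to exist already. The data LV uses `-l 100%VG` because `-L` expects an absolute size rather than a percentage.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the LVM and ceph-volume commands
# for one OSD. plan_osd is a hypothetical helper, not part of Ceph or LVM.
plan_osd() {
    id="$1"    # OSD ID to keep across the migration, e.g. 60
    dev="$2"   # data HDD backing that OSD, e.g. /dev/sdh

    echo "pvcreate $dev"
    echo "vgcreate hdd$id $dev"
    # -l takes extents or a percentage; -L takes an absolute size
    echo "lvcreate -l 100%VG -n data$id hdd$id"
    # assumes the shared SSD VG is named ssd0 and already exists
    echo "lvcreate -L 40G -n db$id ssd0"
    echo "ceph-volume lvm prepare --bluestore --data hdd$id/data$id --block.db ssd0/db$id --osd-id $id"
}

# Example: show the plan for osd.60 on /dev/sdh
plan_osd 60 /dev/sdh
```

Reviewing the printed commands before running any of them by hand makes it easier to spot a mismatched ID or device, since every name is derived from the same two arguments.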