From what I observed, however, until I made that last change in the
UDEV rule, I simply could not get those OSDs started. I will try
converting the next 10 OSDs (osd.70-79) tomorrow, following all the
steps you have shown me in this email thread, and will report back to
you guys if/where I encounter any errors. I am planning on trying to
start the OSDs (once they are converted to Bluestore) without the udev
rule first.
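For my own notes, this is roughly the per-OSD sequence I have pieced
together from this thread for tomorrow's batch (using osd.70 as the
example, and assuming the same hdd<N>/data<N> and ssd0/db<N> LV naming
we used for osd.60-69; the earlier steps for tearing down the old
Filestore OSD are omitted here). Please correct me if I got any of
this wrong:

    ceph-volume lvm prepare --bluestore --osd-id 70 \
        --data hdd70/data70 --block.db ssd0/db70
    ceph-volume lvm activate --all
    systemctl start ceph-osd@70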
On Mon, Nov 5, 2018 at 4:42 PM, Alfredo Deza <ad...@redhat.com> wrote:
> On Mon, Nov 5, 2018 at 4:21 PM Hayashida, Mami <mami.hayash...@uky.edu> wrote:
> >
> > Yes, I still have the volume log showing the activation process for
> > ssd0/db60 (and 61-69 as well). I will email it to you directly as an
> > attachment.
>
> In the logs, I see that ceph-volume does set the permissions correctly:
>
> [2018-11-02 16:20:07,238][ceph_volume.process][INFO ] Running command: chown -h ceph:ceph /dev/hdd60/data60
> [2018-11-02 16:20:07,242][ceph_volume.process][INFO ] Running command: chown -R ceph:ceph /dev/dm-10
> [2018-11-02 16:20:07,246][ceph_volume.process][INFO ] Running command: ln -s /dev/hdd60/data60 /var/lib/ceph/osd/ceph-60/block
> [2018-11-02 16:20:07,249][ceph_volume.process][INFO ] Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-60/activate.monmap
> [2018-11-02 16:20:07,530][ceph_volume.process][INFO ] stderr got monmap epoch 2
> [2018-11-02 16:20:07,547][ceph_volume.process][INFO ] Running command: ceph-authtool /var/lib/ceph/osd/ceph-60/keyring --create-keyring --name osd.60 --add-key AQBysdxbNgdBNhAA6NQ/UWDHqGAZfFuryCWfxQ==
> [2018-11-02 16:20:07,579][ceph_volume.process][INFO ] stdout creating /var/lib/ceph/osd/ceph-60/keyring
> added entity osd.60 auth auth(auid = 18446744073709551615 key=AQBysdxbNgdBNhAA6NQ/UWDHqGAZfFuryCWfxQ== with 0 caps)
> [2018-11-02 16:20:07,583][ceph_volume.process][INFO ] Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-60/keyring
> [2018-11-02 16:20:07,587][ceph_volume.process][INFO ] Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-60/
> [2018-11-02 16:20:07,591][ceph_volume.process][INFO ] Running command: chown -h ceph:ceph /dev/ssd0/db60
> [2018-11-02 16:20:07,594][ceph_volume.process][INFO ] Running command: chown -R ceph:ceph /dev/dm-0
>
> And the failures from osd.60 are *before* those successful chown calls
> (15:39:00). I wonder if somehow in the process there was a missing
> step and then it all got corrected. I am certain that the UDEV rule
> should *not* need to be in place for this to work.
>
> The changes in the paths for /dev/dm-* are expected, as those are
> created every time the system boots.
>
> >
> > On Mon, Nov 5, 2018 at 4:14 PM, Alfredo Deza <ad...@redhat.com> wrote:
> >>
> >> On Mon, Nov 5, 2018 at 4:04 PM Hayashida, Mami <mami.hayash...@uky.edu> wrote:
> >> >
> >> > WOW. With you two guiding me through every step, the 10 OSDs in
> >> > question are now added back to the cluster as Bluestore disks!!!
> >> > Here are my responses to the last email from Hector:
> >> >
> >> > 1. I first checked the permissions and they looked like this:
> >> >
> >> > root@osd1:/var/lib/ceph/osd/ceph-60# ls -l
> >> > total 56
> >> > -rw-r--r-- 1 ceph ceph         384 Nov  2 16:20 activate.monmap
> >> > -rw-r--r-- 1 ceph ceph 10737418240 Nov  2 16:20 block
> >> > lrwxrwxrwx 1 ceph ceph          14 Nov  2 16:20 block.db -> /dev/ssd0/db60
> >> >
> >> > root@osd1:~# ls -l /dev/ssd0/
> >> > ...
> >> > lrwxrwxrwx 1 root root 7 Nov  5 12:38 db60 -> ../dm-2
> >> >
> >> > root@osd1:~# ls -la /dev/
> >> > ...
> >> > brw-rw---- 1 root disk 252, 2 Nov  5 12:38 dm-2
> >>
> >> This looks like a bug. You mentioned you are running 12.2.9, and we
> >> haven't seen ceph-volume fail to update the permissions on OSD
> >> devices. No one should need a UDEV rule to set the permissions for
> >> devices; this is a ceph-volume task.
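> >>
> >> (One quick thing to check: since block.db is a symlink, a plain
> >> ls -l only shows the link itself. Something like the following --
> >> just a sketch, adjust the OSD id as needed -- would show the
> >> ownership of the actual dm node behind it, which is what the OSD
> >> opens:
> >>
> >>     ls -l /var/lib/ceph/osd/ceph-60/block.db
> >>     ls -lL /var/lib/ceph/osd/ceph-60/block.db
> >>
> >> The second command dereferences the symlink, so it reports the
> >> permissions on the underlying /dev/dm-* device.)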
> >>
> >> When a system starts and the OSD activation happens, it always
> >> ensures that the permissions are set correctly. Could you find the
> >> section of the logs in /var/log/ceph/ceph-volume.log that shows the
> >> activation process for ssd0/db60?
> >>
> >> Hopefully you still have those around; it would help us determine
> >> why the permissions aren't being set correctly.
> >>
> >> > ...
> >> >
> >> > 2. I then ran ceph-volume activate --all again. Saw the same error
> >> > for osd.67 I described many emails ago. None of the permissions
> >> > changed. I tried restarting ceph-osd@60, but got the same error as
> >> > before:
> >> >
> >> > 2018-11-05 15:34:52.001782 7f5a15744e00  0 set uid:gid to 64045:64045 (ceph:ceph)
> >> > 2018-11-05 15:34:52.001808 7f5a15744e00  0 ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable), process ceph-osd, pid 36506
> >> > 2018-11-05 15:34:52.021717 7f5a15744e00  0 pidfile_write: ignore empty --pid-file
> >> > 2018-11-05 15:34:52.033478 7f5a15744e00  0 load: jerasure load: lrc load: isa
> >> > 2018-11-05 15:34:52.033557 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
> >> > 2018-11-05 15:34:52.033572 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
> >> > 2018-11-05 15:34:52.033888 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
> >> > 2018-11-05 15:34:52.033958 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
> >> > 2018-11-05 15:34:52.033984 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) close
> >> > 2018-11-05 15:34:52.318993 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _mount path /var/lib/ceph/osd/ceph-60
> >> > 2018-11-05 15:34:52.319064 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
> >> > 2018-11-05 15:34:52.319073 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
> >> > 2018-11-05 15:34:52.319356 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
> >> > 2018-11-05 15:34:52.319415 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
> >> > 2018-11-05 15:34:52.319491 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block.db type kernel
> >> > 2018-11-05 15:34:52.319499 7f5a15744e00  1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open path /var/lib/ceph/osd/ceph-60/block.db
> >> > 2018-11-05 15:34:52.319514 7f5a15744e00 -1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open open got: (13) Permission denied
> >> > 2018-11-05 15:34:52.319648 7f5a15744e00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
> >> > 2018-11-05 15:34:52.319666 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) close
> >> > 2018-11-05 15:34:52.598249 7f5a15744e00 -1 osd.60 0 OSD:init: unable to mount object store
> >> > 2018-11-05 15:34:52.598269 7f5a15744e00 -1  ** ERROR: osd init failed: (13) Permission denied
> >> >
> >> > 3. Finally, I literally copied and pasted the udev rule Hector
> >> > wrote out for me, then rebooted the server.
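> >> > (Side note: if I understand udev correctly, the new rule could
> >> > probably also have been applied without a full reboot with
> >> > something like the following -- untested on my end, so I rebooted
> >> > to be safe:
> >> >
> >> >     udevadm control --reload-rules
> >> >     udevadm trigger --subsystem-match=block
> >> >
> >> > which reloads the rules and replays events for the block devices.)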
> >> >
> >> > 4. I tried restarting ceph-osd@60 -- this time it came right up!!!
> >> > I was able to start all the rest, including ceph-osd@67, which I
> >> > thought had not been activated by lvm.
> >> >
> >> > 5. I checked from the admin node and verified that osd.60-69 are
> >> > all in the cluster as Bluestore OSDs, and they indeed are.
> >> >
> >> > ********************
> >> > Thank you SO MUCH, both of you, for putting up with my novice
> >> > questions all the way. I am planning to convert the rest of the
> >> > cluster the same way, reviewing this entire thread to trace what
> >> > steps need to be taken.
> >> >
> >> > Mami
> >> >
> >> > On Mon, Nov 5, 2018 at 3:00 PM, Hector Martin <hec...@marcansoft.com> wrote:
> >> >>
> >> >> On 11/6/18 3:31 AM, Hayashida, Mami wrote:
> >> >> > 2018-11-05 12:47:01.075573 7f1f2775ae00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
> >> >>
> >> >> Looks like the permissions on the block.db device are wrong. As
> >> >> far as I know, ceph-volume is responsible for setting this at
> >> >> activation time.
> >> >>
> >> >> > I already ran the "ceph-volume lvm activate --all" command right
> >> >> > after I prepared (using "lvm prepare") those OSDs. Do I need to
> >> >> > run the "activate" command again?
> >> >>
> >> >> The activation is required on every boot to create the
> >> >> /var/lib/ceph/osd/* directory, but that should be automatically
> >> >> done by systemd units (since you didn't run it after the reboot
> >> >> and yet the directories exist, it seems to have worked).
> >> >>
> >> >> Can you ls -l the OSD directory (/var/lib/ceph/osd/ceph-60/) and
> >> >> also any devices symlinked to from there, to see the permissions?
> >> >>
> >> >> Then run the activate command again and list the permissions again
> >> >> to see if they have changed, and if they have, try to start the
> >> >> OSD again.
> >> >>
> >> >> I found one Ubuntu bug that suggests there may be a race condition:
> >> >>
> >> >> https://bugs.launchpad.net/bugs/1767087
> >> >>
> >> >> I get the feeling the ceph-osd activation may be happening before
> >> >> the block.db device is ready, so when it gets created by LVM it's
> >> >> already too late and doesn't have the right permissions. You could
> >> >> fix it with a udev rule (like Ubuntu did), but if this is indeed
> >> >> your issue then it sounds like something that should be fixed in
> >> >> Ceph. Perhaps all you need is a systemd unit override to make sure
> >> >> ceph-volume@* services only start after LVM is ready.
> >> >>
> >> >> A usable udev rule could look like this (e.g. put it in
> >> >> /etc/udev/rules.d/90-lvm-permissions.rules):
> >> >>
> >> >> ACTION=="change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
> >> >>   ENV{DM_LV_NAME}=="db*", ENV{DM_VG_NAME}=="ssd0", \
> >> >>   OWNER="ceph", GROUP="ceph", MODE="660"
> >> >>
> >> >> Reboot after that and see if the OSDs come up without further
> >> >> action.
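> >> >>
> >> >> As for the systemd route I mentioned above: untested, but the
> >> >> override I have in mind would be a drop-in along these lines (the
> >> >> exact LVM unit name varies by distro, so treat this purely as a
> >> >> sketch):
> >> >>
> >> >> # /etc/systemd/system/ceph-volume@.service.d/override.conf
> >> >> [Unit]
> >> >> # Wait for LVM to finish activating volumes before OSD activation
> >> >> # runs; adjust the unit name to whatever your distro actually uses
> >> >> # (lvm2-activation.service is only generated on some setups).
> >> >> After=lvm2-activation.service
> >> >> Wants=lvm2-activation.service
> >> >>
> >> >> followed by a "systemctl daemon-reload". But the udev rule above
> >> >> is the simpler thing to test first.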
> >> >>
> >> >> --
> >> >> Hector Martin (hec...@marcansoft.com)
> >> >> Public Key: https://mrcn.st/pub


--
Mami Hayashida
Research Computing Associate

Research Computing Infrastructure
University of Kentucky Information Technology Services
301 Rose Street | 102 James F. Hardymon Building
Lexington, KY 40506-0495
mami.hayash...@uky.edu
(859)323-7521
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com