> Steps 3-6 are to get the drive lvm volume back

How much longer will we have to deal with LVM? If we can migrate non-LVM drives from earlier versions, how about we give ceph-volume the ability to create non-LVM OSDs directly?
On Thu, May 16, 2019 at 1:20 PM Tarek Zegar <tze...@us.ibm.com> wrote:

> FYI for anyone interested, below is how to recover from someone removing
> an NVMe drive (the first two steps show how mine were removed and brought
> back). Steps 3-6 are to get the drive's LVM volume back AND get the OSD
> daemon running for the drive.
>
> 1. echo 1 > /sys/block/nvme0n1/device/device/remove
> 2. echo 1 > /sys/bus/pci/rescan
> 3. vgcfgrestore ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841 ; vgchange -ay ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841
> 4. ceph auth add osd.122 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-122/keyring
> 5. ceph-volume lvm activate --all
> 6. You should see the drive somewhere in the ceph tree; move it to the right host.
>
> Tarek
>
> From: "Tarek Zegar" <tze...@us.ibm.com>
> To: Alfredo Deza <ad...@redhat.com>
> Cc: ceph-users <ceph-users@lists.ceph.com>
> Date: 05/15/2019 10:32 AM
> Subject: [EXTERNAL] Re: [ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process
> Sent by: "ceph-users" <ceph-users-boun...@lists.ceph.com>
> ------------------------------
>
> TL;DR: I activated the drive successfully but the daemon won't start. It looks
> like it's complaining about the mon config; I don't know why, since there is a
> valid ceph.conf on the host. Thoughts? I feel like it's close. Thank you.
>
> I executed the command:
>
> ceph-volume lvm activate --all
>
> It found the drive and activated it:
>
> --> Activating OSD ID 122 FSID a151bea5-d123-45d9-9b08-963a511c042a
> ....
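Steps 3-6 above can be sketched as a single script. This is a sketch using the thread's own values (the VG name and OSD id 122); the `run` helper only prints each command so the sequence can be reviewed first, and would be swapped for real execution (as root, on the affected host) when applying it.

```shell
#!/bin/sh
# Values taken from the thread; substitute your own VG name and OSD id.
VG="ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841"
OSD_ID="122"

# Dry-run helper: print each command instead of executing it.
# Replace the echo with '"$@"' to actually run the steps (as root).
run() { echo "+ $*"; }

# Step 3: restore the VG metadata from backup and activate its LVs.
run vgcfgrestore "$VG"
run vgchange -ay "$VG"

# Step 4: re-register the OSD's key with the monitors.
run ceph auth add "osd.$OSD_ID" osd 'allow *' mon 'allow rwx' \
    -i "/var/lib/ceph/osd/ceph-$OSD_ID/keyring"

# Step 5: let ceph-volume discover and activate every OSD it can see.
run ceph-volume lvm activate --all
```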
> --> ceph-volume lvm activate successful for osd ID: 122
>
> However, systemd would not start the OSD process 122:
>
> May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: 2019-05-15 14:16:13.862 7ffff1970700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
> May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: 2019-05-15 14:16:13.862 7ffff116f700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
> May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: failed to fetch mon config (--no-mon-config to skip)
> May 15 14:16:13 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Main process exited, code=exited, status=1/FAILURE
> May 15 14:16:13 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Failed with result 'exit-code'.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Service hold-off time over, scheduling restart.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Scheduled restart job, restart counter is at 3.
> -- Subject: Automatic restarting of a unit has been scheduled
> -- Defined-By: systemd
> -- Support: http://www.ubuntu.com/support
> --
> -- Automatic restarting of the unit ceph-osd@122.service has been scheduled, as the result for
> -- the configured Restart= setting for the unit.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: Stopped Ceph object storage daemon osd.122.
> -- Subject: Unit ceph-osd@122.service has finished shutting down
> -- Defined-By: systemd
> -- Support: http://www.ubuntu.com/support
> --
> -- Unit ceph-osd@122.service has finished shutting down.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Start request repeated too quickly.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service: Failed with result 'exit-code'.
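When `ceph-osd` dies with "failed to fetch mon config", the usual first checks are that the host can reach a monitor at all and that the key the monitors hold for the OSD matches the keyring on disk. A sketch of those checks, assuming OSD id 122 from the thread; the `run` helper only prints the commands so they can be reviewed before running them for real on the host:

```shell
#!/bin/sh
OSD_ID="122"

# Dry-run helper: print each command; swap in '"$@"' to execute for real.
run() { echo "+ $*"; }

# Compare the key the monitors hold with the one on disk; they must match.
run ceph auth get "osd.$OSD_ID"
run cat "/var/lib/ceph/osd/ceph-$OSD_ID/keyring"

# Confirm the host can actually reach a monitor with the admin key.
run ceph -s

# Run the daemon in the foreground to see the full error output directly.
run ceph-osd -f --cluster ceph --id "$OSD_ID" --setuser ceph --setgroup ceph
```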
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: Failed to start Ceph object storage daemon osd.122
>
> From: Alfredo Deza <ad...@redhat.com>
> To: Bob R <b...@drinksbeer.org>
> Cc: Tarek Zegar <tze...@us.ibm.com>, ceph-users <ceph-users@lists.ceph.com>
> Date: 05/15/2019 08:27 AM
> Subject: [EXTERNAL] Re: [ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process
> ------------------------------
>
> On Tue, May 14, 2019 at 7:24 PM Bob R <b...@drinksbeer.org> wrote:
> >
> > Does 'ceph-volume lvm list' show it? If so you can try to activate it with 'ceph-volume lvm activate 122 74b01ec2--124d--427d--9812--e437f90261d4'
>
> Good suggestion. If `ceph-volume lvm list` can see it, it can probably
> activate it again. You can activate it with the OSD ID + OSD FSID, or do:
>
> ceph-volume lvm activate --all
>
> You didn't say if the OSD wasn't coming up after trying to start it
> (the systemd unit should still be there for ID 122), or if you tried
> rebooting and that OSD didn't come up.
>
> The systemd unit is tied to both the ID and FSID of the OSD, so it
> shouldn't matter if the underlying device changed, since ceph-volume
> ensures it is the right one every time it activates.
>
> > Bob
> >
> > On Tue, May 14, 2019 at 7:35 AM Tarek Zegar <tze...@us.ibm.com> wrote:
> >>
> >> Someone nuked an OSD that had 1-replica PGs.
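The id + fsid activation Alfredo describes can be sketched as below. The fsid here is the one reported for OSD 122 earlier in the thread; in practice it would be read off the `ceph-volume lvm list` output. The `run` helper only prints the commands so the sketch is reviewable without a cluster.

```shell
#!/bin/sh
# OSD id and fsid as reported by 'ceph-volume lvm list' (values from the thread).
OSD_ID="122"
OSD_FSID="a151bea5-d123-45d9-9b08-963a511c042a"

# Dry-run helper: print each command; swap in '"$@"' to execute for real.
run() { echo "+ $*"; }

# List what ceph-volume can see; the output includes each OSD's id and fsid.
run ceph-volume lvm list

# Activate one specific OSD by id + fsid...
run ceph-volume lvm activate "$OSD_ID" "$OSD_FSID"

# ...or simply activate everything that is discoverable.
run ceph-volume lvm activate --all
```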
> >> They accidentally did echo 1 > /sys/block/nvme0n1/device/device/remove
> >> We got it back by doing echo 1 > /sys/bus/pci/rescan
> >> However, it re-enumerated as a different drive number (guess we didn't have udev rules).
> >> They restored the LVM volume (vgcfgrestore ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841 ; vgchange -ay ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841)
> >>
> >> lsblk
> >> nvme0n2 259:9 0 1.8T 0 disk
> >>   ceph--8c81b2a3--6c8e--4cae--a3c0--e2d91f82d841-osd--data--74b01ec2--124d--427d--9812--e437f90261d4 253:1 0 1.8T 0 lvm
> >>
> >> We are stuck here. How do we attach an OSD daemon to the drive? It was osd.122 previously.
> >>
> >> Thanks
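On the "guess we didn't have udev rules" aside: the kernel's nvme0n1-style names cannot be pinned, but a udev rule can add a stable symlink keyed on the drive's serial, so tooling and humans see the same path after re-enumeration. A sketch only; the serial is a placeholder (read the real one from `udevadm info /dev/nvme0n1` or `/sys/class/nvme/nvme0/serial`), and as Alfredo notes, ceph-volume itself does not need this since it matches OSDs by LVM metadata.

```
# /etc/udev/rules.d/99-ceph-nvme.rules (sketch; replace the serial)
# Creates /dev/disk/ceph/osd122 regardless of the enumerated nvme name.
KERNEL=="nvme*n1", ATTRS{serial}=="REPLACE_WITH_DRIVE_SERIAL", SYMLINK+="disk/ceph/osd122"
```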
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com