> Steps 3-6 are to get the drive's LVM volume back

How much longer will we have to deal with LVM?  If we can migrate non-LVM
drives from earlier versions, how about we give ceph-volume the ability to
create non-LVM OSDs directly?



On Thu, May 16, 2019 at 1:20 PM Tarek Zegar <tze...@us.ibm.com> wrote:

> FYI for anyone interested, below is how to recover from someone removing
> an NVMe drive (the first two steps show how mine were removed and brought
> back).
> Steps 3-6 are to get the drive's LVM volume back AND get the OSD daemon
> running for the drive:
>
> 1. echo 1 > /sys/block/nvme0n1/device/device/remove
> 2. echo 1 > /sys/bus/pci/rescan
> 3. vgcfgrestore ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841 ; vgchange -ay
> ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841
> 4. ceph auth add osd.122 osd 'allow *' mon 'allow rwx' -i
> /var/lib/ceph/osd/ceph-122/keyring
> 5. ceph-volume lvm activate --all
> 6. You should see the drive somewhere in the ceph tree, move it to the
> right host
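The six steps above can be collected into one script. This is a dry-run sketch, not a drop-in tool: the VG name, OSD ID, and device name are the ones from this thread, and the `run` wrapper only prints each command so nothing destructive happens until you deliberately swap it for real execution.

```shell
#!/bin/sh
# Dry-run sketch of the recovery steps above. VG name, OSD id, and device
# are the ones from this thread; adjust for your cluster before running.
VG=ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841
OSD_ID=122
DEV=nvme0n1

run() { echo "+ $*"; }      # print only; replace the body with "$@" to execute

run sh -c "echo 1 > /sys/block/$DEV/device/device/remove"   # step 1: drop the device
run sh -c "echo 1 > /sys/bus/pci/rescan"                    # step 2: bring it back
run vgcfgrestore "$VG"                                      # step 3: restore VG metadata
run vgchange -ay "$VG"                                      #         activate the VG
run ceph auth add "osd.$OSD_ID" osd 'allow *' mon 'allow rwx' \
    -i "/var/lib/ceph/osd/ceph-$OSD_ID/keyring"             # step 4: re-register the key
run ceph-volume lvm activate --all                          # step 5: activate the OSD
```

Step 6 (moving the OSD to the right host in the CRUSH tree) is left out because the target host name is cluster-specific.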
>
> Tarek
>
>
>
>
> From: "Tarek Zegar" <tze...@us.ibm.com>
> To: Alfredo Deza <ad...@redhat.com>
> Cc: ceph-users <ceph-users@lists.ceph.com>
> Date: 05/15/2019 10:32 AM
> Subject: [EXTERNAL] Re: [ceph-users] Lost OSD from PCIe error, recovered,
> to restore OSD process
> Sent by: "ceph-users" <ceph-users-boun...@lists.ceph.com>
> ------------------------------
>
>
>
> TL;DR: I activated the drive successfully, but the daemon won't start; it
> looks like it's complaining about the mon config, and I don't know why
> (there is a valid ceph.conf on the host). Thoughts? I feel like it's close.
> Thank you
>
> I executed the command:
> ceph-volume lvm activate --all
>
>
> It found the drive and activated it:
> --> Activating OSD ID 122 FSID a151bea5-d123-45d9-9b08-963a511c042a
> ....
> --> ceph-volume lvm activate successful for osd ID: 122
>
>
>
> However, systemd would not start the OSD process 122:
> May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: 2019-05-15
> 14:16:13.862 7ffff1970700 -1 monclient(hunting): handle_auth_bad_method
> server allowed_methods [2] but i only support [2]
> May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: 2019-05-15
> 14:16:13.862 7ffff116f700 -1 monclient(hunting): handle_auth_bad_method
> server allowed_methods [2] but i only support [2]
> May 15 14:16:13 pok1-qz1-sr1-rk001-s20 ceph-osd[757237]: failed to fetch
> mon config (--no-mon-config to skip)
> May 15 14:16:13 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service:
> Main process exited, code=exited, status=1/FAILURE
> May 15 14:16:13 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service:
> Failed with result 'exit-code'.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service:
> Service hold-off time over, scheduling restart.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service:
> Scheduled restart job, restart counter is at 3.
> -- Subject: Automatic restarting of a unit has been scheduled
> -- Defined-By: systemd
> -- Support: http://www.ubuntu.com/support
> --
> -- Automatic restarting of the unit ceph-osd@122.service has been
> scheduled, as the result for
> -- the configured Restart= setting for the unit.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: Stopped Ceph object
> storage daemon osd.122.
> -- Subject: Unit ceph-osd@122.service has finished shutting down
> -- Defined-By: systemd
> -- Support: http://www.ubuntu.com/support
> --
> -- Unit ceph-osd@122.service has finished shutting down.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service:
> Start request repeated too quickly.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: ceph-osd@122.service:
> Failed with result 'exit-code'.
> May 15 14:16:14 pok1-qz1-sr1-rk001-s20 systemd[1]: Failed to start Ceph
> object storage daemon osd.122
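One note on the log above: `handle_auth_bad_method server allowed_methods [2] but i only support [2]` is odd-looking because both sides list method 2 (cephx), so the auth *method* is fine; in my experience this usually means the key itself is rejected. A hedged diagnostic sketch (dry-run wrapper again, so it only prints the commands) is to compare the key the mons have on record with the one in the OSD's on-disk keyring:

```shell
# Hypothetical diagnostic sketch: both sides support cephx (method 2), so
# suspect a key mismatch. Compare the mons' registered key with the
# keyring the daemon presents; the two secrets should be identical.
OSD_ID=122
run() { echo "+ $*"; }      # print only; replace the body with "$@" to execute

run ceph auth get "osd.$OSD_ID"                       # key the mons expect
run cat "/var/lib/ceph/osd/ceph-$OSD_ID/keyring"      # key the daemon presents
```

If the secrets differ, re-importing the on-disk keyring with `ceph auth add` (step 4 earlier in the thread) should reconcile them.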
>
>
>
>
> From: Alfredo Deza <ad...@redhat.com>
> To: Bob R <b...@drinksbeer.org>
> Cc: Tarek Zegar <tze...@us.ibm.com>, ceph-users <ceph-users@lists.ceph.com
> >
> Date: 05/15/2019 08:27 AM
> Subject: [EXTERNAL] Re: [ceph-users] Lost OSD from PCIe error, recovered,
> to restore OSD process
> ------------------------------
>
>
>
> On Tue, May 14, 2019 at 7:24 PM Bob R <b...@drinksbeer.org> wrote:
> >
> > Does 'ceph-volume lvm list' show it? If so you can try to activate it
> with 'ceph-volume lvm activate 122 74b01ec2--124d--427d--9812--e437f90261d4'
>
> Good suggestion. If `ceph-volume lvm list` can see it, it can probably
> activate it again. You can activate it with the OSD ID + OSD FSID, or
> do:
>
> ceph-volume lvm activate --all
>
> You didn't say if the OSD wasn't coming up after trying to start it
> (the systemd unit should still be there for ID 122), or if you tried
> rebooting and that OSD didn't come up.
>
> The systemd unit is tied to both the ID and FSID of the OSD, so it
> shouldn't matter if the underlying device changed since ceph-volume
> ensures it is the right one every time it activates.
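A rough sketch of the naming Alfredo describes: the ceph-volume activation unit encodes both the OSD ID and the OSD FSID, while the daemon unit encodes only the ID. The patterns below are the stock ceph-volume/ceph-osd ones as I understand them; verify on your host with `systemctl list-units 'ceph*'`.

```shell
# Sketch: construct the systemd unit names tied to this OSD. Unit name
# patterns assumed from stock ceph-volume/ceph-osd behavior; the FSID is
# the one ceph-volume printed during activation earlier in the thread.
OSD_ID=122
OSD_FSID=a151bea5-d123-45d9-9b08-963a511c042a

activation_unit="ceph-volume@lvm-${OSD_ID}-${OSD_FSID}.service"
daemon_unit="ceph-osd@${OSD_ID}.service"

echo "$activation_unit"
echo "$daemon_unit"
```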
> >
> > Bob
> >
> > On Tue, May 14, 2019 at 7:35 AM Tarek Zegar <tze...@us.ibm.com> wrote:
> >>
> >> Someone nuked an OSD that had 1-replica PGs. They accidentally did
> echo 1 > /sys/block/nvme0n1/device/device/remove
> >> We got it back by doing echo 1 > /sys/bus/pci/rescan
> >> However, it re-enumerated with a different drive number (guess we didn't
> have udev rules)
> >> They restored the LVM volume (vgcfgrestore
> ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841 ; vgchange -ay
> ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841)
> >>
> >> lsblk
> >> nvme0n2 259:9 0 1.8T 0 disk
> >>
> ceph--8c81b2a3--6c8e--4cae--a3c0--e2d91f82d841-osd--data--74b01ec2--124d--427d--9812--e437f90261d4
> 253:1 0 1.8T 0 lvm
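The long name in that lsblk output is just the device-mapper mangling of the VG and LV names: every hyphen inside each name is doubled, and the VG and LV are then joined with a single hyphen. A small pure-shell sketch reproduces it (names taken from this thread):

```shell
# Sketch: reproduce the /dev/mapper name lsblk shows. device-mapper doubles
# every hyphen inside the VG and LV names, then joins them with one hyphen,
# so a single "-" in the output is the VG/LV boundary.
VG=ceph-8c81b2a3-6c8e-4cae-a3c0-e2d91f82d841
LV=osd-data-74b01ec2-124d-427d-9812-e437f90261d4

mangle() { printf '%s' "$1" | sed 's/-/--/g'; }
dm_name="$(mangle "$VG")-$(mangle "$LV")"
echo "/dev/mapper/$dm_name"
```

This is why the drive still maps back to the same OSD after re-enumerating as nvme0n2: the LVM names, not the kernel device name, identify the volume.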
> >>
> >> We are stuck here. How do we attach an OSD daemon to the drive? It was
> OSD.122 previously
> >>
> >> Thanks
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>