You are running into https://tracker.ceph.com/issues/24423
I've fixed it here: https://github.com/ceph/ceph/pull/22585

The fix has already been backported and will be included in the 13.2.1 point release.
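
Once 13.2.1 is out, a quick way to confirm what you are actually running (a minimal sketch, assuming the default cluster name "ceph") is:

# ceph --version
# ceph versions

The first command reports the version of the locally installed ceph packages on that node; the second summarizes the versions the running mon/mgr/osd daemons report, so you can see when every OSD is on the fixed release.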


Paul


2018-06-27 8:40 GMT+02:00 Steffen Winther Sørensen <ste...@gmail.com>:

> List,
>
> A disk behind an OSD failed in a Mimic 13.2.0 cluster, so I tried to
> follow the documentation on removing an OSD.
>
> I did:
>
> # ceph osd crush reweight osd.19 0
> waited for the rebalancing to finish and then continued:
> # ceph osd out 19
> # systemctl stop ceph-osd@19
> # ceph osd purge 19 --yes-i-really-mean-it
>
> verified with ceph osd tree that osd.19 was out of the map
>
> To my surprise, this tmpfs was still mounted:
> tmpfs                    7.8G   48K  7.8G   1% /var/lib/ceph/osd/ceph-19
>
> Replaced the failed drive and then attempted:
>
> # ceph-volume lvm zap /dev/sdh
> # ceph-volume lvm create --osd-id 19 --data /dev/sdh
> Running command: /bin/ceph-authtool --gen-print-key
> Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 5352d594-aa19-4147-a884-ca2c5775aa1b
> Running command: /usr/sbin/vgcreate --force --yes ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e /dev/sdh
>  stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
>  stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
>  stderr: /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 0: Input/output error
>   /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 146775408640: Input/output error
>   /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 146775465984: Input/output error
>  stderr: /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 4096: Input/output error
>  stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
>  stdout: Physical volume "/dev/sdh" successfully created.
>  stdout: Volume group "ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e" successfully created
> Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e
>  stdout: Logical volume "osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b" created.
> Running command: /bin/ceph-authtool --gen-print-key
> Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-19
> Running command: /bin/chown -R ceph:ceph /dev/dm-9
> Running command: /bin/ln -s /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b /var/lib/ceph/osd/ceph-19/block
> Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-19/activate.monmap
>  stderr: got monmap epoch 1
> Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-19/keyring --create-keyring --name osd.19 --add-key AQBY1TBbN8I+HxAAMHGWKLgJugmtzdqllQh5sA==
>  stdout: creating /var/lib/ceph/osd/ceph-19/keyring
>  stdout: added entity osd.19 auth auth(auid = 18446744073709551615 key=AQBY1TBbN8I+HxAAMHGWKLgJugmtzdqllQh5sA== with 0 caps)
> Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19/keyring
> Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19/
> Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 19 --monmap /var/lib/ceph/osd/ceph-19/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-19/ --osd-uuid 5352d594-aa19-4147-a884-ca2c5775aa1b --setuser ceph --setgroup ceph
> --> ceph-volume lvm prepare successful for: /dev/sdh
> Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b --path /var/lib/ceph/osd/ceph-19
> Running command: /bin/ln -snf /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b /var/lib/ceph/osd/ceph-19/block
> Running command: /bin/chown -R ceph:ceph /dev/dm-9
> Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19
> Running command: /bin/systemctl enable ceph-volume@lvm-19-5352d594-aa19-4147-a884-ca2c5775aa1b
>  stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-19-5352d594-aa19-4147-a884-ca2c5775aa1b.service to /usr/lib/systemd/system/ceph-volume@.service.
> Running command: /bin/systemctl start ceph-osd@19
> --> ceph-volume lvm activate successful for osd ID: 19
> --> ceph-volume lvm create successful for: /dev/sdh
>
> verified that osd.19 was back in the map with:
> # ceph osd tree
> ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
> -1       3.20398 root default
> -9       0.80099     host n1
> 18   hdd 0.13350         osd.18     up  1.00000 1.00000
> 19   hdd 0.13350         osd.19   down        0 1.00000
> 20   hdd 0.13350         osd.20     up  1.00000 1.00000
> 21   hdd 0.13350         osd.21     up  1.00000 1.00000
> 22   hdd 0.13350         osd.22     up  1.00000 1.00000
> 23   hdd 0.13350         osd.23     up  1.00000 1.00000
>
> But it fails to launch:
> # systemctl start ceph-osd@19
> # systemctl status ceph-osd@19
> ● ceph-osd@19.service - Ceph object storage daemon osd.19
>    Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: disabled)
>    Active: activating (auto-restart) (Result: signal) since Mon 2018-06-25 13:44:35 CEST; 3s ago
>   Process: 2046453 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
>   Process: 2046447 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>  Main PID: 2046453 (code=killed, signal=ABRT)
>
> Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 8: (OSD::handle_osd_map(MOSDMap*)+0x1020) [0x56353eac71f0]
> Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 9: (OSD::_dispatch(Message*)+0xa1) [0x56353eac9d21]
> Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 10: (OSD::ms_dispatch(Message*)+0x56) [0x56353eaca066]
> Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 11: (DispatchQueue::entry()+0xb5a) [0x7f302acce74a]
> Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f302ad6ef2d]
> Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 13: (()+0x7e25) [0x7f30277b0e25]
> Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 14: (clone()+0x6d) [0x7f30268a1bad]
> Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> Jun 25 13:44:35 n1.sprawl.dk systemd[1]: Unit ceph-osd@19.service entered failed state.
> Jun 25 13:44:35 n1.sprawl.dk systemd[1]: ceph-osd@19.service failed.
>
> The osd.19 log shows:
>
> --- begin dump of recent events ---
>      0> 2018-06-25 13:48:47.139 7fc6b91c5700 -1 *** Caught signal (Aborted) **
>  in thread 7fc6b91c5700 thread_name:ms_dispatch
>
>  ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
>  1: (()+0x8e1870) [0x55da2ff6e870]
>  2: (()+0xf6d0) [0x7fc6c97ba6d0]
>  3: (gsignal()+0x37) [0x7fc6c87db277]
>  4: (abort()+0x148) [0x7fc6c87dc968]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x25d) [0x7fc6ccc5a69d]
>  6: (()+0x286727) [0x7fc6ccc5a727]
>  7: (OSDService::get_map(unsigned int)+0x4a) [0x55da2faa3dda]
>  8: (OSD::handle_osd_map(MOSDMap*)+0x1020) [0x55da2fa511f0]
>  9: (OSD::_dispatch(Message*)+0xa1) [0x55da2fa53d21]
>  10: (OSD::ms_dispatch(Message*)+0x56) [0x55da2fa54066]
>  11: (DispatchQueue::entry()+0xb5a) [0x7fc6cccd074a]
>  12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fc6ccd70f2d]
>  13: (()+0x7e25) [0x7fc6c97b2e25]
>  14: (clone()+0x6d) [0x7fc6c88a3bad]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Any hints would be appreciated, TIA!
>
> /Steffen
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
