Hi Folks,

I have found similar reports of this problem in the past, but I can't seem to 
find a solution to it.
We are running a Ceph filesystem on Mimic 13.2.5. The OSDs run on AWS EC2 
instances with CentOS 7, and each OSD disk is an AWS NVMe device.

The problem is that sometimes, when an OSD instance is rebooted, the OSD 
volume fails to mount and the OSD cannot start.
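
After a failed reboot, this is roughly how I would check whether the disk and 
the ceph LVM volume are even visible (a sketch, assuming the OSD disk is 
/dev/nvme1n1 as in the recovery commands below):

# lsblk /dev/nvme1n1
# pvs /dev/nvme1n1
# lvs -o lv_name,vg_name,lv_path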

ceph-volume.log repeats the following
[2019-08-28 09:10:42,061][ceph_volume.main][INFO  ] Running command: ceph-volume  lvm trigger 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,063][ceph_volume.process][INFO  ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2019-08-28 09:10:42,074][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 148, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 40, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch
    instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
    Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", line 339, in main
    self.activate(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", line 249, in activate
    raise RuntimeError('could not find osd.%s with fsid %s' % (osd_id, osd_fsid))
RuntimeError: could not find osd.0 with fsid fcaffe93-4c03-403c-9702-7f1ec694a578
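
If I understand it correctly, the lvs call above is how ceph-volume matches the 
OSD id and fsid against the LVM tags, so the same query can be run by hand 
after a failed boot to see what the tags look like (the lvs command is taken 
from the log; the ceph.osd_id / ceph.osd_fsid tag names are my understanding 
of what it looks for):

# /usr/sbin/lvs --noheadings --readonly --separator=";" -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
# ceph-volume lvm list

I would expect the tags to include ceph.osd_id=0 and 
ceph.osd_fsid=fcaffe93-4c03-403c-9702-7f1ec694a578 for this OSD.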

ceph-volume-systemd.log repeats
[2019-08-28 09:10:41,877][systemd][INFO  ] raw systemd input received: lvm-0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:41,877][systemd][INFO  ] parsed sub-command: lvm, extra data: 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:41,926][ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,077][ceph_volume.process][INFO  ] stderr -->  RuntimeError: could not find osd.0 with fsid fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,084][systemd][WARNING] command returned non-zero exit status: 1
[2019-08-28 09:10:42,084][systemd][WARNING] failed activating OSD, retries left: 30
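
The "retries left: 30" suggests the activation is retried for a while before 
giving up. If the NVMe device is simply slow to appear after boot, I assume 
the retry count and interval could be raised with a systemd drop-in (this is 
only a guess based on the CEPH_VOLUME_SYSTEMD_TRIES / 
CEPH_VOLUME_SYSTEMD_INTERVAL environment variables, not something I have 
verified fixes it):

# mkdir -p /etc/systemd/system/ceph-volume@.service.d
# cat > /etc/systemd/system/ceph-volume@.service.d/retries.conf <<'EOF'
[Service]
Environment=CEPH_VOLUME_SYSTEMD_TRIES=60
Environment=CEPH_VOLUME_SYSTEMD_INTERVAL=10
EOF
# systemctl daemon-reload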

To recover, I destroy the OSD, zap the disk, and create it again:
# ceph osd destroy 0 --yes-i-really-mean-it
# ceph-volume lvm zap /dev/nvme1n1 --destroy
# ceph-volume lvm create --osd-id 0 --data /dev/nvme1n1
# systemctl start ceph-osd@0
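
Before resorting to that, should re-running the activation by hand work once 
the device is back, instead of destroying the OSD? Something like (OSD id and 
fsid taken from the log above):

# ceph-volume lvm activate 0 fcaffe93-4c03-403c-9702-7f1ec694a578

or

# ceph-volume lvm activate --all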

Is there something I need to do so that the OSD can boot without these problems?

Thank you!
Tom

Attachments: ceph-volume.log, ceph-volume-systemd.log
