"podman logs ceph-xxxxxxx-osd-xxx" may contain additional logs.
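A minimal sketch of expanding that container-name pattern, assuming the fsid and OSD id that appear in Philip's journalctl output (e51eb2fa-... and osd.33); substitute your own values:

```shell
# fsid and OSD id taken from the journalctl output quoted below;
# replace with your cluster's values.
fsid="e51eb2fa-7f82-11eb-94d5-78e3b5148f00"
osd=33

# cephadm names each OSD container "ceph-<fsid>-osd-<id>".
cname="ceph-${fsid}-osd-${osd}"

# Commands to run on the OSD's host to pull the container's log stream
# (echoed here rather than executed, since they need the live host):
echo "podman logs ${cname}"
echo "cephadm logs --name osd.${osd}"
```

If you are unsure of the exact container name, `podman ps -a --format '{{.Names}}'` on the OSD host will list them.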

> On Mar 19, 2021, at 04:29, Philip Brown <[email protected]> wrote:
> 
> I've been banging on my ceph octopus test cluster for a few days now.
> 8 nodes. Each node has 2 SSDs and 8 HDDs.
> They were all autoprovisioned so that each HDD gets an LVM slice of an SSD as 
> a db partition.
> 
> service_type: osd
> service_id: osd_spec_default
> placement:
>  host_pattern: '*'
> data_devices:
>  rotational: 1
> db_devices:
>  rotational: 0
> 
> 
> Things were going pretty well, until yesterday, when I noticed TWO of the OSDs 
> were "down".
> 
> I went to check the logs, with 
> journalctl -u [email protected]
> 
> All it showed was a bunch of generic debug info, the fact that it 
> stopped, and various automatic attempts to restart.
> But no indication of what was wrong, or why the restarts KEEP failing.
> 
> 
> sample output:
> 
> 
> systemd[1]: Stopped Ceph osd.33 for e51eb2fa-7f82-11eb-94d5-78e3b5148f00.
> systemd[1]: Starting Ceph osd.33 for e51eb2fa-7f82-11eb-94d5-78e3b5148f00...
> bash[9340]: ceph-e51eb2fa-7f82-11eb-94d5-78e3b5148f00-osd.33-activate
> bash[9340]: WARNING: The same type, major and minor should not be used for 
> multiple devices.
> bash[9340]: WARNING: The same type, major and minor should not be used for 
> multiple devices.
> podman[9369]: 2021-03-07 16:00:15.543010794 -0800 PST m=+0.318475882 
> container create
> podman[9369]: 2021-03-07 16:00:15.73461926 -0800 PST m=+0.510084288 container 
> init
> .....
> bash[1611473]: --> ceph-volume lvm activate successful for osd ID: 33
> podman[1611501]: 2021-03-18 10:23:02.564242824 -0700 PDT m=+1.379793448 
> container died 
> bash[1611473]: ceph-xx-xx-xx-xx-osd.33
> bash[1611473]: WARNING: The same type, major and minor should not be used for 
> multiple devices.
> (repeat, repeat...)
> podman[1611615]: 2021-03-18 10:23:03.530992487 -0700 PDT m=+0.333130660 
> container create
> 
> ....
> systemd[1]: Started Ceph osd.33 for xx-xx-xx-xx
> systemd[1]: [email protected]: main process exited, 
> code=exited, status=1/FAILURE
> bash[1611797]: ceph-xx-xx-xx-xx-osd.33-deactivate
> 
> and eventually it just gives up.
> 
> smartctl -a doesn't show any errors on the HDD.
> 
> 
> dmesg doesn't show anything.
> 
> So... what do I do?
> 
> 
> 
> 
> 
> --
> Philip Brown| Sr. Linux System Administrator | Medata, Inc. 
> 5 Peters Canyon Rd Suite 250 
> Irvine CA 92606 
> Office 714.918.1310| Fax 714.918.1325 
> [email protected]| 
> http://www.medata.com/
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]