Looks like something went a little wrong with the snapshot metadata in that PG. If the PG is still going active from the other copies, you're probably best off using ceph-objectstore-tool to remove that PG's copy on the OSD that is crashing. You can then either replace it with an export taken from one of the other nodes, or let Ceph backfill it on its own.
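A rough sketch of the commands involved (untested; the OSD ids, data paths, and export file name below are placeholders for your setup, the OSD you run ceph-objectstore-tool against must be stopped first, and you should double-check which copy is healthy before removing anything):

# On an OSD that holds a good copy of pg 5.9b (OSD stopped):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<healthy-id> \
    --pgid 5.9b --op export --file /tmp/pg.5.9b.export

# On the crashing OSD (stopped); newer releases want --force for remove:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<crashing-id> \
    --pgid 5.9b --op remove --force

# Then either import the good copy back...
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<crashing-id> \
    --op import --file /tmp/pg.5.9b.export

# ...or skip the import, start the OSD again, and let backfill repopulate
# the PG from the surviving replicas.

(On a filestore OSD you may also need --journal-path; check ceph-objectstore-tool --help on your version before running anything.)
-Greg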
On Tue, May 15, 2018 at 2:13 AM Siegfried Höllrigl <siegfried.hoellr...@xidras.com> wrote:

> Hi !
>
> We have upgraded our Ceph cluster (3 Mon Servers, 9 OSD Servers, 190
> OSDs total) from 10.2.10 to Ceph 12.2.4 and then to 12.2.5.
> (A mixture of Ubuntu 14 and 16 with the repos from
> https://download.ceph.com/debian-luminous/)
>
> Now we have the problem that one OSD is crashing again and again
> (approx. once per day). systemd restarts it.
>
> We could now probably identify the problem. It looks like one placement
> group (5.9b) causes the crash.
> It seems like it doesn't matter if it is running on a filestore or a
> bluestore OSD.
> We could even break it down to some RBDs that were in this pool.
> They are already deleted, but it looks like there are some objects on
> the OSD left, but we can't delete them:
>
> rados -p rbd ls > radosrbdls.txt
> cat radosrbdls.txt | grep -vE "($(rados -p rbd ls | grep rbd_header |
> grep -o "\.[0-9a-f]*" | sed -e :a -e '$!N; s/\n/|/; ta' -e
> 's/\./\\./g'))" | grep -E '(rbd_data|journal|rbd_object_map)'
> rbd_data.112913b238e1f29.0000000000000e3f
> rbd_data.112913b238e1f29.00000000000009d2
> rbd_data.112913b238e1f29.0000000000000ba3
>
> rados -p rbd rm rbd_data.112913b238e1f29.0000000000000e3f
> error removing rbd>rbd_data.112913b238e1f29.0000000000000e3f: (2) No
> such file or directory
> rados -p rbd rm rbd_data.112913b238e1f29.00000000000009d2
> error removing rbd>rbd_data.112913b238e1f29.00000000000009d2: (2) No
> such file or directory
> rados -p rbd rm rbd_data.112913b238e1f29.0000000000000ba3
> error removing rbd>rbd_data.112913b238e1f29.0000000000000ba3: (2) No
> such file or directory
>
> In the "current" directory of the OSD there are a lot more files with
> this rbd prefix.
> Is there any chance to delete this obviously orphaned stuff before the
> PG becomes healthy?
> (It is running now on only 2 of 3 OSDs.)
>
> What else could cause such a crash?
>
> We attach (hopefully all of) the relevant logs.