I take my hat off to you, well done for solving that!!!
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Zdenek Janda
> Sent: 11 January 2018 13:01
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Cluster cra
Hi,
we have restored the damaged OSDs that would not start after the bug caused
by this issue; the detailed steps are at
http://tracker.ceph.com/issues/21142#note-9 for reference. Should anybody
else hit this, it should fix it for you.
Thanks
Zdenek Janda
On 11.1.2018 11:40, Zdenek Janda wrote:
Hi,
I have succeeded in identifying the faulty PG (12.62d on osd.15, which needs
past intervals for epochs 13939-15333 rebuilt):
-3450> 2018-01-11 11:32:20.015658 7f066e2a3e00 10 osd.15 15340 12.62d needs 13939-15333
-3449> 2018-01-11 11:32:20.019405 7f066e2a3e00 1 osd.15 15340 build_past_intervals_parallel over 13939-15333
-3448> 2018-01-11 11:32:20.019436 7f066e2a3e00 10
Hi,
I have updated the issue at http://tracker.ceph.com/issues/21142#note-5 with
the last lines of the strace before the ABRT. The crash ends with:
0.002429 pread64(22, "\10\7\213,\0\0\6\1i\33\0\0c\341\353kW\rC\365\2310\34\307\212\270\215{\354:\0\0"..., 12288, 908492996608) = 12288
0.007869 pread64(22,
Hi,
does anyone have a suggestion for what to do with this? I have identified the
underlying crashing code in src/osd/osd_types.cc [assert(interval.last >
last);], committed by Sage Weil, but I haven't figured out the exact
mechanism of the function or why it crashes. Also unclear is the mechanism of
how this bug spread and cr
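What the assert itself checks seems clear enough, though. Below is a minimal,
self-contained C++ sketch of that invariant as I understand it - this is not
the real Ceph code, and the type and function names are simplified stand-ins -
showing that past intervals have to be recorded in strictly increasing epoch
order, so the OSD aborts as soon as it sees an interval whose last epoch is
not greater than the last epoch already recorded:

// Illustrative sketch only, NOT the actual Ceph implementation.
// It models the invariant behind "assert(interval.last > last)" in
// src/osd/osd_types.cc: past intervals must be appended in strictly
// increasing epoch order.
#include <cassert>
#include <cstdint>
#include <vector>

using epoch_t = uint32_t;

struct Interval {
  epoch_t first;  // first epoch of the interval
  epoch_t last;   // last epoch of the interval
};

struct PastIntervalsSketch {
  std::vector<Interval> intervals;
  epoch_t last = 0;  // last epoch covered by what has been recorded so far

  void add_interval(const Interval &interval) {
    // Each new interval must end strictly after everything recorded before
    // it.  If the metadata describes overlapping or out-of-order intervals,
    // this is where the process aborts.
    assert(interval.last > last);
    intervals.push_back(interval);
    last = interval.last;
  }
};

int main() {
  PastIntervalsSketch pi;
  pi.add_interval({13939, 14100});  // ok: 14100 > 0
  pi.add_interval({14101, 15333});  // ok: 15333 > 14100
  pi.add_interval({14000, 14500});  // aborts: 14500 <= 15333
}

So if the on-disk PG metadata ever describes overlapping or out-of-order
intervals, this is the assert that trips; what remains unclear to me is how
the metadata got into that state in the first place.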
I have posted logs/strace from our OSDs with details to a ticket in the
ceph bug tracker - see http://tracker.ceph.com/issues/21142. You can see
exactly where the OSDs crash, etc.; this may be of help if someone decides
to debug it.
JZ
On 10/01/18 22:05, Josef Zelenka wrote:
Hi, today we had a disastrous crash - we are running a 3 node, 24 OSD cluster
(8 per node) with SSDs for blockdb and HDDs for bluestore data. This cluster
is used as a radosgw backend, storing a large number of thumbnails for a file
hosting site - around 110m files in total. We were adding