[ceph-users] osdmap::decode crc error -- 13.2.7 -- most osds down
Hi,

My turn. We suddenly have a big outage which is similar/identical to
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036519.html

Some of the osds are runnable, but most crash when they start -- crc error in osdmap::decode.
I'm able to extract an osd map from a good osd and it decodes well with osdmaptool:

# ceph-objectstore-tool --op get-osdmap --data-path /var/lib/ceph/osd/ceph-680/ --file osd.680.map

But when I try on one of the bad osds I get:

# ceph-objectstore-tool --op get-osdmap --data-path /var/lib/ceph/osd/ceph-666/ --file osd.666.map
terminate called after throwing an instance of 'ceph::buffer::malformed_input'
  what():  buffer::malformed_input: bad crc, actual 822724616 != expected 2334082500
*** Caught signal (Aborted) **
 in thread 7f600aa42d00 thread_name:ceph-objectstor
 ceph version 13.2.7 (71bd687b6e8b9424dd5e5974ed542595d8977416) mimic (stable)
 1: (()+0xf5f0) [0x7f5ffefc45f0]
 2: (gsignal()+0x37) [0x7f5ffdbae337]
 3: (abort()+0x148) [0x7f5ffdbafa28]
 4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f5ffe4be7d5]
 5: (()+0x5e746) [0x7f5ffe4bc746]
 6: (()+0x5e773) [0x7f5ffe4bc773]
 7: (()+0x5e993) [0x7f5ffe4bc993]
 8: (OSDMap::decode(ceph::buffer::list::iterator&)+0x160e) [0x7f6000f4168e]
 9: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f6000f42e31]
 10: (get_osdmap(ObjectStore*, unsigned int, OSDMap&, ceph::buffer::list&)+0x1d0) [0x55d30a489190]
 11: (main()+0x5340) [0x55d30a3aae70]
 12: (__libc_start_main()+0xf5) [0x7f5ffdb9a505]
 13: (()+0x3a0f40) [0x55d30a483f40]
Aborted (core dumped)

I think I want to inject the osdmap, but can't:

# ceph-objectstore-tool --op set-osdmap --data-path /var/lib/ceph/osd/ceph-666/ --file osd.680.map
osdmap (#-1:b65b78ab:::osdmap.2983572:0#) does not exist.

How do I do this?

Thanks for any help!

dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
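For reference, the osdmaptool check mentioned above would look roughly like this -- a sketch reusing the file name from the dump command; a clean decode that prints a plausible epoch near the top is what "decodes well" means here:

# osdmaptool osd.680.map --print | head -5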
[ceph-users] Re: osdmap::decode crc error -- 13.2.7 -- most osds down
On 2/20/20 12:40 PM, Dan van der Ster wrote:
> I think I want to inject the osdmap, but can't:
>
> # ceph-objectstore-tool --op set-osdmap --data-path /var/lib/ceph/osd/ceph-666/ --file osd.680.map
> osdmap (#-1:b65b78ab:::osdmap.2983572:0#) does not exist.

Have you tried to list which epoch osd.680 is at and which one osd.666 is at? And which one the MONs are at?

Maybe there is a difference there?

Wido
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
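One rough way to do that comparison is sketched below -- ceph-objectstore-tool needs the OSD stopped, and the grep pattern assumes the on-disk map objects are named osdmap.<epoch>:

# ceph osd dump | head -1
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-666/ --op meta-list | grep osdmap | tail -5

The first command prints the mons' current epoch; the second lists the newest osdmap epochs stored on the OSD itself.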
[ceph-users] Re: osdmap::decode crc error -- 13.2.7 -- most osds down
680 is epoch 2983572
666 crashes at 2982809 or 2982808

 -407> 2020-02-20 11:20:24.960 7f4d931b5b80 10 osd.666 0 add_map_bl 2982809 612069 bytes
 -407> 2020-02-20 11:20:24.966 7f4d931b5b80 -1 *** Caught signal (Aborted) **
 in thread 7f4d931b5b80 thread_name:ceph-osd

So I grabbed 2982809 and 2982808 and am setting them.

Checking if the osds will start with that.

-- dan

On Thu, Feb 20, 2020 at 12:47 PM Wido den Hollander wrote:
> Have you tried to list which epoch osd.680 is at and which one osd.666
> is at? And which one the MONs are at?
>
> Maybe there is a difference there?
>
> Wido
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
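The epoch to fetch is the one in the last add_map_bl line before the abort; a sketch of finding it and pulling those maps from the mons, assuming the default log location:

# grep add_map_bl /var/log/ceph/ceph-osd.666.log | tail -3
# ceph osd getmap 2982808 -o 2982808
# ceph osd getmap 2982809 -o 2982809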
[ceph-users] Re: osdmap::decode crc error -- 13.2.7 -- most osds down
For those following along, the issue is here:
https://tracker.ceph.com/issues/39525#note-6

Somehow single bits are getting flipped in the osdmaps -- maybe network, maybe memory, maybe a bug.

To get an osd starting, we have to extract the full osdmap from the mon, and set it into the crashing osd. So for the osd.666:

# ceph osd getmap 2982809 -o 2982809
# ceph-objectstore-tool --op set-osdmap --data-path /var/lib/ceph/osd/ceph-666/ --file 2982809

Some osds had multiple corrupted osdmaps -- so we scriptified the above.

As of now our PGs are all active, but we're not confident that this won't happen again (without knowing why the maps were corrupting).

Thanks to all who helped!

dan

On Thu, Feb 20, 2020 at 1:01 PM Dan van der Ster wrote:
> 680 is epoch 2983572
> 666 crashes at 2982809 or 2982808
>
> So I grabbed 2982809 and 2982808 and am setting them.
>
> Checking if the osds will start with that.
>
> -- dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
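The "scriptified" loop itself isn't included in the thread; a minimal sketch of what it could look like, assuming the OSD is stopped and the suspect epoch range is passed in by hand (the script name and layout are hypothetical):

#!/bin/bash
# repair-osdmaps.sh <osd-id> <first-epoch> <last-epoch> -- hypothetical helper
set -euo pipefail
OSD=$1 FIRST=$2 LAST=$3
DATA=/var/lib/ceph/osd/ceph-$OSD

for EPOCH in $(seq "$FIRST" "$LAST"); do
    # pull the authoritative full map from the mons ...
    ceph osd getmap "$EPOCH" -o "/tmp/osdmap.$EPOCH"
    # ... and overwrite the corrupted copy inside the stopped OSD
    ceph-objectstore-tool --op set-osdmap --data-path "$DATA" --file "/tmp/osdmap.$EPOCH"
done

systemctl start "ceph-osd@$OSD"

Invoked for the example above as ./repair-osdmaps.sh 666 2982808 2982809, with the ceph-osd@666 service stopped first.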
[ceph-users] Re: osdmap::decode crc error -- 13.2.7 -- most osds down
> On 20 Feb 2020, at 19:54, Dan van der Ster wrote:
>
> For those following along, the issue is here:
> https://tracker.ceph.com/issues/39525#note-6
>
> Somehow single bits are getting flipped in the osdmaps -- maybe
> network, maybe memory, maybe a bug.

Weird!

But I did see things like this happen before. This was under Hammer and Jewel, where I needed to do these kinds of things. Crashes looked very similar.

> To get an osd starting, we have to extract the full osdmap from the
> mon, and set it into the crashing osd. So for the osd.666:
>
> # ceph osd getmap 2982809 -o 2982809
> # ceph-objectstore-tool --op set-osdmap --data-path /var/lib/ceph/osd/ceph-666/ --file 2982809
>
> Some osds had multiple corrupted osdmaps -- so we scriptified the above.

Were those corrupted ones in sequence?

> As of now our PGs are all active, but we're not confident that this
> won't happen again (without knowing why the maps were corrupting).

Awesome!

Wido

> Thanks to all who helped!
>
> dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: osdmap::decode crc error -- 13.2.7 -- most osds down
On Thu, Feb 20, 2020 at 9:20 PM Wido den Hollander wrote:
> But I did see things like this happen before. This was under Hammer and Jewel,
> where I needed to do these kinds of things. Crashes looked very similar.
>
> Were those corrupted ones in sequence?

Yes, usually 1 to 3 osdmaps corrupted in sequence.

There's a theory that this might be related (https://tracker.ceph.com/issues/43903) but the backports to mimic or even nautilus look challenging.

-- dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: RESOLVED: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]
Hi Troy,

Looks like we hit the same today -- Sage posted some observations here:
https://tracker.ceph.com/issues/39525#note-6

Did it happen again in your cluster?

Cheers, Dan

On Tue, Aug 20, 2019 at 2:18 AM Troy Ablan wrote:
>
> While I'm still unsure how this happened, this is what was done to solve this.
>
> Started OSD in foreground with debug 10, watched for the most recent
> osdmap epoch mentioned before abort(). For example, if it mentioned
> that it just tried to load 80896 and then crashed
>
> # ceph osd getmap -o osdmap.80896 80896
> # ceph-objectstore-tool --op set-osdmap --data-path /var/lib/ceph/osd/ceph-77/ --file osdmap.80896
>
> Then I restarted the osd in foreground/debug, and repeated for the next
> osdmap epoch until it got past the first few seconds. This process
> worked for all but two OSDs. For the ones that succeeded I'd ^C and
> then start the osd via systemd
>
> For the remaining two, it would try loading the incremental map and then
> crash. I had presence of mind to make dd images of every OSD before
> starting this process, so I reverted these two to the state before
> injecting the osdmaps.
>
> I then injected the last 15 or so epochs of the osdmap in sequential
> order before starting the osd, with success.
>
> This leads me to believe that the step-wise injection didn't work
> because the osd had more subtle corruption that it got past, but it was
> confused when it requested the next incremental delta.
>
> Thanks again to Brad/badone for the guidance!
>
> Tracker issue updated.
>
> Here's the closing IRC dialogue re this issue (UTC-0700)
>
> 2019-08-19 16:27:42 < MooingLemur> badone: I appreciate you reaching out
> yesterday, you've helped a ton, twice now :) I'm still concerned
> because I don't know how this happened. I'll feel better once
> everything's active+clean, but it's all at least active.
>
> 2019-08-19 16:30:28 < badone> MooingLemur: I had a quick discussion with
> Josh earlier and he shares my opinion this is likely somehow related to
> these drives or perhaps controllers, or at least specific to these machines
>
> 2019-08-19 16:31:04 < badone> however, there is a possibility you are
> seeing some extremely rare race that no one up to this point has seen before
>
> 2019-08-19 16:31:20 < badone> that is less likely though
>
> 2019-08-19 16:32:50 < badone> the osd read the osdmap over the wire
> successfully but wrote it out to disk in a format that it could not then
> read back in (unlikely) or...
>
> 2019-08-19 16:33:21 < badone> the map "changed" after it had been
> written to disk
>
> 2019-08-19 16:33:46 < badone> the second is considered most likely by us
> but I recognise you may not share that opinion
> ___
> ceph-users mailing list
> ceph-us...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
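Troy's precaution of imaging every OSD before injecting anything is worth copying; a rough sketch for a BlueStore OSD, assuming the data directory's block symlink points at the underlying device and that /backup has enough room:

# OSD=77
# DEV=$(readlink -f /var/lib/ceph/osd/ceph-$OSD/block)
# dd if="$DEV" of="/backup/osd-$OSD.img" bs=4M conv=sparse status=progress

The OSD has to be stopped while the image is taken.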
[ceph-users] Re: RESOLVED: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]
Dan,

Yes, I have had this happen several times since, but fortunately the last couple of times it has only happened to one or two OSDs at a time, so it didn't take down entire pools. Remedy has been the same.

I had been holding off on too much further investigation because I thought the source of the issue may have been some old hardware gremlins, and we're waiting on some new hardware.

-Troy

On 2/20/20 1:40 PM, Dan van der Ster wrote:
> Looks like we hit the same today -- Sage posted some observations here:
> https://tracker.ceph.com/issues/39525#note-6
>
> Did it happen again in your cluster?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: RESOLVED: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]
Thanks Troy for the quick response.

Are you still running mimic on that cluster? Seeing the crashes in nautilus too?

Our cluster is also quite old -- so it could very well be memory or network gremlins.

Cheers, Dan

On Thu, Feb 20, 2020 at 10:11 PM Troy Ablan wrote:
> Yes, I have had this happen several times since, but fortunately the
> last couple of times it has only happened to one or two OSDs at a time,
> so it didn't take down entire pools. Remedy has been the same.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: RESOLVED: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]
I hope I don't sound too happy to hear that you've run into this same problem, but still I'm glad to see that it's not just a one-off problem with us. :)

We're still running Mimic. I haven't yet deployed Nautilus anywhere.

Thanks
-Troy

On 2/20/20 2:14 PM, Dan van der Ster wrote:
> Are you still running mimic on that cluster? Seeing the crashes in nautilus too?
>
> Our cluster is also quite old -- so it could very well be memory or network gremlins.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: RESOLVED: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]
Another thing... in your thread you said that only the *SSDs* in your cluster had crashed, but not the HDDs. Were both the SSDs and HDDs bluestore? Did the HDDs ever crash subsequently?

Which OS/kernel do you run? We're CentOS 7 with quite some uptime.

On Thu, Feb 20, 2020 at 10:29 PM Troy Ablan wrote:
> I hope I don't sound too happy to hear that you've run into this same
> problem, but still I'm glad to see that it's not just a one-off problem
> with us. :)
>
> We're still running Mimic. I haven't yet deployed Nautilus anywhere.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: RESOLVED: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]
Dan,

This has happened to HDDs also, and NVMe most recently.

CentOS 7.7; usually the kernel is within 6 months of current updates. We try to stay relatively up to date.

-Troy

On 2/20/20 5:28 PM, Dan van der Ster wrote:
> Another thing... in your thread you said that only the *SSDs* in your
> cluster had crashed, but not the HDDs. Were both the SSDs and HDDs
> bluestore? Did the HDDs ever crash subsequently?
>
> Which OS/kernel do you run? We're CentOS 7 with quite some uptime.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Module 'telemetry' has experienced an error
This evening I was awakened by an error message:

  cluster:
    id:     9b4468b7-5bf2-4964-8aec-4b2f4bee87ad
    health: HEALTH_ERR
            Module 'telemetry' has failed: ('Connection aborted.', error(101, 'Network is unreachable'))

  services:

I have not seen any other problems with anything else on the cluster. I disabled and enabled the telemetry module, and health returned to OK status.

Any ideas on what could cause the issue? As far as I understand, telemetry is a module that sends messages to an external ceph server outside of the network.

Thank you for any advice,
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
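For anyone who hits the same HEALTH_ERR, the disable/enable cycle the poster describes corresponds roughly to the following (a sketch; module behaviour varies a bit between releases):

# ceph health detail
# ceph mgr module disable telemetry
# ceph mgr module enable telemetry
# ceph mgr module ls | grep -A 5 enabled_modules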