Hi all,

I recently deployed Ceph with ceph orch (Pacific) on my nodes: 5 mons, 5
mgrs, 238 OSDs, and 5 RGWs.

Yesterday, 4 OSDs went out and 2 RGWs went down, so I restarted all the
RGWs with "ceph orch restart rgw.rgw". Two minutes later, every RGW node
went down.

Then I brought the 4 OSDs back up and waited.
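For reference, the orchestrator commands involved look roughly like this. This is a sketch: the service name `rgw.rgw` is from my deployment and may differ in yours, and `--daemon-type` filtering assumes a cephadm-managed cluster.

```shell
# List orchestrator-managed services and see how many RGW daemons are up.
ceph orch ls
ceph orch ps --daemon-type rgw

# Restart every daemon belonging to the rgw.rgw service (the command I ran).
ceph orch restart rgw.rgw
```

These must be run on a node with an admin keyring (e.g. inside `cephadm shell`).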
Hi all,

I have one critical issue in my prod cluster. It happens when the
customer's incoming data reaches around 600 MiB.

Between 8 and 20 of my 238 OSDs go down. I bring the OSDs back up
manually, and after a few minutes all my RGWs crash.

We did some troubleshooting, but nothing worked, even after we upgraded
Ceph from 17.2.0 to 17.2.1.

FYI
On Sat, Sep 10, 2022 at 11:23 AM Monish Selvaraj wrote:

> Hi all,
>
> I have one critical issue in my prod cluster. It happens when the
> customer's incoming data reaches around 600 MiB.
>
> Between 8 and 20 of my 238 OSDs go down. I bring the OSDs back up
> manually, and after a few minutes all my RGWs crash.
Is the cluster a new installation with cephadm, or an older cluster
upgraded to Quincy?
What would prevent the RGWs from starting? I'm assuming that if you fix
your OSDs, the RGWs would start working again. But then again, we still
don't know anything about the current situation.

Quoting Monish Selvaraj:

> Hi Eugen,
>
> Below is the log
> I don't know why it is happening. But maybe it's because the RGWs are
> running on separate machines? Could this be causing the issue?

I don't know how that should

Quoting Monish Selvaraj:

> Hi Eugen,
>
> Yes, I have inactive PGs when the OSDs go down.
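For context, inactive PGs can be confirmed from the CLI. A minimal sketch, assuming admin access to the cluster:

```shell
# Cluster-wide health summary, including degraded/inactive PG counts.
ceph -s
ceph health detail

# List PGs that have been stuck in the inactive state.
ceph pg dump_stuck inactive
```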
Check the mailing list archives for that; setting 'ceph osd set nodown'
might help during the migration. But are the OSDs fully saturated
('iostat -xmt /dev/sd* 1')? If updating helps, just stay on that version
and maybe report a tracker issue with your findings.
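The nodown flag and the saturation check mentioned above, as a sketch (the flag must be unset again afterwards, or the monitors will never mark genuinely dead OSDs down):

```shell
# Stop the monitors from marking flapping OSDs down during the window.
ceph osd set nodown

# ...do the migration/recovery work...

# Remove the flag again once things have settled.
ceph osd unset nodown

# On an OSD host: per-device stats once per second; a %util column
# pinned near 100 suggests the disks are saturated.
iostat -xmt /dev/sd* 1
```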
The cluster should be able to sustain the failure of three hosts without
client impact, but if multiple OSDs across more hosts fail (holding PGs
of the same pool(s)), you would have inactive PGs, as you already
reported.

Quoting Monish Selvaraj:

> Hi Eugen,
>
> Thanks for
As I already said, it's possible that your inactive PGs prevent the RGWs
from starting. You can turn on debug logs for the RGWs; maybe they
reveal more.

Quoting Monish Selvaraj:

> Hi Eugen,
>
> The OSDs fail because of RAM/CPU overload.
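Turning on RGW debug logging can be done through the central config. A sketch, under the assumption that the `client.rgw` section matches all RGW daemons in a cephadm deployment (verbosity 20 is very noisy, so turn it back down afterwards):

```shell
# Raise RGW and messenger debug verbosity for all RGW daemons.
ceph config set client.rgw debug_rgw 20
ceph config set client.rgw debug_ms 1

# ...reproduce the crash and collect the daemon logs...

# Restore normal verbosity.
ceph config set client.rgw debug_rgw 0
ceph config set client.rgw debug_ms 0
```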
You could try to increase osd_recovery_max_active and osd_max_backfills.
What are the current values in your cluster?

Quoting Monish Selvaraj:

> Hi,
>
> Our Ceph cluster consists of 20 hosts and 240 OSDs.
>
> We used the erasure-coded pool
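The settings Eugen mentions can be inspected and changed with `ceph config`. A sketch with example values, not recommendations; note that on Quincy the mClock scheduler may cap or ignore these unless explicitly overridden:

```shell
# Show the currently active values for the OSDs.
ceph config get osd osd_recovery_max_active
ceph config get osd osd_max_backfills

# Raise them cluster-wide (higher values speed up recovery/backfill but
# add load to OSDs that may already be overloaded).
ceph config set osd osd_max_backfills 2
ceph config set osd osd_recovery_max_active 4
```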