Re: [ceph-users] Whole cluster flapping

2018-08-28 Thread CUZA Frédéric
: Webert de Souza Lima; CUZA Frédéric Cc: ceph-users Subject: RE: [ceph-users] Whole cluster flapping Hi again Frederic, It may be worth looking at a recovery sleep. `osd recovery sleep` Description: Time in seconds to sleep before next recovery or backfill op. Increasing this value will slow down
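
The recovery sleep suggested above can be tried at runtime without restarting the OSDs. The following is a minimal sketch, not a command taken from the thread: the helper name `set_recovery_sleep`, the `DRY_RUN` switch, and the 0.1 s value are all illustrative assumptions. It relies on `ceph tell ... injectargs`, whose changes do not persist across an OSD restart.

```shell
# Sketch only: raise osd_recovery_sleep on all OSDs at runtime.
# set_recovery_sleep and DRY_RUN are hypothetical names for this example;
# 0.1 s is an illustrative value, not one recommended in the thread.
# DRY_RUN=1 (the default) prints the command instead of running it.
set_recovery_sleep() {
    sleep_s="${1:-0.1}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf "ceph tell 'osd.*' injectargs '--osd_recovery_sleep=%s'\n" "$sleep_s"
    else
        ceph tell 'osd.*' injectargs "--osd_recovery_sleep=${sleep_s}"
    fi
}

# Print the command that would be sent to every OSD.
set_recovery_sleep 0.1
```

Running with `DRY_RUN=0` sends the setting to the cluster; starting with a small value and watching whether the flapping subsides is probably safer than a large jump.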

Re: [ceph-users] Whole cluster flapping

2018-08-08 Thread Will Marley
s On Behalf Of Webert de Souza Lima Sent: 08 August 2018 15:06 To: frederic.c...@sib.fr Cc: ceph-users Subject: Re: [ceph-users] Whole cluster flapping So your OSDs are really too busy to respond to heartbeats. You'll be facing this for some time, until the cluster load gets lower. I would set `ceph osd

Re: [ceph-users] Whole cluster flapping

2018-08-08 Thread Webert de Souza Lima
p is_healthy 'OSD::osd_op_tp thread 0x7fdabd897700' had timed out after 90 (I updated it to 90 instead of 15s) Regards, *From:* ceph-users *On behalf of* Webert de Souza Lima *Sent:* 07 August

Re: [ceph-users] Whole cluster flapping

2018-08-08 Thread CUZA Frédéric
8 To: ceph-users Subject: Re: [ceph-users] Whole cluster flapping Oops, my bad, you're right. I don't know how much you can see, but maybe you can dig around the performance counters and see what's happening on those OSDs. Try these: ~# ceph daemonperf osd.XX ~# ceph daemon osd.XX perf du

Re: [ceph-users] Whole cluster flapping

2018-08-07 Thread Webert de Souza Lima
a Lima *Sent:* 07 August 2018 15:08 *To:* ceph-users *Subject:* Re: [ceph-users] Whole cluster flapping Frédéric, see if the number of objects is decreasing in the pool with `ceph df [detail]` Regards,

Re: [ceph-users] Whole cluster flapping

2018-08-07 Thread CUZA Frédéric
The pool is already deleted and no longer present in the stats. Regards, From: ceph-users On behalf of Webert de Souza Lima Sent: 07 August 2018 15:08 To: ceph-users Subject: Re: [ceph-users] Whole cluster flapping Frédéric, see if the number of objects is decreasing in the pool with `ceph df

Re: [ceph-users] Whole cluster flapping

2018-08-07 Thread Webert de Souza Lima
Regards, *From:* ceph-users *On behalf of* Webert de Souza Lima *Sent:* 31 July 2018 16:25 *To:* ceph-users *Subject:* Re: [ceph-users] Whole cluster flapping The pool deletion might have triggered a lot of IO operations on the

Re: [ceph-users] Whole cluster flapping

2018-08-07 Thread CUZA Frédéric
. Regards, From: ceph-users On behalf of Webert de Souza Lima Sent: 31 July 2018 16:25 To: ceph-users Subject: Re: [ceph-users] Whole cluster flapping The pool deletion might have triggered a lot of IO operations on the disks, and the process might be too busy to respond to heartbeats, so the

Re: [ceph-users] Whole cluster flapping

2018-08-02 Thread CUZA Frédéric
hanks for all. Regards, From: Brent Kennedy Sent: 31 July 2018 23:36 To: CUZA Frédéric; 'ceph-users' Subject: RE: [ceph-users] Whole cluster flapping I have had this happen during large data movements. It stopped happening after I went to 10Gb, though (from 1Gb). What I had done is inj

Re: [ceph-users] Whole cluster flapping

2018-07-31 Thread Brent Kennedy
...@lists.ceph.com] On Behalf Of CUZA Frédéric Sent: Tuesday, July 31, 2018 5:06 AM To: ceph-users@lists.ceph.com Subject: [ceph-users] Whole cluster flapping Hi Everyone, I just upgraded our cluster to Luminous 12.2.7 and deleted a quite large pool that we had (120 TB). Our cluster is made of 14 nodes

Re: [ceph-users] Whole cluster flapping

2018-07-31 Thread Webert de Souza Lima
The pool deletion might have triggered a lot of IO operations on the disks, and the process might be too busy to respond to heartbeats, so the mons mark them as down due to no response. Also check the OSD logs to see if they are actually crashing and restarting, and check disk IO usage (e.g. with iostat). Rega
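
The log check described above could look roughly like the sketch below. It assumes the OSD logs contain `heartbeat_check: no reply` lines when peers stop answering and that logs live under `/var/log/ceph/`; the helper name `count_heartbeat_failures` is hypothetical, and both the pattern and the path should be verified against the actual cluster.

```shell
# Sketch: count heartbeat failures recorded in one OSD log file.
# count_heartbeat_failures is a hypothetical helper name; the pattern and
# the log location used in the comments below are assumptions to verify.
count_heartbeat_failures() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function usable under "set -e".
    grep -c 'heartbeat_check: no reply' "$1" || true
}

# Example use on an OSD node (default log location assumed):
#   for f in /var/log/ceph/ceph-osd.*.log; do
#       printf '%s %s\n' "$f" "$(count_heartbeat_failures "$f")"
#   done
# Disk saturation can be watched alongside with: iostat -x 5
```

A high count together with disks near 100% utilization in `iostat -x` would be consistent with busy, rather than crashing, OSDs.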

[ceph-users] Whole cluster flapping

2018-07-31 Thread CUZA Frédéric
Hi Everyone, I just upgraded our cluster to Luminous 12.2.7 and deleted a quite large pool that we had (120 TB). Our cluster is made of 14 nodes, each with 12 OSDs (1 HDD -> 1 OSD), and we have SSDs for the journals. After I deleted the large pool, my cluster started flapping on all OSDs. Os