Re: [ceph-users] OSDs are flapping and marked down wrongly

2016-10-17 Thread Somnath Roy
To: Somnath Roy; Cc: ceph-users@lists.ceph.com, ceph-de...@vger.kernel.org; Subject: Re: [ceph-users] OSDs are flapping and marked down wrongly. On Mon, Oct 17, 2016 at 3:16 PM, Somnath Roy wrote: > Hi Sage et al., > > I know this issue has been reported a number of times in the community and attributed to ...

Re: [ceph-users] OSDs are flapping and marked down wrongly

2016-10-17 Thread Wei Jin
On Mon, Oct 17, 2016 at 3:16 PM, Somnath Roy wrote: > Hi Sage et al., > > I know this issue has been reported a number of times in the community and attributed to > either a network issue or unresponsive OSDs. > Recently, we are seeing this issue when our all-SSD cluster (Jewel based) is > stressed with large ...

Re: [ceph-users] OSDs are flapping and marked down wrongly

2016-10-17 Thread Pavan Rallabhandi
Regarding mon_osd_min_down_reports, I was looking at it recently; this could provide some insight: https://github.com/ceph/ceph/commit/0269a0c17723fd3e22738f7495fe017225b924a4 Thanks! On 10/17/16, 1:36 PM, "ceph-users on behalf of Somnath Roy" wrote: Thanks Piotr, Wido for the quick response ...
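
For anyone following along, a quick way to confirm which of these options a given release actually exposes, and their current values, is to query the monitor's admin socket. A minimal sketch, assuming the monitor's socket name matches the node's short hostname:

    # list the down-reporting options known to this monitor
    ceph daemon mon.$(hostname -s) config show | grep mon_osd_min_down
    # Jewel should list mon_osd_min_down_reporters (default 2) alongside
    # mon_osd_min_down_reports; on branches where the latter was removed,
    # the grep will simply return fewer keys.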

Re: [ceph-users] OSDs are flapping and marked down wrongly

2016-10-17 Thread Somnath Roy
Thanks Piotr, Wido for the quick response. @Wido, yes, I thought of trying those values, but I see in the log messages that at least 7 OSDs are reporting the failure, so I didn't try. BTW, I found that the default mon_osd_min_down_reporters is 2, not 1, and the latest master no longer has mon_osd_min_down_reports ...
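
If raising the reporter threshold is the route taken, a minimal ceph.conf sketch is below; the value is illustrative only, not a recommendation from this thread, and the setting needs a monitor restart (or injectargs) to take effect:

    [mon]
        # require failure reports from more distinct OSDs before marking a peer down
        # (Jewel default is 2, as noted above)
        mon osd min down reporters = 10

    # the same change at runtime, without restarting the monitors:
    ceph tell mon.\* injectargs '--mon_osd_min_down_reporters 10'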

Re: [ceph-users] OSDs are flapping and marked down wrongly

2016-10-17 Thread Wido den Hollander
> On 17 October 2016 at 9:16, Somnath Roy wrote: > > > Hi Sage et al., > > I know this issue has been reported a number of times in the community and attributed to > either a network issue or unresponsive OSDs. > Recently, we are seeing this issue when our all-SSD cluster (Jewel based) is > stressed with ...

[ceph-users] OSDs are flapping and marked down wrongly

2016-10-17 Thread Somnath Roy
Hi Sage et al., I know this issue has been reported a number of times in the community and attributed to either a network issue or unresponsive OSDs. Recently, we are seeing this issue when our all-SSD cluster (Jewel based) is stressed with large block sizes and a very high QD. Lowering the QD, it is working just fine ...
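
Not from this thread, but for anyone hitting the same symptom under heavy load, a hedged sketch of the usual knobs to experiment with on a Jewel cluster; the option names and commands are standard Ceph, the values are illustrative only:

    # give heavily loaded OSDs more time to answer heartbeats before peers report them
    # (osd_heartbeat_grace defaults to 20s and is consulted by both OSDs and monitors)
    ceph tell osd.\* injectargs '--osd_heartbeat_grace 60'
    ceph tell mon.\* injectargs '--osd_heartbeat_grace 60'
    # temporary measure while debugging: stop the monitors from marking OSDs down
    ceph osd set nodown     # remember to run 'ceph osd unset nodown' afterwards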