On 5/7/14 15:33 , Dimitri Maziuk wrote:
> On 05/07/2014 04:11 PM, Craig Lewis wrote:
>> On 5/7/14 13:40 , Sergey Malinin wrote:
>>> Check dmesg and SMART data on both nodes. This behaviour is similar to
>>> failing hdd.
>> It does sound like a failing disk... but there's nothing in dmesg, and
>> smartmontools
On 5/7/2014 7:35 PM, Craig Lewis wrote:
> Because of the very low recovery parameters, there's only a single
> backfill running. `iostat -dmx 5 5` did report 100% util on the osd
> that is backfilling, but I expected that. Once backfilling moves on to
> a new osd, the 100% util follows the backfill operation
On 05/07/2014 04:11 PM, Craig Lewis wrote:
> On 5/7/14 13:40 , Sergey Malinin wrote:
>> Check dmesg and SMART data on both nodes. This behaviour is similar to
>> failing hdd.
>>
>>
>
> It does sound like a failing disk... but there's nothing in dmesg, and
> smartmontools hasn't emailed me about a
On 5/7/14 13:40 , Sergey Malinin wrote:
> Check dmesg and SMART data on both nodes. This behaviour is similar to
> failing hdd.

It does sound like a failing disk... but there's nothing in dmesg, and
smartmontools hasn't emailed me about a failing disk. The same thing is
happening to more than
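For reference, a minimal sketch of the dmesg/SMART checks being suggested, assuming smartmontools is installed and /dev/sdX stands in for the OSD's data disk:

    # Kernel-reported I/O problems
    dmesg | grep -iE 'error|sector|i/o'

    # Full SMART attributes and error log for the suspect disk
    smartctl -a /dev/sdX

    # Optionally queue a short self-test (check the result later with smartctl -a)
    smartctl -t short /dev/sdX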
Check dmesg and SMART data on both nodes. This behaviour is similar to failing
hdd.
On Wednesday, May 7, 2014 at 23:28, Craig Lewis wrote:
> On 5/7/14 13:15 , Sergey Malinin wrote:
> > Is there anything unusual in dmesg at osd.5?
>
> Nothing in dmesg, but ceph-osd.5.log has plenty. I've att
Is there anything unusual in dmesg at osd.5?
On Wednesday, May 7, 2014 at 23:09, Craig Lewis wrote:
> I already have osd_max_backfill = 1, and osd_recovery_op_priority = 1.
>
> osd_recovery_max_active is the default 15, so I'll give that a try... some
> OSDs timed out during the injectargs.
I already have osd_max_backfill = 1, and osd_recovery_op_priority = 1.
osd_recovery_max_active is the default 15, so I'll give that a try...
some OSDs timed out during the injectargs. I added it to ceph.conf, and
restarted them all.
I was running RadosGW-Agent, but it's down now. I disable
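For reference, a sketch of how those throttles look when made persistent in ceph.conf; the values mirror the ones being tried in this thread (the option name is osd_max_backfills), not general recommendations:

    [osd]
        # keep recovery/backfill from starving client I/O
        osd_max_backfills = 1
        osd_recovery_max_active = 1
        osd_recovery_op_priority = 1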
Craig,
I suspect the disks in question are seeking constantly and the spindle
contention is causing significant latency. A strategy of throttling
backfill/recovery and reducing client traffic tends to work for me.
1) You should make sure recovery and backfill are throttled:
ceph tell osd.* in
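A sketch of the sort of injectargs invocation being suggested here, using the conservative values discussed earlier in the thread rather than a prescription:

    # Apply the throttles at runtime, without restarting the OSDs
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'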
The 5 OSDs that are down have all been kicked out for being
unresponsive. They are getting kicked out faster than they can
complete the recovery+backfill, so the number of degraded PGs is
growing over time.
root@ceph0c:~# ceph -w
    cluster 1604ec7a-6ceb-42fc-8c68-0a7896c4e120
     health HE
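For context, the usual way to see which OSDs have been marked down and how far the degradation has spread (a generic sketch, not output from this cluster):

    # Per-problem detail: which OSDs are down, which PGs are degraded/stuck
    ceph health detail

    # Cluster summary and the OSD tree, to spot the flapping OSDs
    ceph -s
    ceph osd tree | grep -i down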