On 07 Sep 2014, at 04:47, Christian Balzer wrote:
> On Sat, 6 Sep 2014 19:47:13 +0200 Josef Johansson wrote:
>
>>
>> On 06 Sep 2014, at 19:37, Josef Johansson wrote:
>>
>>> Hi,
>>>
>>> Unfortunately the journal tuning did not do much. That’s odd, because I
>>> don’t see much utilisation on the OSDs themselves. Now this leads to a
>>> network issue between the OSDs, right?
On Sat, 6 Sep 2014 19:47:13 +0200 Josef Johansson wrote:
>
> On 06 Sep 2014, at 19:37, Josef Johansson wrote:
>
> > Hi,
> >
> > Unfortunately the journal tuning did not do much. That’s odd, because I
> > don’t see much utilisation on the OSDs themselves. Now this leads to a
> > network issue between the OSDs, right?
On 06 Sep 2014, at 19:37, Josef Johansson wrote:
> Hi,
>
> Unfortunately the journal tuning did not do much. That’s odd, because I don’t
> see much utilisation on the OSDs themselves. Now this leads to a network issue
> between the OSDs, right?
>
To answer my own question. Restarted a bond and it
Hi,
Unfortunately the journal tuning did not do much. That’s odd, because I don’t
see much utilisation on the OSDs themselves. Now this leads to a network issue
between the OSDs, right?
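If the inter-OSD network is the suspect (and a bond restart is mentioned above as what eventually helped), a quick sanity check might look like the sketch below; bond0, eth0/eth1 and the peer host osd2 are placeholders, not names taken from this thread:

# check the bond state and which slaves are currently active
cat /proc/net/bonding/bond0

# look for link-level errors or drops on the slave interfaces
ethtool -S eth0 | grep -iE 'err|drop'
ethtool -S eth1 | grep -iE 'err|drop'

# rough throughput test between two OSD hosts (run "iperf -s" on osd2 first)
iperf -c osd2 -t 10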
On 06 Sep 2014, at 18:17, Josef Johansson wrote:
> Hi,
>
> On 06 Sep 2014, at 17:59, Christian Balzer wrote:
>
Hi,
On 06 Sep 2014, at 17:59, Christian Balzer wrote:
>
> Hello,
>
> On Sat, 6 Sep 2014 17:41:02 +0200 Josef Johansson wrote:
>
>> Hi,
>>
>> On 06 Sep 2014, at 17:27, Christian Balzer wrote:
>>
>>>
>>> Hello,
>>>
>>> On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote:
>>>
>>>> We managed to get through the restore, but the performance degradation is
>>>> still there.
Hi,
On 06 Sep 2014, at 18:05, Christian Balzer wrote:
>
> Hello,
>
> On Sat, 6 Sep 2014 17:52:59 +0200 Josef Johansson wrote:
>
>> Hi,
>>
>> Just realised that it could also be a popularity bug, combined with lots of
>> small traffic. And seeing that it’s fast, it gets popular until it hits
>> the curb.
Hello,
On Sat, 6 Sep 2014 17:52:59 +0200 Josef Johansson wrote:
> Hi,
>
> Just realised that it could also be a popularity bug, combined with lots of
> small traffic. And seeing that it’s fast, it gets popular until it hits the
> curb.
>
I don't think I ever heard the term "popularity bug" bef
Hello,
On Sat, 6 Sep 2014 17:41:02 +0200 Josef Johansson wrote:
> Hi,
>
> On 06 Sep 2014, at 17:27, Christian Balzer wrote:
>
> >
> > Hello,
> >
> > On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote:
> >
> >> We managed to get through the restore, but the performance degradation is
> >> still there.
Hi,
Just realised that it could also be a popularity bug, combined with lots of
small traffic. And seeing that it’s fast, it gets popular until it hits the curb.
I’m seeing this in the stats I think.
Linux 3.13-0.bpo.1-amd64 (osd1)  09/06/2014  _x86_64_  (24 CPU)
09/06/2014 05
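The header above looks like output from sysstat's iostat; a minimal sketch of watching per-disk load on an OSD host (the 80% threshold is arbitrary):

# extended per-device statistics every 2 seconds; watch %util, await and avgqu-sz
iostat -x 2

# or take a single 10-second sample and keep only devices above 80% utilisation
iostat -x 10 1 | awk '$NF+0 > 80 {print $1, "util=" $NF "%"}'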
Hi,
On 06 Sep 2014, at 17:27, Christian Balzer wrote:
>
> Hello,
>
> On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote:
>
>> We managed to get through the restore, but the performance degradation is
>> still there.
>>
> Manifesting itself how?
>
Awfully slow I/O on the VMs, and iowait; it’
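For the guest-side symptom (slow I/O and high iowait), a minimal sketch of quantifying it inside a VM:

# watch the 'wa' column (CPU time spent waiting on I/O) every 2 seconds
vmstat 2

# per-device latency inside the guest, if sysstat is installed there
iostat -x 2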
Hello,
On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote:
> We managed to get through the restore, but the performance degradation is
> still there.
>
Manifesting itself how?
> Looking through the OSDs to pinpoint a source of the degradation and
> hoping the current load will be lowered.
>
We managed to get through the restore, but the performance degradation is still
there.
Looking through the OSDs to pinpoint a source of the degradation and hoping the
current load will be lowered.
I’m a bit afraid of setting an OSD’s weight to 0; wouldn’t it be tough if
the degradation is sti
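For reference, the two usual ways to drain an OSD would be something like the sketch below; osd.12 is only a placeholder, and either command triggers data movement, which is exactly the concern voiced above:

# mark the OSD out: PGs are remapped away from it, and it can be marked back in later
ceph osd out 12

# or remove its weight in the CRUSH map, which drains it more permanently
ceph osd crush reweight osd.12 0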
FWIW, I did restart the OSDs until I saw a server where restarting made an
impact. Until that server stopped having an impact, the number of degraded
objects did not go down.
After a while it was done recovering that OSD and happily started on others.
I guess I will be seeing the same behaviour when
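A minimal sketch of that restart-and-watch approach, assuming a Debian/wheezy sysvinit layout for this 0.67 cluster and using osd.12 purely as a placeholder:

# restart a single OSD daemon on its host
/etc/init.d/ceph restart osd.12

# then watch whether the degraded object count actually moves
watch -n 5 "ceph -s | grep -i degraded"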
Actually, restarting only got the recovery process going for a period of time.
Can’t get past the 21k object mark.
I’m uncertain whether the disk really is messing this up right now as well, so
I’m not keen to start moving 300k objects around.
Regards,
Josef
On 06 Sep 2014, a
Hi,
On 06 Sep 2014, at 13:53, Christian Balzer wrote:
>
> Hello,
>
> On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote:
>
>> Also putting this on the list.
>>
>> On 06 Sep 2014, at 13:36, Josef Johansson wrote:
>>
>>> Hi,
>>>
>>> Same issues again, but I think we found the drive tha
Hello,
On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote:
> Also putting this on the list.
>
> On 06 Sep 2014, at 13:36, Josef Johansson wrote:
>
> > Hi,
> >
> > Same issues again, but I think we found the drive that causes the
> > problems.
> >
> > But this is causing problems, as it’s trying to recover to that OSD at the
> > moment.
Also putting this on the list.
On 06 Sep 2014, at 13:36, Josef Johansson wrote:
> Hi,
>
> Same issues again, but I think we found the drive that causes the problems.
>
> But this is causing problems, as it’s trying to recover to that OSD at the
> moment.
>
> So we’re left with the status
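If the goal is to keep the cluster from recovering onto a drive that is already suspect, one possible sequence is sketched below; osd.12 is a placeholder and the init-script path is an assumption for this Debian setup, not something stated in the thread:

# keep the cluster from rebalancing while the OSD is down
ceph osd set noout

# stop the suspect OSD so nothing more is written to it
/etc/init.d/ceph stop osd.12

# when ready, drain it properly instead of letting recovery target it
ceph osd out 12
ceph osd unset noout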
The only time I saw such behaviour was when I was deleting a big chunk of data
from the cluster: all client activity was reduced, the op/s were almost
non-existent, and there were unexplained delays all over the cluster. But all
the disks were somewhat busy in atop/iostat.
On 5 September 2014 09:5
Hi,
Indeed strange.
That output was from when we had issues; it seems that most operations were
blocked / slow requests.
A “baseline” output is more like today:
2014-09-05 10:44:29.123681 mon.0 [INF] pgmap v12582759: 6860 pgs: 6860
active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; 92
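For context, pgmap lines like the one above are what 'ceph -w' streams continuously; a snapshot for comparing against such a baseline could be taken with:

# one-shot cluster summary (same pgmap line as above, plus health)
ceph -s

# continuous stream of status updates, useful to capture during an incident
ceph -w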
Hello,
On Fri, 5 Sep 2014 08:26:47 +0200 David wrote:
> Hi,
>
> Sorry for the lack of information yesterday, this was "solved" after
> some 30 minutes, after having reloaded/restarted all osd daemons.
> Unfortunately we couldn’t pinpoint it to a single OSD or drive, all
> drives seemed ok, som
Hi,
Sorry for the lack of information yesterday, this was "solved" after some 30
minutes, after having reloaded/restarted all osd daemons.
Unfortunately we couldn’t pinpoint it to a single OSD or drive, all drives
seemed ok, some had a bit higher latency and we tried to out / in them to see
if
Just echoing what Christian said.
Also, IIRC the "currently waiting for subops on [" message could also mean a
problem on those OSDs, as it waits for an ack from them (I might remember wrong).
If that is the case you might want to check in on osd 13 & 37 as well.
With the cluster load and size you should not hav
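A minimal sketch of how one might check whether osd 13 and 37 really are the ones holding things up; the log and admin-socket paths below are Debian defaults and assumptions, not taken from this thread:

# slow request warnings also land in the cluster log on the monitor hosts
grep -i "slow request" /var/log/ceph/ceph.log

# on the host carrying osd.37, inspect what it is actually waiting on
ceph --admin-daemon /var/run/ceph/ceph-osd.37.asok dump_ops_in_flight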
On Thu, 4 Sep 2014 12:02:13 +0200 David wrote:
> Hi,
>
> We’re running a ceph cluster with version:
>
> 0.67.7-1~bpo70+1
>
> All of a sudden we’re having issues with the cluster (running RBD images
> for kvm) with slow requests on all of the OSD servers. Any idea why and
> how to fix it?
>
You