Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
On 07 Sep 2014, at 04:47, Christian Balzer wrote: > On Sat, 6 Sep 2014 19:47:13 +0200 Josef Johansson wrote: > >> >> On 06 Sep 2014, at 19:37, Josef Johansson wrote: >> >>> Hi, >>> >>> Unfortunately the journal tuning did not do much. That’s odd, because I >>> don’t see much utilisation on O

2014-09-06 Thread Christian Balzer
On Sat, 6 Sep 2014 19:47:13 +0200 Josef Johansson wrote: > > On 06 Sep 2014, at 19:37, Josef Johansson wrote: > > > Hi, > > > > Unfortunately the journal tuning did not do much. That’s odd, because I > > don’t see much utilisation on the OSDs themselves. So this points to a > > network issue betwee

2014-09-06 Thread Josef Johansson
On 06 Sep 2014, at 19:37, Josef Johansson wrote: > Hi, > > Unfortunately the journal tuning did not do much. That’s odd, because I don’t > see much utilisation on the OSDs themselves. So this points to a network issue > between the OSDs, right? > To answer my own question: I restarted a bond and it

2014-09-06 Thread Josef Johansson
Hi, Unfortunately the journal tuning did not do much. That’s odd, because I don’t see much utilisation on the OSDs themselves. So this points to a network issue between the OSDs, right? On 06 Sep 2014, at 18:17, Josef Johansson wrote: > Hi, > > On 06 Sep 2014, at 17:59, Christian Balzer wrote: >

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 17:59, Christian Balzer wrote: > > Hello, > > On Sat, 6 Sep 2014 17:41:02 +0200 Josef Johansson wrote: > >> Hi, >> >> On 06 Sep 2014, at 17:27, Christian Balzer wrote: >> >>> >>> Hello, >>> >>> On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: >>> We m

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 18:05, Christian Balzer wrote: > > Hello, > > On Sat, 6 Sep 2014 17:52:59 +0200 Josef Johansson wrote: > >> Hi, >> >> Just realised that it could also be a popularity bug, with >> lots of small traffic. Seeing that it’s fast, it gets popular until it >> hi

2014-09-06 Thread Christian Balzer
Hello, On Sat, 6 Sep 2014 17:52:59 +0200 Josef Johansson wrote: > Hi, > > Just realised that it could also be a popularity bug, with > lots of small traffic. Seeing that it’s fast, it gets popular until it > hits the curb. > I don't think I ever heard the term "popularity bug" bef

2014-09-06 Thread Christian Balzer
Hello, On Sat, 6 Sep 2014 17:41:02 +0200 Josef Johansson wrote: > Hi, > > On 06 Sep 2014, at 17:27, Christian Balzer wrote: > > > > > Hello, > > > > On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: > > > >> We managed to get through the restore, but the performance degradation > >> i

2014-09-06 Thread Josef Johansson
Hi, Just realised that it could also be a popularity bug, with lots of small traffic. Seeing that it’s fast, it gets popular until it hits the curb. I think I’m seeing this in the stats: Linux 3.13-0.bpo.1-amd64 (osd1) 09/06/2014 _x86_64_ (24 CPU) 09/06/2014 05
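
Since the stats in question are iostat output, here is a minimal sketch of how one might scan `iostat -x` output for the busiest disks. It assumes the sysstat layout of the time (a `Device:` header row whose columns include `await` and `%util`); the function names and the 50 ms / 80% thresholds are illustrative assumptions, not Ceph recommendations.

```python
def parse_iostat_x(text):
    """Return {device: {column: value}} for the LAST report in `text`.

    Assumes a header row starting with "Device" (sysstat style) followed by
    one row per device; other lines (banner, avg-cpu block) are skipped.
    """
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields[1:]
            rows = {}  # a new report starts; keep only the latest one
            continue
        if header and len(fields) == len(header) + 1:
            try:
                values = [float(v.replace(",", ".")) for v in fields[1:]]
            except ValueError:
                continue  # not a device row
            rows[fields[0]] = dict(zip(header, values))
    return rows


def busy_devices(rows, max_await_ms=50.0, max_util_pct=80.0):
    """Return sorted device names whose await or %util exceed the limits."""
    return sorted(
        dev
        for dev, cols in rows.items()
        if cols.get("await", 0.0) > max_await_ms
        or cols.get("%util", 0.0) > max_util_pct
    )
```

Feeding it the output of something like `iostat -x 5 2` keeps only the second report, which matters because the first report reflects averages since boot rather than current load.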

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 17:27, Christian Balzer wrote: > > Hello, > > On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: > >> We managed to get through the restore, but the performance degradation is >> still there. >> > Manifesting itself how? > Awfully slow I/O on the VMs, and iowait, it’

2014-09-06 Thread Christian Balzer
Hello, On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: > We managed to get through the restore, but the performance degradation is > still there. > Manifesting itself how? > Looking through the OSDs to pinpoint a source of the degradation and > hoping the current load will be lowered. >

2014-09-06 Thread Josef Johansson
We managed to get through the restore, but the performance degradation is still there. We’re looking through the OSDs to pinpoint the source of the degradation, hoping the current load will go down. I’m a bit afraid of setting the weight of an OSD to 0; wouldn’t it be tough if the degradation is sti
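
Rather than dropping an OSD straight to weight 0, the data movement can be spread out by lowering its CRUSH weight in steps. This is a hypothetical sketch that only builds the `ceph osd crush reweight` commands and does not run them; the OSD id, starting weight and step count are assumptions, and waiting for recovery to settle between steps (for example until `ceph health` reports HEALTH_OK again) is common practice rather than something stated in this thread.

```python
def reweight_steps(osd_id, current_weight, steps=4):
    """Build commands that lower one OSD's CRUSH weight to 0 in equal steps.

    Nothing is executed: the operator is expected to run each command and
    let recovery finish before issuing the next one.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    commands = []
    for i in range(1, steps + 1):
        # Fraction of the original weight that remains after step i.
        weight = current_weight * (steps - i) / steps
        commands.append(f"ceph osd crush reweight osd.{osd_id} {weight:.3f}")
    return commands


if __name__ == "__main__":
    # Hypothetical OSD id and weight; take the real ones from `ceph osd tree`.
    for cmd in reweight_steps(osd_id=12, current_weight=2.0, steps=4):
        print(cmd)
```

More steps means less data moved per step but a longer overall drain; with only a few hundred thousand objects involved, a handful of steps is probably enough.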

2014-09-06 Thread Josef Johansson
FWIW, I restarted the OSDs until I found a server that made an impact. Until that server stopped having an impact, the number of degraded objects did not go down. After a while it finished recovering that OSD and happily moved on to the others. I guess I will be seeing the same behaviour when

2014-09-06 Thread Josef Johansson
Actually, restarting only got the recovery process going for a while. I can’t get past the 21k object mark. I’m not sure whether the disk is really messing this up right now as well, so I’m reluctant to start moving 300k objects around. Regards, Josef On 06 Sep 2014, a
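
To tell whether recovery is actually stuck at a mark like 21k objects, rather than just slow, one could track the degraded-object count across successive status lines (for example collected from `ceph -w`). This sketch assumes the `<degraded>/<total> degraded` or `<degraded>/<total> objects degraded` wording of pgmap lines; the exact text differs between Ceph releases, and the window size is arbitrary.

```python
import re

# Assumed wording; adjust to what your Ceph release actually prints.
DEGRADED_RE = re.compile(r"(\d+)/(\d+) (?:objects )?degraded")


def degraded_counts(lines):
    """Return the degraded-object count from every line that reports one."""
    counts = []
    for line in lines:
        match = DEGRADED_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def is_stalled(counts, window=5):
    """True if no sample in the last `window` dropped below the first one,
    i.e. recovery made no net progress over that window."""
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(counts) < window:
        return False  # not enough samples to judge yet
    recent = counts[-window:]
    return min(recent[1:]) >= recent[0]
```

Comparing against the start of the window, rather than sample to sample, tolerates small jitter in the count while still catching a recovery that has flatlined.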

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 13:53, Christian Balzer wrote: > > Hello, > > On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote: > >> Also putting this on the list. >> >> On 06 Sep 2014, at 13:36, Josef Johansson wrote: >> >>> Hi, >>> >>> Same issues again, but I think we found the drive tha

2014-09-06 Thread Christian Balzer
Hello, On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote: > Also putting this on the list. > > On 06 Sep 2014, at 13:36, Josef Johansson wrote: > > > Hi, > > > > Same issues again, but I think we found the drive that causes the > > problems. > > > > But this is causing problems as it’

2014-09-06 Thread Josef Johansson
Also putting this on the list. On 06 Sep 2014, at 13:36, Josef Johansson wrote: > Hi, > > Same issues again, but I think we found the drive that causes the problems. > > But this is causing problems as it’s trying to do a recovery to that OSD at > the moment. > > So we’re left with the status

2014-09-05 Thread Luis Periquito
The only time I saw such behaviour was when I was deleting a big chunk of data from the cluster: all client activity was reduced, op/s were almost non-existent and there were unexplained delays all over the cluster. But all the disks were somewhat busy in atop/iostat. On 5 September 2014 09:5

2014-09-05 Thread David
Hi, Indeed strange. That output was from when we had issues; it seems that most operations were blocked / slow requests. A “baseline” output looks more like today’s: 2014-09-05 10:44:29.123681 mon.0 [INF] pgmap v12582759: 6860 pgs: 6860 active+clean; 12253 GB data, 36574 GB used, 142 TB / 178 TB avail; 92
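
Reading the baseline pgmap line: a small sketch that pulls the sizes out of it and computes raw usage per byte of stored data. For this line that gives 36574 / 12253 ≈ 2.98, which would be consistent with 3x replication, although the pool size is not stated in the thread. The regular expression is an assumption fitted to this one line, and treating TB as 1024 GB is likewise an assumption.

```python
import re

# Multipliers to GB; 1024-based steps are an assumption.
TO_GB = {"kB": 1 / 1024 ** 2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0, "PB": 1024.0 ** 2}
SIZE = r"(\d+(?:\.\d+)?) (kB|MB|GB|TB|PB)"
# Matches the "X data, Y used, A / T avail" part of a pgmap status line.
PGMAP_RE = re.compile(
    SIZE + r" data, " + SIZE + r" used, " + SIZE + r" / " + SIZE + r" avail"
)


def pgmap_usage(line):
    """Return sizes in GB plus the raw-to-logical ratio, or None if no match."""
    match = PGMAP_RE.search(line)
    if match is None:
        return None
    g = match.groups()
    # Groups come in (number, unit) pairs: data, used, avail, total.
    data, used, avail, total = (float(g[i]) * TO_GB[g[i + 1]] for i in range(0, 8, 2))
    return {
        "data_gb": data,
        "used_gb": used,
        "avail_gb": avail,
        "total_gb": total,
        "raw_per_logical": used / data if data else float("nan"),
    }
```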

2014-09-05 Thread Christian Balzer
Hello, On Fri, 5 Sep 2014 08:26:47 +0200 David wrote: > Hi, > > Sorry for the lack of information yesterday, this was "solved" after > some 30 minutes, after having reloaded/restarted all osd daemons. > Unfortunately we couldn’t pinpoint it to a single OSD or drive, all > drives seemed ok, som

2014-09-04 Thread David
Hi, Sorry for the lack of information yesterday; this was "solved" after some 30 minutes, after having reloaded/restarted all OSD daemons. Unfortunately we couldn’t pinpoint it to a single OSD or drive; all drives seemed OK, some had slightly higher latency and we tried to out / in them to see if

2014-09-04 Thread Martin B Nielsen
Just echoing what Christian said. Also, IIRC "currently waiting for subops on [" could also mean a problem on those OSDs, as it waits for an ack from them (I might remember wrong). If that is the case, you might want to check in on OSD 13 & 37 as well. With the cluster load and size you should not hav
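
Martin’s suggestion can be checked mechanically by counting which replica OSDs show up in the slow-request messages of the cluster log. This sketch assumes the message text reads like `currently waiting for subops from [13,37]`; Martin quotes the wording from memory, so adjust the pattern to your actual log lines. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed message shape; brackets are optional because the formatting
# differs between releases.
SUBOPS_RE = re.compile(r"waiting for subops from \[?(\d+(?:,\d+)*)\]?")


def replica_osd_tally(log_lines):
    """Count how often each replica OSD id appears in slow-request lines."""
    tally = Counter()
    for line in log_lines:
        match = SUBOPS_RE.search(line)
        if match:
            tally.update(int(osd) for osd in match.group(1).split(","))
    return tally
```

If one or two OSD ids dominate the tally (for example via `tally.most_common(3)`), those are the candidates for the closer look Martin suggests, even when the slow requests are reported by other OSDs.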

2014-09-04 Thread Christian Balzer
On Thu, 4 Sep 2014 12:02:13 +0200 David wrote: > Hi, > > We’re running a ceph cluster with version: > > 0.67.7-1~bpo70+1 > > All of a sudden we’re having issues with the cluster (running RBD images > for kvm) with slow requests on all of the OSD servers. Any idea why and > how to fix it? > You