Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-23 Thread Andrew Cowie
On Mon, 2014-12-22 at 15:26 -0800, Craig Lewis wrote: > My problems were memory pressure plus an XFS bug, so it took a while > to manifest. The following (long, ongoing) thread on linux-mm discusses our [severe] problems with memory pressure taking out entire OSD servers. The upstream problems

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-23 Thread Sean Sullivan
I am trying to understand these drive throttle markers that were mentioned, to get an idea of why these drives are marked as slow. Here is the iostat of the drive /dev/sdbm: http://paste.ubuntu.com/9607168/ An IO wait of .79 doesn't seem bad, but a write wait of 21.52 seems really high. Looking

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-22 Thread Sean Sullivan
Awesome! I have yet to hear of any ZFS in Ceph chat, nor have I seen it on the mailing lists that I have caught. I would assume it would function pretty well considering how long it has been in use on some production systems I have seen. I have little to no experience with it personally though.

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-22 Thread Craig Lewis
On Mon, Dec 22, 2014 at 2:57 PM, Sean Sullivan wrote: > Thanks Craig! > > I think that this may very well be my issue with osds dropping out but I > am still not certain as I had the cluster up for a small period while > running rados bench for a few days without any status changes. > Mine were

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-22 Thread Sean Sullivan
Hello Christian, Sorry for the long wait. Actually I have done a rados bench earlier on in the cluster without any failure but it did take a while. That and there is actually a lot of data being downloaded to the cluster now. Here are the rados results for 100 seconds:: http://pastebin.com/q5E6Jjk

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-19 Thread Christian Balzer
Hello Sean, On Fri, 19 Dec 2014 02:47:41 -0600 Sean Sullivan wrote: > Hello Christian, > > Thanks again for all of your help! I started a bonnie test using the > following:: > bonnie -d /mnt/rbd/scratch2/ -m $(hostname) -f -b > While that gives you a decent idea of what the limitations of ker

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-19 Thread Gregory Farnum
On Thu, Dec 18, 2014 at 8:44 PM, Sean Sullivan wrote: > Thanks for the reply Gregory, > > Sorry if this is in the wrong direction or something. Maybe I do not > understand. > > To test uploads I either use bash time and either python-swiftclient or boto > key.set_contents_from_filename to the radosg

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-19 Thread Sean Sullivan
Hello Christian, Thanks again for all of your help! I started a bonnie test using the following:: bonnie -d /mnt/rbd/scratch2/ -m $(hostname) -f -b Hopefully it completes in the next hour or so. A reboot of the slow OSDs clears the slow marker for now kh10-9$ ceph -w cluster 9ea4d9d9-0

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-18 Thread Christian Balzer
Hello, On Thu, 18 Dec 2014 23:45:57 -0600 Sean Sullivan wrote: > Wow Christian, > > Sorry I missed these in line replies. Give me a minute to gather some > data. Thanks a million for the in depth responses! > No worries. > I thought about RAIDing it but I needed the space unfortunately. I had

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-18 Thread Sean Sullivan
Wow Christian, Sorry I missed these in line replies. Give me a minute to gather some data. Thanks a million for the in depth responses! I thought about RAIDing it but I needed the space unfortunately. I had a 3x60 OSD node test cluster that we tried before this and it didn't have this floppi

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-18 Thread Sean Sullivan
Thanks! It would be really great in the right hands. Through some stroke of luck it's in mine. The flapping OSD is becoming a real issue at this point, as it is the only possible lead I have as to why the gateways are transferring so slowly. The weird issue is that I can have 8 or 60 transfers goin

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-18 Thread Christian Balzer
Hello, Nice cluster, I wouldn't mind getting my hands on her ample nacelles, er, wrong movie. ^o^ On Thu, 18 Dec 2014 21:35:36 -0600 Sean Sullivan wrote: > Hello Yall! > > I can't figure out why my gateways are performing so poorly and I am not > sure where to start looking. My RBD mounts seem

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-18 Thread Sean Sullivan
Thanks for the reply Gregory, Sorry if this is in the wrong direction or something. Maybe I do not understand. To test uploads I either use bash time and either python-swiftclient or boto key.set_contents_from_filename to the radosgw. I was unaware that radosgw had any type of throttle settings in

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-18 Thread Gregory Farnum
What kind of uploads are you performing? How are you testing? Have you looked at the admin sockets on any daemons yet? Examining the OSDs to see if they're behaving differently on the different requests is one angle of attack. The other is to look into whether the RGW daemons are hitting throttler limit
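
A minimal sketch of the throttler check suggested above, in Python. It assumes the JSON from a daemon's admin socket (for example `ceph daemon <name> perf dump`) has top-level sections whose names start with `throttle-`, each carrying `val` and `max` counters; that layout is an assumption here, not something stated in the thread. The helper name `saturated_throttles`, the 0.9 threshold, and the sample numbers are all hypothetical.

```python
import json


def saturated_throttles(perf_dump, threshold=0.9):
    """Return (section, fill_ratio) pairs for throttles at or above threshold.

    perf_dump is the parsed JSON of an admin-socket perf dump. Sections that
    do not start with "throttle-" or have no positive "max" are skipped.
    """
    hits = []
    for section, counters in perf_dump.items():
        if not section.startswith("throttle-"):
            continue
        limit = counters.get("max", 0)
        if limit <= 0:
            continue  # an unlimited or unset throttle cannot be saturated
        ratio = counters.get("val", 0) / limit
        if ratio >= threshold:
            hits.append((section, ratio))
    # Most saturated throttle first
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical sample shaped like the assumed output; the numbers are invented.
sample = json.loads("""
{
  "throttle-objecter_bytes": {"val": 104857600, "max": 104857600},
  "throttle-objecter_ops": {"val": 12, "max": 1024},
  "objecter": {"op_active": 12}
}
""")

for name, ratio in saturated_throttles(sample):
    print(f"{name}: {ratio:.0%} of limit")
```

A throttle that sits at its limit while uploads are slow would be worth a closer look; one that stays nearly empty points the search elsewhere.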

[ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-18 Thread Sean Sullivan
Hello Yall! I can't figure out why my gateways are performing so poorly and I am not sure where to start looking. My RBD mounts seem to be performing fine (over 300 MB/s) while uploading a 5G file to Swift/S3 takes 2m32s (32 MB/s, I believe). If we try a 1G file it's closer to 8 MB/s. Testing with nu
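
As a quick check of the upload figures quoted above, here is a small Python sketch of the arithmetic. Whether "5G" means 5 × 10^9 bytes or 5 GiB is not stated in the message, so both readings are computed; the function name `throughput` is hypothetical.

```python
def throughput(size_bytes, elapsed_s, unit=1e6):
    """Average transfer rate in units of `unit` bytes per second."""
    return size_bytes / elapsed_s / unit


elapsed = 2 * 60 + 32  # 2m32s expressed in seconds

# Reading "5G" as decimal gigabytes, reported in MB/s
decimal_rate = throughput(5 * 10**9, elapsed)

# Reading "5G" as binary gibibytes, reported in MiB/s
binary_rate = throughput(5 * 2**30, elapsed, unit=2**20)

print(f"decimal: {decimal_rate:.1f} MB/s, binary: {binary_rate:.1f} MiB/s")

# Time a 1 GB (decimal) upload would take at the quoted 8 MB/s
one_gb_seconds = 10**9 / (8 * 1e6)
print(f"1 GB at 8 MB/s: {one_gb_seconds:.0f} s")
```

Either reading lands close to the roughly 32 MB/s estimate in the message, about a tenth of the 300 MB/s reported for the RBD mounts.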