Re: [ceph-users] Poor performance on all SSD cluster

2014-06-23 Thread Greg Poirier
On Sun, Jun 22, 2014 at 6:44 AM, Mark Nelson wrote: > RBD Cache is definitely going to help in this use case. This test is > basically just sequentially writing a single 16k chunk of data out, one at > a time. IE, entirely latency bound. At least on OSDs backed by XFS, you > have to wait for t

Re: [ceph-users] Poor performance on all SSD cluster

2014-06-23 Thread Greg Poirier
00KB, aggrb=9264KB/s, minb=9264KB/s, maxb=9264KB/s, mint=44213msec, maxt=44213msec Disk stats (read/write): rbd2: ios=0/102499, merge=0/1818, ticks=0/5593828, in_queue=5599520, util=99.85% On Sun, Jun 22, 2014 at 6:42 PM, Christian Balzer wrote: > On Sun, 22 Jun 2014 12:14:38 -0700 Greg Po
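For anyone reproducing this, output in that form comes from fio; a rough sketch of a sequential-write job of the kind being discussed, run against the mapped RBD device named in the disk stats above (all parameters are illustrative, and note it overwrites the device):

  fio --name=seqwrite --filename=/dev/rbd2 --rw=write --bs=16k \
      --ioengine=libaio --direct=1 --iodepth=1 --runtime=60 --group_reporting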

Re: [ceph-users] Poor performance on all SSD cluster

2014-06-22 Thread Greg Poirier
How does RBD cache work? I wasn't able to find an adequate explanation in the docs. On Sunday, June 22, 2014, Mark Kirkwood wrote: > Good point, I had neglected to do that. > > So, amending my ceph.conf [1]: > > [client] > rbd cache = true > rbd cache size = 2147483648 > rbd cache max dirty = 10
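In short, the RBD cache is a write-back cache inside librbd on the client: writes are acknowledged once they land in the client-side cache and are flushed to the OSDs asynchronously, which is why it helps so much with latency-bound small sequential writes. A minimal client-side ceph.conf fragment along the lines of the one quoted above (the "max dirty" line is truncated in the quote, so that value is an assumption):

  [client]
      rbd cache = true
      rbd cache writethrough until flush = true
      rbd cache size = 2147483648        ; 2 GB, as in the quoted config
      rbd cache max dirty = 1073741824   ; assumed value; must stay below rbd cache size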

Re: [ceph-users] Poor performance on all SSD cluster

2014-06-22 Thread Greg Poirier
We actually do have a use pattern of large batch sequential writes, and this dd is pretty similar to that use case. A round-trip write with replication takes approximately 10-15ms to complete. I've been looking at dump_historic_ops on a number of OSDs and getting mean, min, and max for sub_op and
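For anyone repeating this kind of analysis, those numbers come from the OSD admin socket; a rough sketch (default socket path assumed, and the history options should be checked against your release before relying on them):

  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops
  # widen the sample window beyond the default 20 ops if needed:
  ceph tell osd.0 injectargs '--osd-op-history-size 200 --osd-op-history-duration 1200'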

Re: [ceph-users] Poor performance on all SSD cluster

2014-06-22 Thread Greg Poirier
ournal...ssd able to do > 180 MB/s etc), however I am still seeing writes to the spinners during the > 8s or so that the above dd tests take). > [2] Ubuntu 13.10 VM - I'll upgrade it to 14.04 and see if that helps at > all. > > > On 21/06/14 09:17, Greg Poirier wrote: >

Re: [ceph-users] Poor performance on all SSD cluster

2014-06-20 Thread Greg Poirier
2048 > filestore_xattr_use_omap = true > osd_pool_default_size = 2 > osd_pool_default_min_size = 1 > osd_pool_default_pg_num = 1024 > public_network = 192.168.0.0/24 > osd_mkfs_type = xfs > cluster_network = 192.168.1.0/24 > > > > On Fri, Jun 20, 2014 at 11:08 AM, Greg Poirier >

[ceph-users] Poor performance on all SSD cluster

2014-06-20 Thread Greg Poirier
I recently created a 9-node Firefly cluster backed by all SSDs. We have had some pretty severe performance degradation when using O_DIRECT in our tests (as this is how MySQL will be interacting with RBD volumes, this makes the most sense for a preliminary test). Running the following test: dd if=/
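The dd invocation is cut off in this preview; a representative O_DIRECT sequential-write test of the kind described would look something like this (block size, count, and target path are illustrative, not the original command):

  dd if=/dev/zero of=/mnt/rbd-vol/testfile bs=16k count=65536 oflag=direct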

Re: [ceph-users] Backfill and Recovery traffic shaping

2014-04-19 Thread Greg Poirier
On Saturday, April 19, 2014, Mike Dawson wrote: > > > With a workload consisting of lots of small writes, I've seen client IO > starved with as little as 5Mbps of traffic per host due to spindle > contention once deep-scrub and/or recovery/backfill start. Co-locating OSD > Journals on the same spi

[ceph-users] Backfill and Recovery traffic shaping

2014-04-19 Thread Greg Poirier
We have a cluster in a sub-optimal configuration with data and journal colocated on OSDs (that coincidentally are spinning disks). During recovery/backfill, the entire cluster suffers degraded performance because of the IO storm that backfills cause. Client IO becomes extremely latent. I've tried
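The standard knobs for taking the edge off recovery and backfill can be changed at runtime; a sketch (values are a conservative starting point, not a recommendation):

  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
  # make it persistent by adding the same settings under [osd] in ceph.conf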

Re: [ceph-users] Useful visualizations / metrics

2014-04-13 Thread Greg Poirier
Villalta [ja...@rubixnet.com] > *Sent:* 12 April 2014 16:41 > *To:* Greg Poirier > *Cc:* ceph-users@lists.ceph.com > *Subject:* Re: [ceph-users] Useful visualizations / metrics > > I know ceph throws some warnings if there is high write latency. But I > would be mo

Re: [ceph-users] Useful visualizations / metrics

2014-04-12 Thread Greg Poirier
sure there is a specific metric >> in ceph for this but it would be awesome if there was. >> >> On Sat, Apr 12, 2014 at 10:37 AM, Greg Poirier <greg.poir...@opower.com> wrote: >> >> Curious as to how you define cluster latency.

Re: [ceph-users] Useful visualizations / metrics

2014-04-12 Thread Greg Poirier
cents. > > > On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier wrote: > >> I'm in the process of building a dashboard for our Ceph nodes. I was >> wondering if anyone out there had instrumented their OSD / MON clusters and >> found particularly useful visualization

[ceph-users] Useful visualizations / metrics

2014-04-12 Thread Greg Poirier
I'm in the process of building a dashboard for our Ceph nodes. I was wondering if anyone out there had instrumented their OSD / MON clusters and found particularly useful visualizations. At first, I was trying to do ridiculous things (like graphing % used for every disk in every OSD host), but I r
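For anyone building a similar dashboard, most of the useful numbers come from a handful of commands that are easy to poll and graph; a sketch of the raw sources (admin socket path assumed to be the default):

  ceph osd perf                      # per-OSD filestore commit/apply latency
  ceph -s --format json              # cluster health, PG states, client/recovery IO
  ceph df --format json              # per-pool usage instead of per-disk graphs
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump   # full counter dump for one OSD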

Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-11 Thread Greg Poirier
2 active+remapped+backfill_toofull 1 active+degraded+remapped+backfilling recovery io 362 MB/s, 365 objects/s client io 1643 kB/s rd, 6001 kB/s wr, 911 op/s On Fri, Apr 11, 2014 at 5:45 AM, Greg Poirier wrote: > So... our storage problems persisted for about 45 minutes. I gave

Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-11 Thread Greg Poirier
ef Johansson wrote: > >> On 11/04/14 09:07, Wido den Hollander wrote: >>> On 11 April 2014 at 8:50, Josef Johansson wrote: >>>> Hi, >>>> On 11/04/14 07:29, Wido den Holl

Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-10 Thread Greg Poirier
One thing to note: all of our KVM VMs have to be rebooted. This is something I wasn't expecting. I tried waiting for them to recover on their own, but that's not happening. Rebooting them restores service immediately. :/ Not ideal. On Thu, Apr 10, 2014 at 10:12 PM, Greg Poirier wrote

Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-10 Thread Greg Poirier
number of OSDs), but got held up by some networking nonsense. Thanks for the tips. On Thu, Apr 10, 2014 at 9:51 PM, Sage Weil wrote: > On Thu, 10 Apr 2014, Greg Poirier wrote: > > Hi, > > I have about 200 VMs with a common RBD volume as their root filesystem > and a >

[ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-10 Thread Greg Poirier
Hi, I have about 200 VMs with a common RBD volume as their root filesystem and a number of additional filesystems on Ceph. All of them have stopped responding. One of the OSDs in my cluster is marked full. I tried stopping that OSD to force things to rebalance or at least go to degraded mode, but
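For the record, the usual short-term escape hatches in releases of that era were to raise the full threshold slightly and reweight the full OSD so data moves off it, then add or free capacity as the real fix; a sketch (the ratio and the OSD id are illustrative):

  ceph pg set_full_ratio 0.97        # default full ratio is 0.95
  ceph osd reweight 12 0.85          # osd.12 is hypothetical; pick the OSD that is actually full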

Re: [ceph-users] Replication lag in block storage

2014-03-14 Thread Greg Poirier
don't see any smart errors, but I'm slowly working my way through all of the disks on these machines with smartctl to see if anything stands out. On Fri, Mar 14, 2014 at 9:52 AM, Gregory Farnum wrote: > On Fri, Mar 14, 2014 at 9:37 AM, Greg Poirier > wrote: > > So,
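A quick way to do that sweep, assuming the data disks sit at /dev/sd[b-z] (adjust the glob for the real layout):

  for dev in /dev/sd[b-z]; do
      echo "== $dev =="
      smartctl -H -A "$dev" | egrep -i 'overall-health|reallocated|pending|uncorrect'
  done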

Re: [ceph-users] Replication lag in block storage

2014-03-14 Thread Greg Poirier
ocking progress. If it is the journal commit, check out how busy the > disk is (is it just saturated?) and what its normal performance > characteristics are (is it breaking?). > -Greg > Software Engineer #42 @ http://inktank.com | http://ceph.com > > > On Thu, Mar 13, 2014 at 5:48 PM, Gr

Re: [ceph-users] Replication lag in block storage

2014-03-13 Thread Greg Poirier
"0.086852", { "time": "2014-03-13 20:41:40.314633", "event": "commit_sent"}, { "time": "2014-03-13 20:41:40.314665", "event":

Re: [ceph-users] "no user info saved" after user creation / can't create buckets

2014-03-12 Thread Greg Poirier
the commitment to figuring this poo out. On Wed, Mar 12, 2014 at 8:31 PM, Greg Poirier wrote: > Increasing the logging further, and I notice the following: > > 2014-03-13 00:27:28.617100 7f6036ffd700 20 rgw_create_bucket returned > ret=-1 bucket=test(@.rgw.buckets[us-west-1.15849318

Re: [ceph-users] "no user info saved" after user creation / can't create buckets

2014-03-12 Thread Greg Poirier
g? I did notice that .us-west-1.rgw.buckets and .us-west-1.rgw.buckets.index weren't created. I created those, restarted radosgw, and still 403 errors. On Wed, Mar 12, 2014 at 8:00 PM, Greg Poirier wrote: > And the debug log because that last log was obviously not helpful... >

Re: [ceph-users] "no user info saved" after user creation / can't create buckets

2014-03-12 Thread Greg Poirier
in.rgw+.pools.avail to cache LRU end 2014-03-12 23:57:49.522672 7ff97e7dd700 2 req 1:0.024893:s3:PUT /test:create_bucket:http status=403 2014-03-12 23:57:49.523204 7ff97e7dd700 1 == req done req=0x23bc650 http_status=403 == On Wed, Mar 12, 2014 at 7:36 PM, Greg Poirier wrote: > The saga

Re: [ceph-users] "no user info saved" after user creation / can't create buckets

2014-03-12 Thread Greg Poirier
tool? On Wed, Mar 12, 2014 at 1:54 PM, Greg Poirier wrote: > Also... what are linger_ops? > > ceph --admin-daemon /var/run/ceph/ceph-client.radosgw..asok > objecter_requests > { "ops": [], > "linger_ops": [ > { "linger_id":

Re: [ceph-users] "no user info saved" after user creation / can't create buckets

2014-03-12 Thread Greg Poirier
"pg": "7.31099063", "osd": 28, "object_id": "notify.5", "object_locator": "@7", "snapid": "head", "registering": "head",

[ceph-users] "no user info saved" after user creation / can't create buckets

2014-03-12 Thread Greg Poirier
Rados GW and Ceph versions installed: Version: 0.67.7-1precise I create a user: radosgw-admin --name client.radosgw. user create --uid test --display-name "Test User" It outputs some JSON that looks convincing: { "user_id": "test", "display_name": "test user", "email": "", "suspended": 0,
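A later message in this thread notes that .us-west-1.rgw.buckets and .us-west-1.rgw.buckets.index had not been created; a sketch of checking for and creating them by hand (pg counts are illustrative):

  rados lspools | grep rgw
  ceph osd pool create .us-west-1.rgw.buckets 128
  ceph osd pool create .us-west-1.rgw.buckets.index 64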

Re: [ceph-users] RBD Snapshots

2014-03-03 Thread Greg Poirier
appy if you pull the power plug on it. >> -Greg >> Software Engineer #42 @ http://inktank.com | http://ceph.com >> >> >> On Fri, Feb 28, 2014 at 2:12 PM, Greg Poirier >> wrote: >> >>> According to the documentation at >>> https://ceph.c

[ceph-users] RBD Snapshots

2014-02-28 Thread Greg Poirier
According to the documentation at https://ceph.com/docs/master/rbd/rbd-snapshot/ -- snapshots require that all I/O to a block device be stopped prior to making the snapshot. Is there any plan to allow for online snapshotting so that we could do incremental snapshots of running VMs on a regular basi
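One common approach to getting consistent snapshots of a running VM is to quiesce the filesystem briefly rather than stop all I/O outright; a rough sketch (mount point and image name are hypothetical; fsfreeze runs inside the guest while the rbd command runs on a Ceph client):

  fsfreeze --freeze /var/lib/mysql
  rbd snap create rbd/vm-disk@nightly-2014-02-28
  fsfreeze --unfreeze /var/lib/mysql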

Re: [ceph-users] Ceph MON can no longer join quorum

2014-02-05 Thread Greg Poirier
emoving the infected monitor node > and adding it back to cluster. > > > Regards > > Karan > > -- > *From: *"Greg Poirier" > *To: *ceph-users@lists.ceph.com > *Sent: *Tuesday, 4 February, 2014 10:50:21 PM > *Subject: *[ceph-users] Ceph MON

[ceph-users] Ceph MON can no longer join quorum

2014-02-04 Thread Greg Poirier
I have a MON that at some point lost connectivity to the rest of the cluster and now cannot rejoin. Each time I restart it, it looks like it's attempting to create a new MON and join the cluster, but the rest of the cluster rejects it, because the new one isn't in the monmap. I don't know why it
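The usual way to bring such a monitor back is to wipe its data directory and rebuild it from the live monmap so it cannot invent a new identity; a rough sketch (the mon id "mon-c", paths, and init commands are assumptions for illustration):

  ceph mon getmap -o /tmp/monmap            # from a host that can reach the surviving quorum
  ceph auth get mon. -o /tmp/mon.keyring
  service ceph stop mon.mon-c
  mv /var/lib/ceph/mon/ceph-mon-c /var/lib/ceph/mon/ceph-mon-c.old
  ceph-mon -i mon-c --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
  service ceph start mon.mon-c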

Re: [ceph-users] RadosGW S3 API - Bucket Versions

2014-01-24 Thread Greg Poirier
On Fri, Jan 24, 2014 at 4:28 PM, Yehuda Sadeh wrote: > For each object that rgw stores it keeps a version tag. However this > version is not ascending, it's just used for identifying whether an > object has changed. I'm not completely sure what is the problem that > you're trying to solve though.

[ceph-users] RadosGW S3 API - Bucket Versions

2014-01-23 Thread Greg Poirier
Hello! I have a great deal of interest in the ability to version objects in buckets via the S3 API. Where is this on the roadmap for Ceph? This is a pretty useful feature during failover scenarios between zones in a region. For instance, take the example where you have a region with two zones: u

Re: [ceph-users] 1MB/s throughput to 33-ssd test cluster

2013-12-09 Thread Greg Poirier
On Sun, Dec 8, 2013 at 8:33 PM, Mark Kirkwood wrote: > > I'd suggest testing the components separately - try to rule out NIC (and > switch) issues and SSD performance issues, then when you are sure the bits > all go fast individually test how ceph performs again. > > What make and model of SSD? I'

[ceph-users] 1MB/s throughput to 33-ssd test cluster

2013-12-08 Thread Greg Poirier
Hi. So, I have a test cluster made up of ludicrously overpowered machines with nothing but SSDs in them. Bonded 10Gbps NICs (802.3ad layer 2+3 xmit hash policy, confirmed ~19.8 Gbps throughput with 32+ threads). I'm running rados bench, and I am currently getting less than 1 MBps throughput: sudo
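The bench invocation is truncated above; a typical form of such a write benchmark, assuming a dedicated test pool (pool name and thread count are illustrative):

  rados bench -p testpool 60 write -t 32 --no-cleanup
  rados bench -p testpool 60 seq -t 32      # reads back the objects left by the write phase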

Re: [ceph-users] librados pthread_create failure

2013-08-26 Thread Greg Poirier
Gregs are awesome, apparently. Thanks for the confirmation. I know that threads are light-weight, it's just the first time I've ever run into something that uses them... so liberally. ^_^ On Mon, Aug 26, 2013 at 10:07 AM, Gregory Farnum wrote: > On Mon, Aug 26, 2013 at 9:24 AM,

[ceph-users] librados pthread_create failure

2013-08-26 Thread Greg Poirier
So, in doing some testing last week, I believe I managed to exhaust the number of threads available to nova-compute last week. After some investigation, I found the pthread_create failure and increased nproc for our Nova user to, what I considered, a ridiculous 120,000 threads after reading that li
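For reference, an nproc bump of this kind is usually made in limits.conf; a sketch using the figure from the message (the file path and user name are assumptions about this deployment):

  # /etc/security/limits.d/91-nova.conf
  nova  soft  nproc  120000
  nova  hard  nproc  120000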

Re: [ceph-users] Unexpectedly slow write performance (RBD cinder volumes)

2013-08-23 Thread Greg Poirier
On Fri, Aug 23, 2013 at 9:53 AM, Gregory Farnum wrote: > > Okay. It's important to realize that because Ceph distributes data > pseudorandomly, each OSD is going to end up with about the same amount > of data going to it. If one of your drives is slower than the others, > the fast ones can get ba

Re: [ceph-users] Unexpectedly slow write performance (RBD cinder volumes)

2013-08-23 Thread Greg Poirier
Ah thanks, Brian. I will do that. I was going off the wiki instructions on performing rados benchmarks. If I have the time later, I will change it there. On Fri, Aug 23, 2013 at 9:37 AM, Brian Andrus wrote: > Hi Greg, > > >> I haven't had any luck with the seq bench. It just errors every time. >

Re: [ceph-users] Unexpectedly slow write performance (RBD cinder volumes)

2013-08-22 Thread Greg Poirier
On Thu, Aug 22, 2013 at 2:34 PM, Gregory Farnum wrote: > You don't appear to have accounted for the 2x replication (where all > writes go to two OSDs) in these calculations. I assume your pools have > Ah. Right. So I should then be looking at: # OSDs * Throughput per disk / 2 / repl factor ?
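To make the arithmetic concrete: with the journal co-located on the same disk as the data, every client byte is written twice per OSD (journal plus filestore) on top of the replication factor, so a rough ceiling is (numbers purely illustrative):

  expected aggregate throughput ~ (number of OSDs x per-disk throughput) / (replication factor x 2)
  e.g. 30 OSDs x 100 MB/s / (2 x 2) = 750 MB/s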

Re: [ceph-users] Unexpectedly slow write performance (RBD cinder volumes)

2013-08-22 Thread Greg Poirier
Cuttlefish. If you encounter those, delete your > journal and re-create with `ceph-osd -i --mkjournal'. Your > data-store will be OK, as far as I can tell. > > >Regards, > > Oliver > > On do, 2013-08-22 at 10:55 -0700, Greg Poirier wrote: > > I ha

[ceph-users] Unexpectedly slow write performance (RBD cinder volumes)

2013-08-22 Thread Greg Poirier
I have been benchmarking our Ceph installation for the last week or so, and I've come across an issue that I'm having some difficulty with. Ceph bench reports reasonable write throughput at the OSD level: ceph tell osd.0 bench { "bytes_written": 1073741824, "blocksize": 4194304, "bytes_per_se

Re: [ceph-users] Production/Non-production segmentation

2013-07-31 Thread Greg Poirier
On Wed, Jul 31, 2013 at 12:19 PM, Mike Dawson wrote: > Due to the speed of releases in the Ceph project, I feel having separate > physical hardware is the safer way to go, especially in light of your > mention of an SLA for your production services. > Ah. I guess I should offer a little more back

[ceph-users] Production/Non-production segmentation

2013-07-31 Thread Greg Poirier
Does anyone here have multiple clusters or segment their single cluster in such a way as to try to maintain different SLAs for production vs non-production services? We have been toying with the idea of running separate clusters (on the same hardware, but reserve a portion of the OSDs for the prod
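One way to carve out production OSDs inside a single cluster is a separate CRUSH root plus a rule that only draws from it, with the production pools pointed at that rule; a rough sketch using commands of that era (bucket, host, pool, and rule names are all hypothetical):

  ceph osd crush add-bucket prod-root root
  ceph osd crush move prod-host1 root=prod-root        # repeat for each production host
  ceph osd crush rule create-simple prod-rule prod-root host
  ceph osd pool set prod-volumes crush_ruleset 1       # use the rule id shown by 'ceph osd crush rule dump prod-rule'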