On Sun, Jun 22, 2014 at 6:44 AM, Mark Nelson wrote:
> RBD Cache is definitely going to help in this use case. This test is
> basically just sequentially writing a single 16k chunk of data out, one at
> a time. IE, entirely latency bound. At least on OSDs backed by XFS, you
> have to wait for t
00KB, aggrb=9264KB/s, minb=9264KB/s, maxb=9264KB/s,
mint=44213msec, maxt=44213msec
Disk stats (read/write):
rbd2: ios=0/102499, merge=0/1818, ticks=0/5593828, in_queue=5599520,
util=99.85%
On Sun, Jun 22, 2014 at 6:42 PM, Christian Balzer wrote:
> On Sun, 22 Jun 2014 12:14:38 -0700 Greg Po
How does RBD cache work? I wasn't able to find an adequate explanation in
the docs.
On Sunday, June 22, 2014, Mark Kirkwood wrote:
> Good point, I had neglected to do that.
>
> So, amending my ceph.conf [1]:
>
> [client]
> rbd cache = true
> rbd cache size = 2147483648
> rbd cache max dirty = 10
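(A quick way to confirm settings like these actually took effect on a running client is the client's admin socket, assuming one is configured; the socket path here is a placeholder, not one from the thread:)
# Dump the running client's config and check the rbd cache knobs
ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok config show | grep rbd_cache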
We actually do have a use pattern of large batch sequential writes, and
this dd is pretty similar to that use case.
A round-trip write with replication takes approximately 10-15ms to
complete. I've been looking at dump_historic_ops on a number of OSDs and
getting mean, min, and max for sub_op and
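(As a rough sketch of how to pull those numbers out of the admin socket, assuming jq is available; the top-level key and the duration field have changed names across releases, so treat this as illustrative only:)
# Summarize the durations of the slow ops currently retained by osd.0
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops | \
  jq '[(.ops // .Ops)[] | (.duration | tonumber)] | {mean: (add/length), min: min, max: max}'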
ournal...ssd able to do
> 180 MB/s etc), however I am still seeing writes to the spinners during the
> 8s or so that the above dd tests take).
> [2] Ubuntu 13.10 VM - I'll upgrade it to 14.04 and see if that helps at
> all.
>
>
> On 21/06/14 09:17, Greg Poirier wrote:
>
2048
> filestore_xattr_use_omap = true
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 1024
> public_network = 192.168.0.0/24
> osd_mkfs_type = xfs
> cluster_network = 192.168.1.0/24
>
>
>
> On Fri, Jun 20, 2014 at 11:08 AM, Greg Poirier wrote:
I recently created a 9-node Firefly cluster backed by all SSDs. We have had
some pretty severe performance degradation when using O_DIRECT in our tests
(as this is how MySQL will interact with RBD volumes, it makes the most
sense for a preliminary test). Running the following test:
dd if=/
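(The command is truncated above; for reference, an O_DIRECT sequential-write test of this shape usually looks something like the following, with the target device and sizes as placeholders rather than the original values:)
# 16k sequential writes straight to the RBD-backed device, bypassing the page cache
dd if=/dev/zero of=/dev/rbd0 bs=16k count=65536 oflag=direct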
On Saturday, April 19, 2014, Mike Dawson wrote:
>
>
> With a workload consisting of lots of small writes, I've seen client IO
> starved with as little as 5Mbps of traffic per host due to spindle
> contention once deep-scrub and/or recovery/backfill start. Co-locating OSD
> Journals on the same spi
We have a cluster in a sub-optimal configuration with data and journal
colocated on OSDs (that coincidentally are spinning disks).
During recovery/backfill, the entire cluster suffers degraded performance
because of the IO storm that backfills cause. Client IO latency becomes
extreme. I've tried
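(A sketch of the usual first mitigation, throttling recovery and backfill; the values below are illustrative, not recommendations from the original thread:)
# Reduce the concurrency and priority of backfill/recovery so client IO survives
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'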
Villalta [
> ja...@rubixnet.com]
> *Sent:* 12 April 2014 16:41
> *To:* Greg Poirier
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Useful visualizations / metrics
>
> I know ceph throws some warnings if there is high write latency. But i
> would be mo
sure there is a specific metric
>> in ceph for this but it would be awesome if there was.
>>
>>
>> On Sat, Apr 12, 2014 at 10:37 AM, Greg Poirier wrote:
>>
>> Curious as to how you define cluster latency.
>>
cents.
>
>
> On Sat, Apr 12, 2014 at 10:02 AM, Greg Poirier wrote:
>
>> I'm in the process of building a dashboard for our Ceph nodes. I was
>> wondering if anyone out there had instrumented their OSD / MON clusters and
>> found particularly useful visualization
I'm in the process of building a dashboard for our Ceph nodes. I was
wondering if anyone out there had instrumented their OSD / MON clusters and
found particularly useful visualizations.
At first, I was trying to do ridiculous things (like graphing % used for
every disk in every OSD host), but I r
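(For anyone building something similar: the per-daemon admin sockets expose most of the counters worth graphing; a minimal sketch, with the socket path as a placeholder:)
# Per-OSD performance counters (op latency, journal latency, queue depths, etc.)
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
# Cluster-wide summary in JSON, easy to poll from a collector
ceph status --format json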
2 active+remapped+backfill_toofull
1 active+degraded+remapped+backfilling
recovery io 362 MB/s, 365 objects/s
client io 1643 kB/s rd, 6001 kB/s wr, 911 op/s
On Fri, Apr 11, 2014 at 5:45 AM, Greg Poirier wrote:
> So... our storage problems persisted for about 45 minutes. I gave
ef Johansson wrote:
>
>>
>> On 11/04/14 09:07, Wido den Hollander wrote:
>>
>>>
>>> Op 11 april 2014 om 8:50 schreef Josef Johansson :
>>>>
>>>>
>>>> Hi,
>>>>
>>>> On 11/04/14 07:29, Wido den Holl
One thing to note
All of our kvm VMs have to be rebooted. This is something I wasn't
expecting. Tried waiting for them to recover on their own, but that's not
happening. Rebooting them restores service immediately. :/ Not ideal.
On Thu, Apr 10, 2014 at 10:12 PM, Greg Poirier wrote
number of OSDs), but got held up by some networking nonsense.
Thanks for the tips.
On Thu, Apr 10, 2014 at 9:51 PM, Sage Weil wrote:
> On Thu, 10 Apr 2014, Greg Poirier wrote:
> > Hi,
> > I have about 200 VMs with a common RBD volume as their root filesystem
> and a
> &
Hi,
I have about 200 VMs with a common RBD volume as their root filesystem and
a number of additional filesystems on Ceph.
All of them have stopped responding. One of the OSDs in my cluster is
marked full. I tried stopping that OSD to force things to rebalance or at
least go to degraded mode, but
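(A sketch of the usual way out of that state; the numbers and OSD id are placeholders, and the full ratio has to be put back once the cluster is healthy again:)
# Temporarily raise the full threshold so client IO can resume
ceph pg set_full_ratio 0.97
# Push data off the full OSD by lowering its reweight value
ceph osd reweight <osd-id> 0.8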
don't see any smart errors, but i'm slowly working my way through all of
the disks on these machines with smartctl to see if anything stands out.
On Fri, Mar 14, 2014 at 9:52 AM, Gregory Farnum wrote:
> On Fri, Mar 14, 2014 at 9:37 AM, Greg Poirier wrote:
> > So,
ocking progress. If it is the journal commit, check out how busy the
> disk is (is it just saturated?) and what its normal performance
> characteristics are (is it breaking?).
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Thu, Mar 13, 2014 at 5:48 PM, Gr
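(For the "how busy is the disk" check suggested above, something like iostat is the quickest look; a sketch:)
# Watch per-device utilization and await while the slow op is pending;
# %util pinned near 100 with high await on the journal/data disk usually means it is saturated.
iostat -x 1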
"0.086852",
{ "time": "2014-03-13 20:41:40.314633",
"event": "commit_sent"},
{ "time": "2014-03-13 20:41:40.314665",
"event":
the commitment to figuring this poo out.
On Wed, Mar 12, 2014 at 8:31 PM, Greg Poirier wrote:
> Increasing the logging further, and I notice the following:
>
> 2014-03-13 00:27:28.617100 7f6036ffd700 20 rgw_create_bucket returned
> ret=-1 bucket=test(@.rgw.buckets[us-west-1.15849318
g?
I did notice that .us-west-1.rgw.buckets and .us-west-1.rgw.buckets.index
weren't created. I created those, restarted radosgw, and still 403 errors.
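(For anyone searching later: the bucket pools can be created by hand; the pg count here is illustrative:)
# Create the bucket data and index pools the gateway expects
ceph osd pool create .us-west-1.rgw.buckets 128
ceph osd pool create .us-west-1.rgw.buckets.index 128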
On Wed, Mar 12, 2014 at 8:00 PM, Greg Poirier wrote:
> And the debug log because that last log was obviously not helpful...
>
>
in.rgw+.pools.avail to cache LRU end
2014-03-12 23:57:49.522672 7ff97e7dd700 2 req 1:0.024893:s3:PUT
/test:create_bucket:http status=403
2014-03-12 23:57:49.523204 7ff97e7dd700 1 == req done req=0x23bc650
http_status=403 ==
On Wed, Mar 12, 2014 at 7:36 PM, Greg Poirier wrote:
> The saga
tool?
On Wed, Mar 12, 2014 at 1:54 PM, Greg Poirier wrote:
> Also... what are linger_ops?
>
> ceph --admin-daemon /var/run/ceph/ceph-client.radosgw..asok
> objecter_requests
> { "ops": [],
> "linger_ops": [
> { "linger_id":
"pg": "7.31099063",
"osd": 28,
"object_id": "notify.5",
"object_locator": "@7",
"snapid": "head",
"registering": "head",
Rados GW and Ceph versions installed:
Version: 0.67.7-1precise
I create a user:
radosgw-admin --name client.radosgw. user create --uid test
--display-name "Test User"
It outputs some JSON that looks convincing:
{ "user_id": "test",
"display_name": "test user",
"email": "",
"suspended": 0,
appy if you pull the power plug on it.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Fri, Feb 28, 2014 at 2:12 PM, Greg Poirier wrote:
>>
>>> According to the documentation at
>>> https://ceph.c
According to the documentation at
https://ceph.com/docs/master/rbd/rbd-snapshot/ -- snapshots require that
all I/O to a block device be stopped prior to making the snapshot. Is there
any plan to allow for online snapshotting so that we could do incremental
snapshots of running VMs on a regular basi
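(For the consistency caveat above, the usual workaround is to quiesce the filesystem inside the guest while the snapshot is taken from a host with access to the pool; a sketch, with the mountpoint, pool, and image names as placeholders:)
# Freeze writes, take the snapshot, thaw -- keeps the snapshot filesystem-consistent
fsfreeze -f /mnt/data
rbd snap create rbd/myimage@nightly
fsfreeze -u /mnt/data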
emoving the infected monitor node
> and adding it back to cluster.
>
>
> Regards
>
> Karan
>
> --
> *From: *"Greg Poirier"
> *To: *ceph-users@lists.ceph.com
> *Sent: *Tuesday, 4 February, 2014 10:50:21 PM
> *Subject: *[ceph-users] Ceph MON
I have a MON that at some point lost connectivity to the rest of the
cluster and now cannot rejoin.
Each time I restart it, it looks like it's attempting to create a new MON
and join the cluster, but the rest of the cluster rejects it, because the
new one isn't in the monmap.
I don't know why it
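(A sketch of the usual recovery for a monitor that has fallen out of the map: feed it the current monmap from the healthy quorum; the mon id and paths are placeholders:)
# On a healthy monitor, grab the current monmap
ceph mon getmap -o /tmp/monmap
# On the broken monitor, stop the daemon, inject the map, then start it again
ceph-mon -i b --inject-monmap /tmp/monmap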
On Fri, Jan 24, 2014 at 4:28 PM, Yehuda Sadeh wrote:
> For each object that rgw stores it keeps a version tag. However this
> version is not ascending, it's just used for identifying whether an
> object has changed. I'm not completely sure what is the problem that
> you're trying to solve though.
Hello!
I have a great deal of interest in the ability to version objects in
buckets via the S3 API. Where is this on the roadmap for Ceph?
This is a pretty useful feature during failover scenarios between zones in
a region. For instance, take the example where you have a region with two
zones:
u
On Sun, Dec 8, 2013 at 8:33 PM, Mark Kirkwood wrote:
>
> I'd suggest testing the components separately - try to rule out NIC (and
> switch) issues and SSD performance issues, then when you are sure the bits
> all go fast individually test how ceph performs again.
>
> What make and model of SSD? I'
Hi.
So, I have a test cluster made up of ludicrously overpowered machines with
nothing but SSDs in them. Bonded 10Gbps NICs (802.3ad layer 2+3 xmit hash
policy, confirmed ~19.8 Gbps throughput with 32+ threads). I'm running
rados bench, and I am currently getting less than 1 MBps throughput:
sudo
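(The rados bench invocation is truncated above; a typical write run of this sort looks like the following, with the pool name and parameters as placeholders:)
# 60-second write benchmark with 32 concurrent ops, keeping the objects so a
# seq/rand read pass can be run against them afterwards
rados bench -p testpool 60 write -t 32 --no-cleanup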
Gregs are awesome, apparently. Thanks for the confirmation.
I know that threads are light-weight, it's just the first time I've ever
run into something that uses them... so liberally. ^_^
On Mon, Aug 26, 2013 at 10:07 AM, Gregory Farnum wrote:
> On Mon, Aug 26, 2013 at 9:24 AM,
So, in doing some testing last week, I believe I managed to exhaust the
number of threads available to nova-compute. After some investigation, I
found the pthread_create failure and increased nproc for our Nova user to
what I considered a ridiculous 120,000 threads after reading that li
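(For reference, the per-user limit lives in limits.conf; a sketch, assuming the service runs as the "nova" user:)
# /etc/security/limits.d/91-nova.conf
nova  soft  nproc  120000
nova  hard  nproc  120000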
On Fri, Aug 23, 2013 at 9:53 AM, Gregory Farnum wrote:
>
> Okay. It's important to realize that because Ceph distributes data
> pseudorandomly, each OSD is going to end up with about the same amount
> of data going to it. If one of your drives is slower than the others,
> the fast ones can get ba
Ah thanks, Brian. I will do that. I was going off the wiki instructions on
performing rados benchmarks. If I have the time later, I will change it
there.
On Fri, Aug 23, 2013 at 9:37 AM, Brian Andrus wrote:
> Hi Greg,
>
>
>> I haven't had any luck with the seq bench. It just errors every time.
>
On Thu, Aug 22, 2013 at 2:34 PM, Gregory Farnum wrote:
> You don't appear to have accounted for the 2x replication (where all
> writes go to two OSDs) in these calculations. I assume your pools have
>
Ah. Right. So I should then be looking at:
# OSDs * Throughput per disk / 2 / repl factor ?
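(Plugging in assumed numbers purely for illustration: 30 OSDs at ~110 MB/s per disk, journals co-located so every write hits the disk twice, and 2x replication gives 30 * 110 / 2 / 2 ~= 825 MB/s of aggregate client write throughput.)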
Cuttlefish. If you encounter those, delete your
> journal and re-create with `ceph-osd -i --mkjournal'. Your
> data-store will be OK, as far as I can tell.
>
>
>Regards,
>
> Oliver
>
> On do, 2013-08-22 at 10:55 -0700, Greg Poirier wrote:
> > I ha
I have been benchmarking our Ceph installation for the last week or so, and
I've come across an issue that I'm having some difficulty with.
Ceph bench reports reasonable write throughput at the OSD level:
ceph tell osd.0 bench
{ "bytes_written": 1073741824,
"blocksize": 4194304,
"bytes_per_se
On Wed, Jul 31, 2013 at 12:19 PM, Mike Dawson wrote:
> Due to the speed of releases in the Ceph project, I feel having separate
> physical hardware is the safer way to go, especially in light of your
> mention of an SLA for your production services.
>
Ah. I guess I should offer a little more back
Does anyone here have multiple clusters, or segment their single cluster in
such a way as to try to maintain different SLAs for production vs
non-production services?
We have been toying with the idea of running separate clusters (on the same
hardware, but reserving a portion of the OSDs for the prod
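(One way to do the OSD-reservation variant is a dedicated CRUSH root for the production pools; a sketch, with the bucket, host, rule, and pool names made up, using the pre-Luminous crush_ruleset syntax:)
# Create a separate CRUSH root and move the reserved hosts under it
ceph osd crush add-bucket prod root
ceph osd crush move cephstore01 root=prod
# Rule that only places data on OSDs under the prod root, then point the pool at it
ceph osd crush rule create-simple prod-rule prod host
ceph osd pool set prod-volumes crush_ruleset 1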