> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Huan Zhang
> Sent: 26 February 2016 06:50
> To: Jason Dillaman
> Cc: josh durgin ; Nick Fisk ;
> ceph-users
> Subject: Re: [ceph-users] Guest sync write iops so poor.
>
> rbd engine with fsy
Hello,
still my test cluster with 0.94.6.
It's a bit fuzzy, but I don't think I saw this with Firefly, but then
again that is totally broken when it comes to cache tiers (switching
between writeback and forward mode).
goat is a cache pool for rbd:
---
# ceph osd pool ls detail
pool 2 'rbd' repl
Hi Nick,
DB's IO pattern depends on config; mysql for example.
With innodb_flush_log_at_trx_commit = 1, mysql will sync after each transaction,
like:
write
sync
write
sync
...
innodb_flush_log_at_trx_commit = 2,
write
write
write
write
write
sync
innodb_flush_log_at_trx_commit = 0,
write
write
...
one sync per second.
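If you want to reproduce those two patterns outside mysql, fio's fsync= option
issues an fsync after every N writes. A rough sketch (filename, size, and block
size are just placeholders):

# sync after every write, like innodb_flush_log_at_trx_commit = 1
fio --name=trx1 --filename=/mnt/test/trx --size=256M --rw=write --bs=16k --fsync=1
# sync after every 5th write, like the batched pattern above
fio --name=trx5 --filename=/mnt/test/trx --size=256M --rw=write --bs=16k --fsync=5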
Hi Christian,
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Christian Balzer
> Sent: 26 February 2016 09:07
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Cache tier weirdness
>
>
> Hello,
>
> still my test cluster with 0.94.6
fio on /dev/rbd0 with sync=1 has no problem.
I can't find any 'sync cache' code in the linux rbd block driver or the radosgw API.
It seems sync cache is a concept only in librbd (for the rbd cache).
Just my concern.
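For reference, a krbd sync-write test looks something like this (parameters are
illustrative; sync=1 makes fio open the device with O_SYNC, and it writes to
the device destructively):

fio --name=krbd-sync --filename=/dev/rbd0 --rw=write --bs=4k \
    --direct=1 --sync=1 --iodepth=1 --numjobs=1 --runtime=60 --time_based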
2016-02-26 17:30 GMT+08:00 Huan Zhang :
> Hi Nick,
> DB's IO pattern depends on config, mysql for exampl
I guess my question was more around what your final workload looks like. If
it's the same as the SQL benchmarks, then you are not going to get much better
performance than you do now, aside from trying some of the tuning options I
mentioned, which might get you an extra 100 IOPS.
The only
O_DIRECT is _not_ a flag for synchronous blocking IO.
O_DIRECT only hints to the kernel that it need not cache/buffer the data.
The kernel is actually free to buffer and cache it, and it does buffer it.
It also does _not_ flush O_DIRECT writes to disk, but it makes a best effort to
send them to the drives
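A quick way to see the difference from the command line (paths are placeholders):

# O_DIRECT alone: bypasses the page cache, but gives no durability guarantee
dd if=/dev/zero of=/mnt/test/ddtest bs=4k count=1000 oflag=direct
# O_DIRECT plus O_DSYNC: each write must reach stable storage before returning
dd if=/dev/zero of=/mnt/test/ddtest bs=4k count=1000 oflag=direct,dsync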
Also take a look at Galera cluster. You can relax flushing to disk as long as
all your nodes don't go down at the same time.
(And when a node comes back up after a crash, you should trash it before it
rejoins the cluster.)
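A minimal my.cnf sketch of what "relaxed flushing" means here (paths and node
addresses are placeholders, not a recommendation):

[mysqld]
# redo log is written at commit but flushed only ~once per second
innodb_flush_log_at_trx_commit = 2
sync_binlog = 0
# Galera replication is what makes the relaxed flushing tolerable
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_address = gcomm://node1,node2,node3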
Jan
> On 26 Feb 2016, at 11:01, Nick Fisk wrote:
>
> I guess my question
Thanks Jan, that is an excellent explanation.
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan
Schermer
Sent: 26 February 2016 10:07
To: Huan Zhang
Cc: josh durgin ; Nick Fisk ;
ceph-users
Subject: Re: [ceph-users] Guest sync write iops so poor.
O_DIRECT is _no
Christian,
> Note that "rand" works fine, as does "seq" on a 0.94.5 cluster.
Could you please check if 0.94.5 ("old") *client* works with 0.94.6
("new") servers, and vice a versa?
Best regards,
Alexey
On Fri, Feb 26, 2016 at 9:44 AM, Christian Balzer wrote:
>
> Hello,
>
> On my crappy te
I can reproduce it and have updated the ticket. (I only upgraded the
client, not the server.)
It seems to be related to the new --no-verify option, which is giving
strange results -- see the ticket.
-- Dan
On Fri, Feb 26, 2016 at 11:48 AM, Alexey Sheplyakov
wrote:
> Christian,
>
>> Note that "rand" wo
Alexander,
> # ceph osd pool get-quota cache
> quotas for pool 'cache':
> max objects: N/A
> max bytes : N/A
> But I set target_max_bytes:
> # ceph osd pool set cache target_max_bytes 1
> Could that be the reason?
I've been unable to reproduce http://tracker.ceph.com/issues/13098
w
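Note that pool quotas and the cache tiering target are separate settings; to
see both (assuming your cache pool is named 'cache', and if your ceph version
supports 'pool get' for the tiering values):

# hard pool quota (what get-quota shows; N/A above)
ceph osd pool get-quota cache
# flush/evict target actually used by the tiering agent
ceph osd pool get cache target_max_bytes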
On Fri, Feb 26, 2016 at 5:53 AM, Christian Balzer wrote:
>
> Hello,
>
> On Thu, 25 Feb 2016 23:09:52 -0600 Adam Tygart wrote:
>
>> The docs are already split by version, although it doesn't help that
>> it isn't linked in an obvious manner.
>>
>> http://docs.ceph.com/docs/master/rados/operations/c
On Fri, Feb 26, 2016 at 5:24 AM, Nigel Williams
wrote:
> On Fri, Feb 26, 2016 at 4:09 PM, Adam Tygart wrote:
>> The docs are already split by version, although it doesn't help that
>> it isn't linked in an obvious manner.
>>
>> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
>
>
On 26 February 2016 at 05:53, Christian Balzer wrote:
> I have a feeling some dedicated editors including knowledgeable and vetted
> volunteers would do a better job than just spamming PRs, which tend to be
> forgotten/ignored by the already overworked devs.
When I made a (trivial, to be fair)
> My guess would be that if you are already running hammer on the client it is
> already using the new watcher API. This would be a fix on the OSDs to allow
> the object to be moved because the current client is smart enough to try
> again. It would be watchers per object.
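(To check this yourself, rados can list an object's watchers; the header
object name below is illustrative and depends on the image format:)

rados -p rbd listwatchers rbd_header.<image_id>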
Hi,
Maybe this is the reason of another bug?
http://tracker.ceph.com/issues/13764
The situation is very similar...
--
Regards
Dominik
2016-02-25 16:17 GMT+01:00 Ritter Sławomir :
> Hi,
>
>
>
> We have two CEPH clusters running on Dumpling 0.67.11 and some of our
> "multipart objects" are incompl
This Infernalis point release fixes several packaging and init script
issues, enables the librbd objectmap feature by default, fixes a few librbd
bugs, and includes a range of miscellaneous bug fixes across the system.
We recommend that all infernalis v9.2.0 users upgrade.
For more detailed information, see t
Hello Nick,
On Fri, 26 Feb 2016 09:46:03 - Nick Fisk wrote:
> Hi Christian,
>
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > Of Christian Balzer
> > Sent: 26 February 2016 09:07
> > To: ceph-users@lists.ceph.com
> > Subject: [cep
On 02/24/2016 07:10 PM, Christian Balzer wrote:
10 second rados bench with 4KB blocks, 219MB written in total.
NAND writes per SSD: 41 * 32MB = 1312MB.
10496MB total written across all 8 SSDs.
Amplification: 48!!!
Le ouch.
In my use case with rbd cache on all VMs I expect writes to be rather
large for the m
RBD backend might be even worse, depending on how large a dataset you try. One
4KB block can end up creating a 4MB object, and depending on how well
hole-punching and fallocate work on your system you could in theory end up
with a >1000 amplification if you always hit a different 4MB chunk (but t
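You can confirm the 4MB chunk size for a given image with rbd info (the image
name is a placeholder); the object size is 2^order bytes:

# "order 22" in the output means 4MB objects
rbd info <image-name>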
On Fri, Feb 26, 2016 at 6:08 AM, Andy Allan wrote:
> including a nice big obvious version switcher banner on every
> page.
We used to have something like this, but we didn't set it back up when
we migrated the web servers to new infrastructure a while back. It was
using https://github.com/alfredo
> In this case it's likely rados bench using tiny objects that's
> causing the massive overhead. rados bench is doing each write to a new
> object, which ends up in a new file beneath the osd, with its own
> xattrs too. For 4k writes, that's a ton of overhead.
That means that we don't see any prop
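For context, the kind of run being discussed is something like this (hammer-era
rados bench flags; -b is the write size, and every write lands in a new object):

rados bench -p rbd 10 write -b 4096 -t 16 --no-cleanup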
On Fri, Feb 26, 2016 at 11:28 PM, John Spray wrote:
> Some projects have big angry warning banners at the top of their
> master branch documentation, I think perhaps we should do that too,
> and at the same time try to find a way to steer google hits to the
> latest stable branch docs rather than
On Sat, Feb 27, 2016 at 12:08 AM, Andy Allan wrote:
> When I made a (trivial, to be fair) documentation PR it was dealt with
> immediately, both when I opened it, and when I fixed up my commit
> message. I'd recommend that if anyone sees anything wrong with the
> docs, just submit a PR with the fi
On 02/26/2016 01:42 PM, Jan Schermer wrote:
RBD backend might be even worse, depending on how large a dataset you try. One 4KB
block can end up creating a 4MB object, and depending on how well hole-punching
and fallocate work on your system you could in theory end up with a >1000
amplification
Hi Cephers
At the moment we are trying to recover our CEPH cluster (0.87), which is
behaving very oddly.
What has been done:
1. OSD drive failure happened - CEPH put the OSD down and out.
2. Physical HDD replaced and NOT added to CEPH - here we had a strange
kernel crash just after the HDD was connected to th
Hello,
> We started having high wait times on the M600s so we got 6 S3610s, 6 M500dcs,
> and 6 500 GB M600s (they have the SLC to MLC conversion that we thought might
> work better).
Is it working better, as you expected?
> We have graphite gathering stats on the admin sockets for Ceph a
Thanks!
In jewel, as you mentioned, there will be "--max-objects" and "--object-size"
options.
That hint will go away or be mitigated w/ those options. Correct?
Are those options available in:
# ceph -v
ceph version 10.0.2 (86764eaebe1eda943c59d7d784b893ec8b0c6ff9)??
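If they are, a bounded run would look roughly like this (flag spellings assumed
from this thread, not verified against 10.0.2):

# reuse a bounded set of larger objects instead of one new object per write
rados bench -p rbd 60 write -b 4096 --object-size 4194304 --max-objects 128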
Rgds,
Shinobu
- Original
On 02/26/2016 02:27 PM, Shinobu Kinjo wrote:
In this case it's likely rados bench using tiny objects that's
causing the massive overhead. rados bench is doing each write to a new
object, which ends up in a new file beneath the osd, with its own
xattrs too. For 4k writes, that's a ton of overhead.
On 02/26/2016 03:17 PM, Shinobu Kinjo wrote:
In jewel, as you mentioned, there will be "--max-objects" and "--object-size"
options.
That hint will go away or be mitigated w/ those options. Correct?
The io hint isn't sent by rados bench, just rbd. So even with those
options, rados bench still doesn
Thanks for your input.
Things are getting clearer to me. I may need to ask you more, though -;
Rgds,
Shinobu
- Original Message -
From: "Josh Durgin"
To: "Shinobu Kinjo"
Cc: "Jan Schermer" , ceph-users@lists.ceph.com
Sent: Saturday, February 27, 2016 8:39:39 AM
Subject: Re: [ceph-users] Obser
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
Resending sans attachment...
A picture is worth a thousand words:
http://robert.leblancnet.us/files/s3610-load-test-20160224.png
The red lines are the m600s IO time (dotted) and IOPs (solid) and our
baseline s3610s in green and our test set of s36
Ignoring the durability and network issues for now :) Are there any
aspects of a journal's performance that matter most for overall ceph
performance?
i.e. my initial thought is that if I want to improve ceph write performance,
journal seq write speed is what matters. Does random write speed factor
at
You need to make sure SSD O_DIRECT|O_DSYNC performance is good. Not all
SSDs are good at it. Refer to the prior discussions in the community for that.
<< Presumably as long as the SSD read speed exceeds that of the spinners, that
is sufficient.
You probably meant write speed of SSDs? Journal w
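The usual way to check that is a single-threaded sync write fio run directly
against the SSD (destructive; /dev/sdX is a placeholder; fio's sync=1 uses
O_SYNC, which is at least as strict as the O_DSYNC the journal uses):

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --iodepth=1 --numjobs=1 --runtime=60 --time_based --name=journal-test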
Thank you for your very valuable output.
The "s3610s write iops high-load" is very interesting to me.
Have you ever run the same test set on the m600s as on the s3610s?
> These clusters normally service 12K IOPs with bursts up to 22K IOPs all RBD.
> I've seen a peak of 64K IOPs from client traffic.
That's pr
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
Honestly, we are scared to try the same tests with the m600s. When we
first put them in, we had them more full, but we backed them off to
reduce the load on them. Based on that I don't expect them to fare any
better. We'd love to get more IOPs out of
I've done a bit of testing with the Intel units: S3600, S3700, S3710, and
P3700. I've also tested the Samsung 850 Pro, 845DC Pro, and SM863.
All of my testing was "worst case IOPS" as described here:
http://www.anandtech.com/show/8319/samsung-ssd-845dc-evopro-preview-exploring-worstcase-iops/6
Thank you for your response!
All my hosts have raid cards. Some raid cards are in pass-through mode,
and the others are in write-back mode. I will set all raid cards to
pass-through mode and observe for a period of time.
Best Regards
sunspot
2016-02-25 20:07 GMT+08:00 Ferhat Ozkasgarli :
>
> Honestly, we are scared to try the same tests with the m600s. When we
> first put them in, we had them more full, but we backed them off to
> reduce the load on them.
I see.
Did you tune anything on the linux layer, like:
vm.vfs_cache_pressure
It may not be necessary to mention specifically since
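For example (the value is illustrative; the default is 100):

sysctl -w vm.vfs_cache_pressure=50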