Re: [ceph-users] List of SSDs

2016-02-26 Thread Shinobu Kinjo
> Honestly, we are scared to try the same tests with the m600s. When we > first put them in, we had them more full, but we backed them off to > reduce the load on them. I see. Did you tune anything on the Linux layer, like vm.vfs_cache_pressure? It may not be necessary to mention specifically since
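
A minimal sketch of that Linux-layer tuning, assuming a host with sysctl.d; the value 10 is only illustrative, not a recommendation from this thread:

  # Check the current dentry/inode cache reclaim pressure.
  sysctl vm.vfs_cache_pressure
  # Lower it so the kernel keeps dentries and inodes cached longer (illustrative value).
  sysctl -w vm.vfs_cache_pressure=10
  # Persist across reboots.
  echo 'vm.vfs_cache_pressure = 10' > /etc/sysctl.d/90-vfs-cache.conf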

Re: [ceph-users] xfs corruption

2016-02-26 Thread fangchen sun
Thank you for your response! All my hosts have raid cards. Some raid cards are in pass-through mode, and the others are in write-back mode. I will set all raid cards to pass-through mode and observe for a period of time. Best Regards sunspot 2016-02-25 20:07 GMT+08:00 Ferhat Ozkasgarli : >
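
A hedged sketch of how to verify what write caching is actually in effect on a data disk once the controller is in pass-through mode; /dev/sdb is a placeholder and the vendor RAID CLI (storcli, MegaCli, etc.) is deliberately left out because its syntax varies:

  # Report whether the drive's volatile write cache is enabled.
  hdparm -W /dev/sdb
  # Cross-check via smartmontools.
  smartctl -g wcache /dev/sdb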

Re: [ceph-users] List of SSDs

2016-02-26 Thread Heath Albritton
I've done a bit of testing with the Intel units: S3600, S3700, S3710, and P3700. I've also tested the Samsung 850 Pro, 845DC Pro, and SM863. All of my testing was "worst case IOPS" as described here: http://www.anandtech.com/show/8319/samsung-ssd-845dc-evopro-preview-exploring-worstcase-iops/6
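
A worst-case run in the spirit of the linked article can be approximated with fio; /dev/sdX, the queue depth and the runtime are illustrative assumptions, and both commands destroy data on the device:

  # Precondition: fill the drive once so garbage collection is active.
  fio --name=precondition --filename=/dev/sdX --rw=write --bs=128k --direct=1 --iodepth=32 --ioengine=libaio
  # Sustained 4k random writes; the steady-state IOPS after the initial burst is the number that matters.
  fio --name=worst-case --filename=/dev/sdX --rw=randwrite --bs=4k --direct=1 --iodepth=32 \
      --ioengine=libaio --numjobs=4 --time_based --runtime=1800 --group_reporting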

Re: [ceph-users] List of SSDs

2016-02-26 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Honestly, we are scared to try the same tests with the m600s. When we first put them in, we had them more full, but we backed them off to reduce the load on them. Based on that I don't expect them to fare any better. We'd love to get more IOPs out of

Re: [ceph-users] List of SSDs

2016-02-26 Thread Shinobu Kinjo
Thank you for your very valuable output. "s3610s write iops high-load" is very interesting to me. Have you ever run the same test set on the m600s as on the s3610s? > These clusters normally service 12K IOPs with bursts up to 22K IOPs all RBD. > I've seen a peak of 64K IOPs from client traffic. That's pr

Re: [ceph-users] SSD Journal Performance Priorties

2016-02-26 Thread Somnath Roy
You need to make sure SSD O_DIRECT|O_DSYNC performance is good. Not all SSDs are good at it. Refer to the prior discussions in the community for that. << Presumably as long as the SSD read speed exceeds that of the spinners, that is sufficient. You probably meant the write speed of the SSDs? Journal w
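
The check usually suggested on this list is a single-threaded fio run with the direct and sync flags set, which approximates the journal's O_DIRECT|O_DSYNC write pattern; a sketch, with /dev/sdX a placeholder for the journal SSD (destructive):

  # 4k sequential writes, queue depth 1, O_DIRECT + O_SYNC: a rough proxy for journal writes.
  fio --name=journal-test --filename=/dev/sdX --rw=write --bs=4k \
      --direct=1 --sync=1 --iodepth=1 --numjobs=1 --runtime=60 --time_based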

[ceph-users] SSD Journal Performance Priorties

2016-02-26 Thread Lindsay Mathieson
Ignoring the durability and network issues for now :) Are there any aspects of a journal's performance that matter most for overall ceph performance? i.e. my initial thought is that if I want to improve ceph write performance, journal seq write speed is what matters. Does random write speed factor at

Re: [ceph-users] List of SSDs

2016-02-26 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Resending sans attachment... A picture is worth a thousand words: http://robert.leblancnet.us/files/s3610-load-test-20160224.png The red lines are the m600s IO time (dotted) and IOPs (solid) and our baseline s3610s in green and our test set of s36

Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-26 Thread Shinobu Kinjo
Thanks for your input. It's getting clearer to me. I may need to ask you more, though. Rgds, Shinobu - Original Message - From: "Josh Durgin" To: "Shinobu Kinjo" Cc: "Jan Schermer" , ceph-users@lists.ceph.com Sent: Saturday, February 27, 2016 8:39:39 AM Subject: Re: [ceph-users] Obser

Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-26 Thread Josh Durgin
On 02/26/2016 03:17 PM, Shinobu Kinjo wrote: In jewel, as you mentioned, there will be "--max-objects" and "--object-size" options. That hint will go away or be mitigated with those options. Correct? The io hint isn't sent by rados bench, just rbd. So even with those options, rados bench still doesn

Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-26 Thread Josh Durgin
On 02/26/2016 02:27 PM, Shinobu Kinjo wrote: In this case it's likely rados bench using tiny objects that's causing the massive overhead. rados bench is doing each write to a new object, which ends up in a new file beneath the osd, with its own xattrs too. For 4k writes, that's a ton of overhead.
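
To make that overhead concrete: by default each rados bench write goes to a brand-new object, which the options named in this thread can bound; a sketch against a placeholder pool 'ssd-test', assuming a jewel-era client that has --object-size/--max-objects:

  # Default behaviour: every 4k write creates a new object (a new file plus xattrs on each OSD).
  rados -p ssd-test bench 60 write -b 4096 --no-cleanup
  # Bound the object count so writes land in a fixed set of 4M objects instead.
  rados -p ssd-test bench 60 write -b 4096 --object-size 4194304 --max-objects 128 --no-cleanup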

Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-26 Thread Shinobu Kinjo
Thanks! In jewel, as you mentioned, there will be "--max-objects" and "--object-size" options. That hint will go away or be mitigated with those options. Correct? Are those options available in: # ceph -v ceph version 10.0.2 (86764eaebe1eda943c59d7d784b893ec8b0c6ff9)?? Rgds, Shinobu - Original

[ceph-users] Old CEPH (0.87) cluster degradation - putting OSDs down one by one

2016-02-26 Thread maxxik
Hi Cephers At the moment we are trying to recover our CEPH cluster (0.87) which is behaving very oddly. What has been done: 1. OSD drive failure happened - CEPH put the OSD down and out. 2. Physical HDD replaced and NOT added to CEPH - here we had a strange kernel crash just after the HDD was connected to th
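
For context, the usual sequence for retiring a dead OSD and bringing up its replacement on a cluster of that vintage looks roughly like this; osd.12 and /dev/sdX are placeholders, and this is a generic sketch rather than a diagnosis of the crashes described above:

  # Remove the failed OSD from CRUSH, auth and the OSD map.
  ceph osd out osd.12
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm osd.12
  # After swapping the disk, prepare and activate the new OSD (ceph-disk was current for 0.87).
  ceph-disk prepare /dev/sdX
  ceph-disk activate /dev/sdX1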

Re: [ceph-users] List of SSDs

2016-02-26 Thread Shinobu Kinjo
Hello, > We started having high wait times on the M600s so we got 6 S3610s, 6 M500dcs, > and 6 500 GB M600s (they have the SLC to MLC conversion that we thought might > work better). Is it working better, as you expected? > We have graphite gathering stats on the admin sockets for Ceph a
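
The admin-socket stats referred to here come from perf dump, which returns JSON a graphite collector can parse; a sketch with osd.0 as a placeholder daemon:

  # Query a daemon's performance counters over its admin socket.
  ceph daemon osd.0 perf dump
  # Equivalent form addressing the socket path directly.
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump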

Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-26 Thread Josh Durgin
On 02/26/2016 01:42 PM, Jan Schermer wrote: RBD backend might be even worse, depending on how large a dataset you try. One 4KB block can end up creating a 4MB object, and depending on how well hole-punching and fallocate work on your system you could in theory end up with a >1000x amplification

Re: [ceph-users] State of Ceph documention

2016-02-26 Thread Nigel Williams
On Sat, Feb 27, 2016 at 12:08 AM, Andy Allan wrote: > When I made a (trivial, to be fair) documentation PR it was dealt with > immediately, both when I opened it, and when I fixed up my commit > message. I'd recommend that if anyone sees anything wrong with the > docs, just submit a PR with the fi

Re: [ceph-users] State of Ceph documention

2016-02-26 Thread Nigel Williams
On Fri, Feb 26, 2016 at 11:28 PM, John Spray wrote: > Some projects have big angry warning banners at the top of their > master branch documentation, I think perhaps we should do that too, > and at the same time try to find a way to steer google hits to the > latest stable branch docs rather than

Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-26 Thread Shinobu Kinjo
> In this case it's likely rados bench using tiny objects that's > causing the massive overhead. rados bench is doing each write to a new > object, which ends up in a new file beneath the osd, with its own > xattrs too. For 4k writes, that's a ton of overhead. That means that we don't see any prop

Re: [ceph-users] State of Ceph documention

2016-02-26 Thread Ken Dreyer
On Fri, Feb 26, 2016 at 6:08 AM, Andy Allan wrote: > including a nice big obvious version switcher banner on every > page. We used to have something like this, but we didn't set it back up when we migrated the web servers to new infrastructure a while back. It was using https://github.com/alfredo

Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-26 Thread Jan Schermer
RBD backend might be even worse, depending on how large a dataset you try. One 4KB block can end up creating a 4MB object, and depending on how well hole-punching and fallocate work on your system you could in theory end up with a >1000x amplification if you always hit a different 4MB chunk (but t

Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-26 Thread Josh Durgin
On 02/24/2016 07:10 PM, Christian Balzer wrote: 10 second rados bench with 4KB blocks, 219MB written in total. NAND writes per SSD: 41*32MB = 1312MB. 10496MB total written to all SSDs. Amplification: 48!!! Le ouch. In my use case with rbd cache on all VMs I expect writes to be rather large for the m
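
Spelling out the arithmetic behind that amplification figure (the quoted totals imply 8 SSDs):

  # 41 NAND writes of 32MB each per SSD, 8 SSDs, for 219MB of client data.
  awk 'BEGIN { per = 41 * 32; total = per * 8;
               printf "%d MB per SSD, %d MB total, %.1fx amplification\n", per, total, total / 219 }'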

Re: [ceph-users] Cache tier weirdness

2016-02-26 Thread Christian Balzer
Hello Nick, On Fri, 26 Feb 2016 09:46:03 - Nick Fisk wrote: > Hi Christian, > > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > > Of Christian Balzer > > Sent: 26 February 2016 09:07 > > To: ceph-users@lists.ceph.com > > Subject: [cep

[ceph-users] v9.2.1 Infernalis released

2016-02-26 Thread Sage Weil
This Infernalis point release fixes several packaging and init script issues, enables the librbd object map feature by default, fixes a few librbd bugs, and includes a range of miscellaneous bug fixes across the system. We recommend that all infernalis v9.2.0 users upgrade. For more detailed information, see t

Re: [ceph-users] Problem: silently corrupted RadosGW objects caused by slow requests

2016-02-26 Thread Dominik Mostowiec
Hi, Maybe this is the cause of another bug? http://tracker.ceph.com/issues/13764 The situation is very similar... -- Regards Dominik 2016-02-25 16:17 GMT+01:00 Ritter Sławomir : > Hi, > We have two CEPH clusters running on Dumpling 0.67.11 and some of our > "multipart objects" are incompl

Re: [ceph-users] Can not disable rbd cache

2016-02-26 Thread Jason Dillaman
> My guess would be that if you are already running hammer on the client it is > already using the new watcher API. This would be a fix on the OSDs to allow > the object to be moved because the current client is smart enough to try > again. It would be watchers per object. > Sent from a mobile devi
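
For reference, watchers can be inspected per object from the rados CLI; a sketch assuming a format-2 image whose header object is rbd_header.<id>, with 'myimage' and <id> as placeholders:

  # Find the image id from its block name prefix, then list watchers on the header object.
  rbd info rbd/myimage | grep block_name_prefix
  rados -p rbd listwatchers rbd_header.<id>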

Re: [ceph-users] State of Ceph documention

2016-02-26 Thread Andy Allan
On 26 February 2016 at 05:53, Christian Balzer wrote: > I have a feeling some dedicated editors including knowledgeable and vetted > volunteers would do a better job than just spamming PRs, which tend to be > forgotten/ignored by the already overworked devs. When I made a (trivial, to be fair)

Re: [ceph-users] State of Ceph documention

2016-02-26 Thread John Spray
On Fri, Feb 26, 2016 at 5:24 AM, Nigel Williams wrote: > On Fri, Feb 26, 2016 at 4:09 PM, Adam Tygart wrote: >> The docs are already split by version, although it doesn't help that >> it isn't linked in an obvious manner. >> >> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/ > >

Re: [ceph-users] State of Ceph documention

2016-02-26 Thread John Spray
On Fri, Feb 26, 2016 at 5:53 AM, Christian Balzer wrote: > > Hello, > > On Thu, 25 Feb 2016 23:09:52 -0600 Adam Tygart wrote: > >> The docs are already split by version, although it doesn't help that >> it isn't linked in an obvious manner. >> >> http://docs.ceph.com/docs/master/rados/operations/c

Re: [ceph-users] OSDs are crashing during PG replication

2016-02-26 Thread Alexey Sheplyakov
Alexander, > # ceph osd pool get-quota cache > quotas for pool 'cache': > max objects: N/A > max bytes : N/A > But I set target_max_bytes: > # ceph osd pool set cache target_max_bytes 1 > Can it serve as the reason? I've been unable to reproduce http://tracker.ceph.com/issues/13098 w
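
The two settings being contrasted are set and read back through different subcommands, so get-quota showing N/A says nothing about the tiering target; a sketch with an illustrative byte value:

  # Cache-tier flush/eviction target (what was actually set above).
  ceph osd pool set cache target_max_bytes 100000000000
  ceph osd pool get cache target_max_bytes
  # Hard pool quota, a separate mechanism, and the only thing get-quota reports.
  ceph osd pool set-quota cache max_bytes 100000000000
  ceph osd pool get-quota cache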

Re: [ceph-users] Bug in rados bench with 0.94.6 (regression, not present in 0.94.5)

2016-02-26 Thread Dan van der Ster
I can reproduce and updated the ticket. (I only upgraded the client, not the server). It seems to be related to the new --no-verify option, which is giving strange results -- see the ticket. -- Dan On Fri, Feb 26, 2016 at 11:48 AM, Alexey Sheplyakov wrote: > Christian, > >> Note that "rand" wo
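
A sketch of the comparison being discussed, against a placeholder pool 'bench-test'; --no-verify on the read benches is the 0.94.6 addition in question:

  # Write objects first and keep them so the read benches have data.
  rados -p bench-test bench 60 write --no-cleanup
  # Read them back with and without skipping verification.
  rados -p bench-test bench 60 seq
  rados -p bench-test bench 60 seq --no-verify
  rados -p bench-test bench 60 rand --no-verify
  # Remove the benchmark objects afterwards.
  rados -p bench-test cleanup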

Re: [ceph-users] Bug in rados bench with 0.94.6 (regression, not present in 0.94.5)

2016-02-26 Thread Alexey Sheplyakov
Christian, > Note that "rand" works fine, as does "seq" on a 0.94.5 cluster. Could you please check if a 0.94.5 ("old") *client* works with 0.94.6 ("new") servers, and vice versa? Best regards, Alexey On Fri, Feb 26, 2016 at 9:44 AM, Christian Balzer wrote: > Hello, > On my crappy te

Re: [ceph-users] Guest sync write iops so poor.

2016-02-26 Thread Nick Fisk
Thanks Jan, that is an excellent explanation. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: 26 February 2016 10:07 To: Huan Zhang Cc: josh durgin ; Nick Fisk ; ceph-users Subject: Re: [ceph-users] Guest sync write iops so poor. O_DIRECT is _no

Re: [ceph-users] Guest sync write iops so poor.

2016-02-26 Thread Jan Schermer
Also take a look at Galera cluster. You can relax flushing to disk as long as all your nodes don't go down at the same time. (And when a node goes back up after a crash you should trash it before it rejoins the cluster) Jan > On 26 Feb 2016, at 11:01, Nick Fisk wrote: > > I guess my question

Re: [ceph-users] Guest sync write iops so poor.

2016-02-26 Thread Jan Schermer
O_DIRECT is _not_ a flag for synchronous blocking IO. O_DIRECT only hints to the kernel that it need not cache/buffer the data. The kernel is actually free to buffer and cache it, and it does buffer it. It also does _not_ flush O_DIRECT writes to disk, but it makes a best effort to send them to the drives
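
The difference is easy to see with plain dd against a scratch file; oflag=direct alone bypasses the page cache but does not force each write to stable storage, while adding dsync does. Paths and counts are illustrative:

  # O_DIRECT only: no page cache, but no per-write flush to the platter/flash.
  dd if=/dev/zero of=/mnt/test/ddfile bs=4k count=10000 oflag=direct
  # O_DIRECT + O_DSYNC: each write must reach stable storage before dd continues.
  dd if=/dev/zero of=/mnt/test/ddfile bs=4k count=10000 oflag=direct,dsync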

Re: [ceph-users] Guest sync write iops so poor.

2016-02-26 Thread Nick Fisk
I guess my question was more around what your final workload looks like; if it’s the same as the SQL benchmarks then you are not going to get much better performance than you do now, aside from trying some of the tuning options I mentioned, which might get you an extra 100 IOPS. The only

Re: [ceph-users] Guest sync write iops so poor.

2016-02-26 Thread Huan Zhang
fio on /dev/rbd0 with sync=1 has no problem. I don't find any 'sync cache' code in the Linux rbd block driver or the radosgw API. It seems sync cache is just a concept of librbd (for the rbd cache). Just my concern. 2016-02-26 17:30 GMT+08:00 Huan Zhang : > Hi Nick, > DB's IO pattern depends on config, mysql for exampl
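
The krbd test mentioned here, written out as a full fio command line; /dev/rbd0 must be a mapped scratch image that can be overwritten, and the runtime is arbitrary:

  # Synchronous 4k writes straight to the mapped RBD block device.
  fio --name=rbd-sync-test --filename=/dev/rbd0 --rw=write --bs=4k \
      --sync=1 --direct=1 --iodepth=1 --numjobs=1 --runtime=60 --time_based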

Re: [ceph-users] Cache tier weirdness

2016-02-26 Thread Nick Fisk
Hi Christian, > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Christian Balzer > Sent: 26 February 2016 09:07 > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Cache tier weirdness > > > Hello, > > still my test cluster with 0.94.6

Re: [ceph-users] Guest sync write iops so poor.

2016-02-26 Thread Huan Zhang
Hi Nick, DB's IO pattern depends on config, mysql for example. innodb_flush_log_at_trx_commit = 1, mysql will sync after each transaction, like: write sync write sync ... innodb_flush_log_at_trx_commit = 5, write write write write write sync innodb_flush_log_at_trx_commit = 0, write write ... one s
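
That knob can be flipped at runtime for testing; a sketch (stock InnoDB accepts 0, 1 and 2, with 1 giving the per-commit sync pattern described above):

  # Per-commit fsync of the redo log: the "write sync write sync" pattern.
  mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 1;"
  # Relaxed: the log is written at commit but only flushed to disk about once a second.
  mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2;"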

[ceph-users] Cache tier weirdness

2016-02-26 Thread Christian Balzer
Hello, still my test cluster with 0.94.6. It's a bit fuzzy, but I don't think I saw this with Firefly, but then again that is totally broken when it comes to cache tiers (switching between writeback and forward mode). goat is a cache pool for rbd: --- # ceph osd pool ls detail pool 2 'rbd' repl
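
For readers following along, a cache tier like 'goat' over 'rbd' is wired up with commands along these lines; a sketch using the pool names from the message:

  # Attach the cache pool, put it in writeback mode, and overlay it on the base pool.
  ceph osd tier add rbd goat
  ceph osd tier cache-mode goat writeback
  ceph osd tier set-overlay rbd goat
  # Switching modes (the firefly-era trouble spot mentioned above).
  ceph osd tier cache-mode goat forward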

Re: [ceph-users] Guest sync write iops so poor.

2016-02-26 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Huan Zhang > Sent: 26 February 2016 06:50 > To: Jason Dillaman > Cc: josh durgin ; Nick Fisk ; > ceph-users > Subject: Re: [ceph-users] Guest sync write iops so poor. > > rbd engine with fsy