Re: [ceph-users] CACHEMODE_READFORWARD doesn't try proxy write?

2015-11-24 Thread Nick Fisk
Hi Greg, I’m not using it, I am just going through the code to try and come up with some improvements to the promotion logic and thought I would just double check if this also requires a bit of work. Nick

Re: [ceph-users] Cluster always scrubbing.

2015-11-24 Thread Sean Redmond
Hi, That seems very odd - what do the logs say for the OSDs with slow requests? Thanks On Tue, Nov 24, 2015 at 2:20 AM, Mika c wrote: > Hi Sean, > Yes, the cluster scrubbing status (scrub + deep scrub) has lasted almost two > weeks. > And the result of executing `ceph pg dump | grep scrub` is empty

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread German Anders
After doing some more in-depth research and tuning some parameters I've gained a little more performance: # fio --rw=randread --bs=1m --numjobs=4 --iodepth=32 --runtime=22 --time_based --size=16777216k --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --nora

[ceph-users] Verified and tested SAS/SATA SSD for Ceph

2015-11-24 Thread Mike Almateia
Hello. Does someone have a list of verified/tested SSD drives for Ceph? I am thinking about the Ultrastar SSD1600MM SAS SSD for our all-flash Ceph cluster. Is somebody using it in production? -- Mike, runs.

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread Mark Nelson
Each port should be able to do 40Gb/s or 56Gb/s minus overhead and any PCIe or card-related bottlenecks. IPoIB will further limit that, especially if you haven't done any kind of interrupt affinity tuning. Assuming these are Mellanox cards you'll want to read this guide: http://www.mellanox.co
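
As a rough first check before any affinity tuning, the commands below show the negotiated IB rate and which CPUs are servicing the HCA interrupts (a sketch only, assuming a ConnectX-3/mlx4 HCA; the IRQ number 64 is a made-up example):

    # negotiated InfiniBand rate (40 = QDR, 56 = FDR)
    ibstat | grep -i rate
    # which CPUs currently handle the HCA interrupts
    grep mlx4 /proc/interrupts
    # pin one of those IRQs (e.g. 64) to CPU0 on the card's local NUMA node
    echo 1 > /proc/irq/64/smp_affinity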

Re: [ceph-users] Performance question

2015-11-24 Thread Alan Johnson
Are the journals on the same device? It might be better to use the SSDs for journaling, since you are not getting better performance with the SSDs.
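
For reference, on a Hammer-era install a spinner OSD with its journal on a separate SSD partition is normally set up with ceph-disk; a minimal sketch, assuming /dev/sdb is the data disk and /dev/sdc1 a pre-made SSD partition (device names are illustrative):

    # prepare the OSD with its journal on the SSD partition
    ceph-disk prepare /dev/sdb /dev/sdc1
    ceph-disk activate /dev/sdb1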

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread German Anders
Thanks a lot for the response Mark, I will take a look at the guide that you pointed me to. Regarding the iperf results, find them below: *FDR-HOST -> to -> QDR-Blade-HOST* *(client) (server)* server: -- # iperf -s Server l

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread German Anders
Another test made between two HP blades with QDR (with bonding): e60-host01# iperf -s Server listening on TCP port 5001 TCP window size: 85.3 KByte (default) [ 5] local 172.23.

Re: [ceph-users] v0.80.11 Firefly released

2015-11-24 Thread Nathan Cutler
On 11/20/2015 09:31 AM, Loic Dachary wrote: > Hi, > > On 20/11/2015 02:13, Yonghua Peng wrote: >> I have been using firefly release. is there an official documentation for >> upgrading? thanks. > > Here it is : http://docs.ceph.com/docs/firefly/install/upgrading-ceph/ > > Enjoy ! Also suggest

Re: [ceph-users] Performance question

2015-11-24 Thread Marek Dohojda
Yeah they are, that is one thing I was planning on changing. What I am really interested in at the moment is a rough idea of the expected performance. I mean, is 100MB/s around normal, very low, or "could be better"? On Tue, Nov 24, 2015 at 8:02 AM, Alan Johnson wrote: > Are the journals on the same device – it m

Re: [ceph-users] Performance question

2015-11-24 Thread Mart van Santen
Dear Marek, I would expect higher performance, but how did you measure this? With rados bench? Please note, Ceph is built for parallel access, so the combined speed increases with more threads; if this is a single-thread measurement, I wonder how well it reflects the performance of the platform. Wit
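
A minimal rados bench run with a higher thread count shows how the cluster scales with parallelism (pool name and thread count are just examples):

    # 30-second write test with 16 concurrent operations, keep the objects
    rados bench -p rbd 30 write -t 16 --no-cleanup
    # read the same objects back sequentially
    rados bench -p rbd 30 seq -t 16
    # remove the benchmark objects afterwards
    rados -p rbd cleanup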

Re: [ceph-users] Verified and tested SAS/SATA SSD for Ceph

2015-11-24 Thread Jan Schermer
Intel DC series (S3610 for journals, S3510 might be OK for data). Samsung DC PRO series (if you can get them). There are other drives that might be suitable but I strongly suggest you avoid those that aren't tested by others - it's a PITA to deal with the problems poor SSDs cause. Jan > On 24
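
A common way to check whether a candidate SSD can sustain the O_DSYNC journal write pattern Ceph uses is a small fio test like the sketch below (it writes to the raw device, so only run it on a disk with no data; /dev/sdX is a placeholder):

    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=journal-test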

[ceph-users] Upgrade to hammer, crush tuneables issue

2015-11-24 Thread Joe Ryner
Hi, Last night I upgraded my cluster from CentOS 6.5 -> CentOS 7.1 and in the process upgraded from Emperor -> Firefly -> Hammer. When I finished I changed the crush tunables from ceph osd crush tunables legacy -> ceph osd crush tunables optimal. I knew this would cause data movement. But the IO
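
If the rebalance triggered by the tunables change is hurting client IO, throttling recovery is the usual first step; a hedged example using options that exist in Hammer (the values shown are only illustrative):

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
    # or, to back out of the change entirely:
    ceph osd crush tunables legacy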

Re: [ceph-users] Performance question

2015-11-24 Thread Marek Dohojda
Hello, I used real-world type data, which is to say rsync, dd, scp, etc., in addition to the simple performance of the VMs themselves (I am using this Ceph cluster as a backend to KVM). Regardless of which method I used, I averaged between 90 and 100MB/s. On Tue, Nov 24, 2015 at 8:47 AM, Mart van Santen

Re: [ceph-users] Performance question

2015-11-24 Thread Alan Johnson
Hard to know without more config details such as the number of servers, network – GigE or 10 GigE; also not sure how you are measuring (reads or writes). You could try RADOS bench as a baseline. I would expect more performance with 7 x 10K spinners journaled to SSDs. The fact that SSDs did not perform

Re: [ceph-users] Performance question

2015-11-24 Thread Marek Dohojda
7 total servers, 20 GIG pipe between servers, both reads and writes. The network itself has plenty of pipe left, it is averaging 40Mbits/s. Rados Bench SAS 30 writes: Total time run: 30.591927 Total writes made: 386 Write size: 4194304 Bandwidth (MB/sec): 50.471 Stdde

Re: [ceph-users] Performance question

2015-11-24 Thread Zoltan Arnold Nagy
You are talking about 20 “GIG” (what is that? GB/s? Gb/s? I assume the latter) then talk about 40Mbit/s. Am I the only one who cannot parse this? :-) > On 24 Nov 2015, at 17:27, Marek Dohojda wrote: > > 7 total servers, 20 GIG pipe between servers, both reads and writes. The > network itself

Re: [ceph-users] Performance question

2015-11-24 Thread Nick Fisk
You haven’t stated what size replication you are running. Keep in mind that with a replication factor of 3, you will be writing 6x the amount of data down to disks than what the benchmark says (3x replication x2 for data+journal write). You might actually be near the hardware maximums. What
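
Spelled out (assuming size=3 replication and journals co-located on the same spinners; the disk count comes from later in the thread, so treat the per-disk number as a ballpark):

    100 MB/s of client writes
      x3 (replication)          = 300 MB/s arriving at the OSDs
      x2 (journal + data write) = 600 MB/s hitting the spindles
    600 MB/s / 7 spinners       ≈ 86 MB/s per disk, close to what a single SAS spinner sustains sequentially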

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread German Anders
Is anyone on the list using Ceph + IB FDR or QDR and getting around 3GB/s with fio or any other tool? If so, could you share some config variables so I can see where to tweak a little bit, since I've already used mlnx_tune and mlnx_affinity in order to improve and change parameters for irq affin

Re: [ceph-users] Performance question

2015-11-24 Thread Mark Nelson
this. With 3x replication, journals on disk, and large writes, you'll essentially be writing the data out 6 times. 100MB/s for 7 disks might be a little slow, but it's generally in the right ballpark. One of the goals for newstore is to improve performance for large sequential writes by avoi

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 I've had wildly different iperf results based on the version of the kernel, OFED and whether you are using datagram or connected mode as well as the MTU. You really have to just try all the different options to figure out what works the best. Please
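
For reference, a quick way to check and switch between the IPoIB modes being compared here (ib0 is a placeholder interface name, and the change does not persist across reboots):

    cat /sys/class/net/ib0/mode           # prints "datagram" or "connected"
    echo connected > /sys/class/net/ib0/mode
    ip link set dev ib0 mtu 65520         # connected mode allows a much larger IPoIB MTU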

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread Mark Nelson
On 11/24/2015 09:05 AM, German Anders wrote: Thanks a lot for the response Mark, I will take a look at the guide that you point me out. Regarding the iperf results find them below: *FDR-HOST -> to -> QDR-Blade-HOST * *(client) (server)* server: -- # iperf -s --

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread German Anders
Thanks a lot Robert for the explanation. I understand what you are saying and I'm also excited to see more about IB with Ceph to get those performance numbers up, and hopefully (hopefully soon) to see accelio working for production. Regarding the HP IB switch we got 4 ports (uplinks) connected to o

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread German Anders
Yes, I'm wondering if this is my top performance threshold with this kind of setup, although I'll assume that IB perf would be better.. :( *German* 2015-11-24 14:24 GMT-03:00 Mark Nelson : > On 11/24/2015 09:05 AM, German Anders wrote: > >> Thanks a lot for the response Mark, I will take a look

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 I've gotten about 3.2 GB/s with IPoIB on QDR, but it took a couple of weeks of tuning to get that rate. If your switch is at 2048 MTU, it is really hard to get it increased without an outage if I remember correctly. Connected mode is much easier to g

[ceph-users] [crush] Selecting the current rack

2015-11-24 Thread Emmanuel Lacour
Dear ceph users, I am trying to write a crush ruleset that will, for a pool size of 3, put a copy on another host in the local rack and a copy in another rack. I know how to do the latter, but I do not understand how to match the current rack. Here is my try: rule replicate_three_times { ruleset 1
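
For comparison, a commonly used ruleset for "two copies in one rack, third copy in another rack" looks like the sketch below. Note that a CRUSH rule cannot reference the client's "current" rack; placement is decided per PG regardless of where the client sits, so this only approximates the intent (bucket names assume a default map):

    rule replicate_three_times {
        ruleset 1
        type replicated
        min_size 2
        max_size 3
        step take default
        step choose firstn 2 type rack
        step chooseleaf firstn 2 type host
        step emit
    }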

Re: [ceph-users] Performance question

2015-11-24 Thread Marek Dohojda
Crad I think you are 100% correct: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util 0.00 369.00 33.00 1405.00 132.00 135656.00 188.86 5.61 4.02 21.94 3.60 0.70 100.00 I was kinda wondering whether this may be the case, w

Re: [ceph-users] Performance question

2015-11-24 Thread Alan Johnson
Or separate the journals, as this will bring the workload down on the spinners to 3X rather than 6X.

Re: [ceph-users] Performance question

2015-11-24 Thread Marek Dohojda
Thank you! I will do that. Would you suggest getting another SSD drive or moving the journal to the SSD OSDs? (Sorry if that is a stupid question.) On Tue, Nov 24, 2015 at 11:25 AM, Alan Johnson wrote: > Or separate the journals as this will bring the workload down on the > spinners to

Re: [ceph-users] Performance question

2015-11-24 Thread Nick Fisk
Separate would be best, but as with many things in life we are not all driving around in sports cars!! Moving the journals to the SSDs that are also OSDs themselves will be fine. SSDs tend to be more bandwidth-limited than IOPS-limited and the reverse is true for disks, so you will get maybe 2x

Re: [ceph-users] Performance question

2015-11-24 Thread Marek Dohojda
I dunno, I think I'll just go sit in my Lotus and mull this over ;) (I wish). This is storage for KVM, and we have quite a few boxes. While right now none are suffering from IO load, I am seeing slowdowns personally and know that sooner or later others will notice as well. I think what I will do is

Re: [ceph-users] Performance question

2015-11-24 Thread Nick Fisk
Ok, but it’s probably a bit of a waste. The journals for each disk will probably require 200-300 IOPS from each SSD and maybe 5GB of space. Personally I would keep the SSD pool, maybe use it for high-perf VMs? Typically VMs will generate more, smaller, random IOs, so a default rados bench mig
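
To get a benchmark closer to a VM-style workload than the default 4MB writes, the block size can be dropped, for example (pool name and values are illustrative):

    # 4KB writes, 32 in flight, for 30 seconds
    rados bench -p rbd 30 write -b 4096 -t 32 --no-cleanup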

Re: [ceph-users] Performance question

2015-11-24 Thread Marek Dohojda
Oh, well in that case you made my life easier, I like that :) I thought the journal needed to be on a physical device though, not within a raw RBD pool. Was I mistaken? On Tue, Nov 24, 2015 at 11:51 AM, Nick Fisk wrote: > Ok, but it’s probably a bit of a waste. The journals for each disk will > probably r

Re: [ceph-users] Performance question

2015-11-24 Thread Bill Sanders
I think what Nick is suggesting is that you create Nx5GB partitions on the SSDs (where N is the number of OSDs you want to have fast journals for), and use the rest of the space for OSDs that would form the SSD pool. Bill On Tue, Nov 24, 2015 at 10:56 AM, Marek Dohojda < mdoho...@altitudedigita
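
A sketch of that partition layout with sgdisk, assuming /dev/sdf is the SSD and three spinners need journals (device name, count and sizes are only examples):

    sgdisk -n 1:0:+5G -c 1:journal-osd0 /dev/sdf
    sgdisk -n 2:0:+5G -c 2:journal-osd1 /dev/sdf
    sgdisk -n 3:0:+5G -c 3:journal-osd2 /dev/sdf
    sgdisk -n 4:0:0   -c 4:ssd-osd      /dev/sdf   # remainder becomes an OSD in the SSD pool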

Re: [ceph-users] [crush] Selecting the current rack

2015-11-24 Thread Wido den Hollander
On 11/24/2015 07:00 PM, Emmanuel Lacour wrote: > > Dear ceph users, > > > I try to write a crush ruleset that will, for a pool size of 3, put a > copy in another host in the local rack and a copy in another rack. I know > how to do the latter, but I do not understand how to match the current > rac

[ceph-users] RGW pool contents

2015-11-24 Thread Somnath Roy
Hi Yehuda/RGW experts, I have one cluster with RGW up and running at the customer site. I did some heavy performance testing on it with CosBench and as a result wrote a significant amount of data to showcase performance. Over time, the customer also wrote a significant amount of data using S3
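
To get an overview of what is sitting in the RGW pools and which buckets hold the benchmark objects, something like the following is a reasonable starting point (the bucket name is a placeholder):

    rados df                                     # per-pool object counts and sizes
    radosgw-admin bucket list                    # all buckets known to RGW
    radosgw-admin bucket stats --bucket=<name>   # size and object count for one bucket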

[ceph-users] Storing Metadata

2015-11-24 Thread James Gallagher
Hi there, I'm currently following the Ceph QSGs, have finished the Storage Cluster Quick Start, and have the current topology of admin-node - node1 (mon, mds) - node2 (osd0) - node3 (osd1). I am now looking to continue creating a block device and th
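
The block device quick start boils down to a few commands run from a client node; a condensed sketch of the documented flow (image name, size, paths and mount point are examples, and device paths can differ per distribution):

    rbd create foo --size 4096              # 4 GB image in the default rbd pool
    rbd map foo --name client.admin         # exposes it as /dev/rbd0 / /dev/rbd/rbd/foo
    mkfs.ext4 -m0 /dev/rbd/rbd/foo
    mkdir -p /mnt/ceph-block-device
    mount /dev/rbd/rbd/foo /mnt/ceph-block-device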

Re: [ceph-users] Ceph 0.94.5 with accelio

2015-11-24 Thread German Anders
I'll try to set the ports on the HP IB QDR switch to 4K and then configure the interfaces also to MTU 4096, do the same tests again and see what the results are. However, is there any other parameter that I need to take into account and tune for this? For example, this is the port configuration o
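
Besides the switch port setting, it helps to confirm what MTU the HCA actually negotiated; a couple of illustrative checks (interface name assumed, and note the IPoIB datagram MTU is typically the IB MTU minus the 4-byte IPoIB header, i.e. 4092 for a 4K fabric):

    ibv_devinfo | grep -E 'active_mtu|max_mtu'   # should report 4096 once the fabric allows it
    ip link set dev ib0 mtu 4092
    ip link show ib0 | grep mtu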

Re: [ceph-users] Performance question

2015-11-24 Thread Marek Dohojda
Doh! Sorry I didn't get that. It does make sense. Thank you everybody. I will try to set this up before the end of this week and let everybody know the results, since that may help others in the future. On Nov 24, 2015 12:20, "Bill Sanders" wrote: > I think what Nick is suggesting is that you

Re: [ceph-users] [crush] Selecting the current rack

2015-11-24 Thread Gregory Farnum
On Tue, Nov 24, 2015 at 1:37 PM, Wido den Hollander wrote: > On 11/24/2015 07:00 PM, Emmanuel Lacour wrote: >> >> Dear ceph users, >> >> >> I try to write a crush ruleset that will, for a pool size of 3, put a >> copy in another host in the local rack and a copy in another rack. I know >> how to do

Re: [ceph-users] Storing Metadata

2015-11-24 Thread Gregory Farnum
On Tue, Nov 24, 2015 at 1:50 PM, James Gallagher wrote: > Hi there, > > I'm currently following the Ceph QSGs and have currently finished the > Storage Cluster Quick Start and have the current topology of > > admin-node - node1 (mon, mds) > - node2 (osd0) > - no

Re: [ceph-users] Upgrade to hammer, crush tuneables issue

2015-11-24 Thread Warren Wang - ISD
You upgraded (and restarted as appropriate) all the clients first, right? Warren Wang On 11/24/15, 10:52 AM, "Joe Ryner" wrote: >Hi, > >Last night I upgraded my cluster from Centos 6.5 -> Centos 7.1 and in the >process upgraded from Emperor -> Firefly -> Hammer > >When I finished I changed
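
A quick sanity check before (or after) flipping tunables is to confirm what the daemons are actually running and which tunables profile is in effect; a sketch (kernel RBD/CephFS clients also matter but are harder to poll this way):

    ceph tell osd.* version          # confirm every OSD is on the new release
    ceph osd crush show-tunables     # show the tunables currently active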

Re: [ceph-users] Cluster always scrubbing.

2015-11-24 Thread Mika c
Hi Sean, I cut out part of the slow request log, please find the attachment. I am thinking about setting the scrub interval longer (once a month maybe). An unrelated question: do you think deep scrub will bring the performance down? Best wishes, Mika 2015-11-24 18:24 GMT+08:00 Sean Redmond : > Hi, >
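
If the goal is just to space scrubs out, the relevant options (settable in ceph.conf or via injectargs; the values below are only examples, in seconds) are:

    [osd]
    osd scrub min interval = 86400        # earliest a PG may be scrubbed again (1 day)
    osd scrub max interval = 604800       # force a scrub at least weekly
    osd deep scrub interval = 2419200     # deep scrub roughly every 4 weeks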

[ceph-users] MDS memory usage

2015-11-24 Thread Mike Miller
Hi, in my cluster with 16 OSD daemons and more than 20 million files on cephfs, the memory usage on MDS is around 16 GB. It seems that 'mds cache size' has no real influence on the memory usage of the MDS. Is there a formula that relates 'mds cache size' directly to memory consumption on the
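
For reference, the knob under discussion counts inodes rather than bytes, and the admin socket shows how many inodes the MDS is actually holding (daemon name is a placeholder; values are examples):

    [mds]
    mds cache size = 100000               # number of inodes to cache, not a byte limit

    # on the MDS host, check the live counters:
    ceph daemon mds.<name> perf dump | grep -i inode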

Re: [ceph-users] MDS memory usage

2015-11-24 Thread Gregory Farnum
On Tue, Nov 24, 2015 at 10:26 PM, Mike Miller wrote: > Hi, > > in my cluster with 16 OSD daemons and more than 20 million files on cephfs, > the memory usage on MDS is around 16 GB. It seems that 'mds cache size' has > no real influence on the memory usage of the MDS. > > Is there a formula that r

Re: [ceph-users] MDS memory usage

2015-11-24 Thread Mike Miller
Hi Greg, thanks very much. This is clear to me now. As for an 'MDS cluster', I thought that this was not recommended at this stage? I would very much like to have more than one MDS in my cluster as this would probably help very much to balance the load. But I am afraid of what everybody says about