[ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
Hello All, I am trying ceph-firefly 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung SSD 850 EVO, on 3 servers with 24 GB RAM, 16 cores @ 2.27 GHz, Ubuntu 14.04 LTS with the 3.16-3 kernel. All are connected to 10G ports with maximum MTU. There are no extra disks for journ

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Philippe Schwarz
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 28/02/2015 12:19, mad Engineer wrote: > Hello All, > > I am trying ceph-firefly 0.80.8 > (69eaad7f8308f21573c604f121956e64679a52a7) with 9 OSDs, all Samsung > SSD 850 EVO on 3 servers with 24 GB RAM, 16 cores @ 2.27 GHz, Ubuntu > 14.04 LTS with 3.16

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
Hi, First, test if your SSD can write fast with O_DSYNC; check this blog: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ Then, try with ceph Giant (or maybe wait for Hammer), because there are a lot of optimisations for SSD for threads sh
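For reference, a minimal sketch of the journal-style test that blog post describes, assuming the SSD under test is /dev/sdX and holds no data you care about (the write is destructive):

  # 4k writes with O_DIRECT + O_DSYNC, the pattern a Ceph journal produces
  dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync

A journal-suitable SSD should sustain tens of MB/s under dsync; many consumer drives collapse to a few hundred KB/s.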

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Martin B Nielsen
Hi, I cannot recognize that picture; we've been using Samsung 840 Pro in production for almost 2 years now - and have had 1 fail. We run an 8-node mixed SSD/platter cluster with 4x Samsung 840 Pro (500GB) in each, so that is 32x SSD. They've written ~25TB data on average each. Using the dd you had ins

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
Thanks for the reply Philippe, we were using these disks in our NAS; now it looks like I am in big trouble :-( On Sat, Feb 28, 2015 at 5:02 PM, Philippe Schwarz wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > On 28/02/2015 12:19, mad Engineer wrote: >> Hello All, >> >> I am trying c

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Stefan Priebe - Profihost AG
> On 28.02.2015 at 12:43, Alexandre DERUMIER wrote: > > Hi, > > First, test if your ssd can write fast with O_DSYNC > check this blog: > http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ > > > Then, try with ceph Giant (or maybe wait fo

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
>>But this was replication 1? I never was able to do more than 30 000 with >>replication 3. Oh, sorry, that was about reads. For writes, I think I was around 30000 iops with 3 nodes (2x 4 cores @ 2.1 GHz each), CPU bound, with replication x1. With replication x3, around 9000 iops. Going to test on 2x10

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Alexandre DERUMIER
As an optimisation, try to set the ioscheduler to noop, and also enable rbd_cache=true (it's really helping for sequential writes). But your results seem quite low: 926 KB/s with 4k is only 200 io/s. Check if you don't have any big network latencies, or MTU fragmentation problems. Maybe also t
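Hedged examples of the three suggestions, with the device name, config location and peer host as placeholders:

  # switch one disk's scheduler to noop (not persistent across reboots)
  echo noop > /sys/block/sdb/queue/scheduler

  # enable RBD client-side caching in ceph.conf on the client
  [client]
  rbd cache = true

  # probe for MTU/fragmentation problems: 8972 bytes payload + 28 header bytes = 9000 MTU
  ping -M do -s 8972 <other-node>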

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
thanks for that link Alexandre, as per that link tried these: *850 EVO* *without dsync* dd if=randfile of=/dev/sdb1 bs=4k count=100000 oflag=direct 100000+0 records in 100000+0 records out 409600000 bytes (410 MB) copied, 4.42913 s, 92.5 MB/s with *dsync*: dd if=randfile of=/dev/sdb1 bs=4k co

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Andrei Mikhailovsky
Martin, I have been using Samsung 840 Pro for journals for about 2 years now and have just replaced all my Samsung drives with Intel. We have found a lot of performance issues with the 840 Pro (we are using the 128GB model). In particular, a very strange behaviour when using 4 partitions (with 50% underprovisio
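One common way to underprovision a consumer SSD, assuming a freshly secure-erased drive whose name and sizes here are only illustrative, is to partition just part of it and leave the rest untouched:

  parted /dev/sdX mklabel gpt
  parted /dev/sdX mkpart journal1 0% 12%
  parted /dev/sdX mkpart journal2 12% 25%
  parted /dev/sdX mkpart journal3 25% 37%
  parted /dev/sdX mkpart journal4 37% 50%
  # the upper 50% is never written, so the controller can use it as spare area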

[ceph-users] Mail not reaching the list?

2015-02-28 Thread Tony Harris
Hi, I've sent a couple of emails to the list since subscribing, but I've never seen them reach the list; I was just wondering if there was something wrong?

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Martin B Nielsen
Hi Andrei, If there is one thing I've come to understand by now, it is that ceph configs, performance, hw and well - everything - seem to vary on an almost per-person basis. I do not recognize that latency issue either; this is from one of our nodes (4x 500GB Samsung 840 Pro - sd[c-f]) which has been runni

[ceph-users] RGW hammer/master woes

2015-02-28 Thread Pavan Rallabhandi
Am struggling to get through a basic PUT via swift client with RGW and CEPH binaries built out of Hammer/Master codebase, whereas the same (command on the same setup) is going through with RGW and CEPH binaries built out of Giant. Find below RGW log snippet and the command that was run. Am I mis
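For context, a basic swift PUT against RGW of the kind being tested would look roughly like this; the endpoint, user, key, container and file names are placeholders, not taken from the original report:

  swift -A http://rgw.example.com/auth/v1.0 -U testuser:swift -K <secret_key> upload testcontainer testfile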

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Kevin Walker
What about the Samsung 845DC Pro SSD's? These have fantastic enterprise performance characteristics. http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/ Kind regards Kevin On 28 February 2015 at 15:32, Philippe Schwarz wrote: > -BEGIN PGP

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Stefan Priebe
On 28.02.2015 at 19:41, Kevin Walker wrote: What about the Samsung 845DC Pro SSD's? These have fantastic enterprise performance characteristics. http://www.thessdreview.com/our-reviews/samsung-845dc-pro-review-800gb-class-leading-speed-endurance/ Or use SV843 from Samsung Semiconductor (sep

[ceph-users] Booting from journal devices

2015-02-28 Thread Nick Fisk
Hi All, Thought I would just share this in case someone finds it useful. I've just finished building our new Ceph cluster where the journals are installed on the same SSDs as the OS. The SSDs have MD RAID partitions for the OS and swap, and the rest of the SSDs are used for individual j
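A rough sketch of that layout, with hypothetical device and partition names - two small partitions per SSD mirrored for OS and swap, the remainder left as plain journal partitions:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1   # OS
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdX2 /dev/sdY2   # swap
  # /dev/sdX3, /dev/sdY3, ... remain unmirrored and are handed to Ceph as journals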

[ceph-users] Shutting down a cluster fully and powering it back up

2015-02-28 Thread David
Hi! I'm about to do maintenance on a Ceph cluster, where we need to shut it all down fully. We're currently only using it for rados block devices to KVM hypervisors. Are these steps sane? Shutting it down 1. Shut down all IO to the cluster. Means turning off all clients (KVM hypervisors in ou
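The message is truncated above, but the commonly used flag set for a full shutdown looks like this (run on a monitor node; a sketch, not necessarily the exact steps from the original mail):

  # before powering off, stop recovery/rebalancing from kicking in
  ceph osd set noout
  ceph osd set norecover
  ceph osd set norebalance
  ceph osd set nobackfill

  # after everything is powered back up and the OSDs have rejoined
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph osd unset norecover
  ceph osd unset noout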

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
tried changing scheduler from deadline to noop, also upgraded to Giant and the btrfs filesystem, downgraded kernel to 3.16 from 3.16-3, not much difference: dd if=/dev/zero of=hi bs=4k count=25000 oflag=direct 25000+0 records in 25000+0 records out 102400000 bytes (102 MB) copied, 94.691 s, 1.1 MB/s Earl

[ceph-users] New Cluster - Any requests?

2015-02-28 Thread Nick Fisk
Hi All, I've just finished building a new POC cluster comprised of the following: 4 hosts in 1 chassis (http://www.supermicro.com/products/system/4U/F617/SYS-F617H6-FTPT_.cfm), each with the following: 2x Xeon 2620 v2 (2.1GHz), 32GB RAM, 2x onboard 10GBase-T into 10G switches, 10x 3TB WD

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-02-28 Thread Chris Murray
After noticing that the number increases by 101 on each attempt to start osd.11, I figured I was only 7 iterations away from the output being within 101 of 63675. So, I killed the osd process, started it again, lather, rinse, repeat. I then did the same for other OSDs. Some created very small logs,
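A hypothetical shape of that restart loop, assuming an Upstart-managed OSD on Ubuntu (the OSD id, iteration count and sleep are illustrative only):

  for i in $(seq 1 7); do
      stop ceph-osd id=11 || true
      start ceph-osd id=11
      sleep 600   # let the OSD chew through more of its backlog before the next restart
  done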

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
reinstalled ceph packages and now with the memstore backend [osd objectstore = memstore] it's giving 400Kbps. No idea where the problem is. On Sun, Mar 1, 2015 at 12:30 AM, mad Engineer wrote: > tried changing scheduler from deadline to noop also upgraded to Giant and > btrfs filesystem, downgraded ker
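For reference, the memstore backend mentioned is normally selected with a ceph.conf entry like the one below; it keeps object data in RAM, so it is only useful for ruling out the disk layer:

  [osd]
  osd objectstore = memstore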

Re: [ceph-users] Shutting down a cluster fully and powering it back up

2015-02-28 Thread Gregory Farnum
Sounds good! -Greg On Sat, Feb 28, 2015 at 10:55 AM David wrote: > Hi! > > I’m about to do maintenance on a Ceph Cluster, where we need to shut it > all down fully. > We’re currently only using it for rados block devices to KVM Hypervizors. > > Are these steps sane? > > Shutting it down > > 1. Sh

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Somnath Roy
I would say check with a rados tool like ceph_smalliobench/rados bench first to see how much performance these tools are reporting. This will help you to isolate any upstream issues. Also, check with 'iostat -xk 1' for the resource utilization. Hope you are running with a powerful enough cpu complex
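Hedged examples of both checks, with the pool name as a placeholder:

  # 4K writes for 60 seconds with 16 concurrent ops against a test pool
  rados bench -p testpool 60 write -t 16 -b 4096 --no-cleanup

  # per-device utilisation and await, refreshed every second
  iostat -xk 1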

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread Somnath Roy
Sorry, I saw you have already tried with 'rados bench'. So, some points here. 1. If you are considering a write workload, I think with a total of 2 copies and a 4K workload, you should be able to get ~4K iops (considering it hitting the disk, not with memstore). 2. You are having 9 OSDs and if
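A rough back-of-the-envelope version of that kind of estimate, with purely illustrative numbers:

  client write iops ~ (per-SSD sync 4K write iops x number of OSDs) / (replica count x journal write amplification)
  e.g. 2000 x 9 / (2 x 2) = 4500 iops

The point being that with 9 SSDs and 2 copies, thousands of iops should be achievable, so 1 MB/s at 4K points at something other than raw SSD throughput.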

[ceph-users] Am I reaching the list now?

2015-02-28 Thread Tony Harris
I was subscribed with a yahoo email address, but it was getting some grief so I decided to try using my gmail address, hopefully this one is working -Tony

[ceph-users] SSD selection

2015-02-28 Thread Tony Harris
Hi all, I have a small cluster put together and it's running fairly well (3 nodes, 21 OSDs). I'm looking to improve the write performance a bit though, which I was hoping that using SSDs for journals would do. But, I was wondering what people had as recommendations for SSDs to act as journal drives.

Re: [ceph-users] Booting from journal devices

2015-02-28 Thread Christian Balzer
Hello, On Sat, 28 Feb 2015 18:47:14 -0000, Nick Fisk wrote: > Hi All, > > > > Thought I would just share this in case someone finds it useful. > > > > I've just finished building our new Ceph cluster where the journals are > installed on the same SSD's as the OS. The SSD's have a MD raid

Re: [ceph-users] Mail not reaching the list?

2015-02-28 Thread Sudarshan Pathak
Mail has landed in Spam. Here is the message from Google: *Why is this message in Spam?* It has a from address in yahoo.com but has failed yahoo.com's required tests for authentication. Learn more. Regards, Sudarshan Pathak On Sat, F

Re: [ceph-users] SSD selection

2015-02-28 Thread Christian Balzer
On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote: > Hi all, > > I have a small cluster together and it's running fairly well (3 nodes, 21 > osds). I'm looking to improve the write performance a bit though, which > I was hoping that using SSDs for journals would do. But, I was wondering > wh

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-02-28 Thread mad Engineer
I am re-installing ceph with the Giant release, will soon update results with the above configuration changes. My servers are Cisco UCS C200 M1 with the integrated Intel ICH10R SATA controller. Before installing ceph I changed it to use Software RAID, quoting from the below link: [When using the integrated RAID, yo