Re: [ceph-users] replace OSD disk without removing the osd from crush

2015-07-09 Thread Stefan Priebe - Profihost AG
On 08.07.2015 at 23:33, Somnath Roy wrote: > Yes, I am able to reproduce that too..Not sure if this is a bug or change. That's odd. Can someone from Inktank comment? > Thanks & Regards > Somnath > > -Original Message- > From: Stefan Priebe [mailto:s.pri...@profihost.ag] > Sent: Wedne

Re: [ceph-users] Cannot map rbd image with striping!

2015-07-09 Thread Ilya Dryomov
On Wed, Jul 8, 2015 at 11:02 PM, Hadi Montakhabi wrote: > Thank you! > Is striping supported while using CephFS? Yes. Thanks, Ilya

Re: [ceph-users] Real world benefit from SSD Journals for a more read than write cluster

2015-07-09 Thread Christian Balzer
Hello, On Thu, 09 Jul 2015 08:57:27 +0200 Götz Reinicke - IT Koordinator wrote: > Hi again, > > time is passing, so is my budget :-/ and I have to recheck the options > for a "starter" cluster. An expansion next year for may be an openstack > installation or more performance if the demands rise

[ceph-users] "ERROR: rgw_obj_remove(): cls_cxx_remove returned -2" on OSDs since Hammer upgrade

2015-07-09 Thread Sylvain Munaut
Hi, Since I upgraded to Hammer last weekend, I see errors such as 7eff5322d700 0 cls/rgw/cls_rgw.cc:1947: ERROR: rgw_obj_remove(): cls_cxx_remove returned -2 in the logs. What's going on? Can this be related to the unexplained write activity I see on my OSDs? Cheers, Sylvain

Re: [ceph-users] Real world benefit from SSD Journals for a more read than write cluster

2015-07-09 Thread Götz Reinicke - IT Koordinator
Hi Christian, On 09.07.15 at 09:36, Christian Balzer wrote: > > Hello, > > On Thu, 09 Jul 2015 08:57:27 +0200 Götz Reinicke - IT Koordinator wrote: > >> Hi again, >> >> time is passing, so is my budget :-/ and I have to recheck the options >> for a "starter" cluster. An expansion next year for

[ceph-users] fuse mount in fstab

2015-07-09 Thread Kenneth Waegeman
Hi all, we are trying to mount ceph-fuse in fstab, following this: http://ceph.com/docs/master/cephfs/fstab/ When we add this to fstab:

    id=cephfs,conf=/etc/ceph/ceph.conf  /mnt/ceph  fuse.ceph  defaults  0 0

we get an error message running mount: mount: can't find id=cephfs
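
Before relying on the fstab entry, it can help to verify that the same mount works when invoked by hand; if this fails, fstab will fail too. A minimal check, assuming a client named "cephfs" whose keyring sits in /etc/ceph:

    # mount manually with the same id and config
    ceph-fuse --id cephfs -c /etc/ceph/ceph.conf /mnt/ceph
    # unmount again before testing the fstab entry with "mount /mnt/ceph"
    fusermount -u /mnt/ceph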

Re: [ceph-users] fuse mount in fstab

2015-07-09 Thread Thomas Lemarchand
Hello Kenneth, I have a working ceph fuse in fstab. Only difference I see is that I don't use "conf"; your configuration file is at the default path anyway.

    id=recette-files-rw,client_mountpoint=/recette-files/files  /mnt/wimi/ceph-files  fuse.ceph  noatime,_netdev  0 0

-- Thomas Lemarchand Cl

Re: [ceph-users] fuse mount in fstab

2015-07-09 Thread Kenneth Waegeman
Hmm, it looks like a version issue.. I am testing with these versions on centos7:

    ~]# mount -V
    mount from util-linux 2.23.2 (libmount 2.23.0: selinux, debug, assert)
    ~]# ceph-fuse -v
    ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)

This does not work.. On my fedora box, with thes

[ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Jan Schermer
I hope this would be interesting for some, it nearly cost me my sanity. Some time ago I came here with a problem manifesting as a “100 IOPS*” limit with the LSI controllers and some drives. It almost drove me crazy as I could replicate the problem with ease but when I wanted to show it to someon

[ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-09 Thread Mallikarjun Biradar
Hi all, Setup details: Two storage enclosures each connected to 4 OSD nodes (shared storage). Failure domain is Chassis (enclosure) level. Replication count is 2. Each host is allotted 4 drives. I have active client IO running on the cluster. (Random write profile with 4M block size & 64 Queue

Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-09 Thread Jan Schermer
What is the min_size setting for the pool? If you have size=2 and min_size=2, then all your data is safe when one replica is down, but the IO is paused. If you want to continue IO you need to set min_size=1. But be aware that a single failure after that causes you to lose all the data, you’d hav
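
For reference, a quick way to check and, if you accept the reduced safety, change this on a pool (the pool name "rbd" here is just an example):

    ceph osd pool get rbd min_size
    ceph osd pool set rbd min_size 1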

Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-09 Thread Mallikarjun Biradar
I have size=2 & min_size=1 and IO is paused till all hosts come back. On Thu, Jul 9, 2015 at 4:41 PM, Jan Schermer wrote: > What is the min_size setting for the pool? If you have size=2 and min_size=2, > then all your data is safe when one replica is down, but the IO is paused. If > you want to

Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-09 Thread Jan Schermer
And are the OSDs getting marked down during the outage? Are all the MONs still up? Jan > On 09 Jul 2015, at 13:20, Mallikarjun Biradar > wrote: > > I have size=2 & min_size=1 and IO is paused till all hosts com back. > > On Thu, Jul 9, 2015 at 4:41 PM, Jan Schermer wrote: >> What is the min_

Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-09 Thread Mallikarjun Biradar
Yeah. All OSDs down and monitors still up.. On Thu, Jul 9, 2015 at 4:51 PM, Jan Schermer wrote: > And are the OSDs getting marked down during the outage? > Are all the MONs still up? > > Jan > >> On 09 Jul 2015, at 13:20, Mallikarjun Biradar >> wrote: >> >> I have size=2 & min_size=1 and IO is

Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-09 Thread Gregory Farnum
Your first point of troubleshooting is pretty much always to look at "ceph -s" and see what it says. In this case it's probably telling you that some PGs are down, and then you can look at why (but perhaps it's something else). -Greg On Thu, Jul 9, 2015 at 12:22 PM, Mallikarjun Biradar wrote: > Y
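
A minimal set of commands for that first look, assuming admin credentials on the node:

    ceph -s                       # overall health and PG state summary
    ceph health detail            # which PGs are down/degraded and why
    ceph pg dump_stuck inactive   # PGs that are currently not serving IO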

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Alexandre DERUMIER
Hi, I have already seen bad performance with the Crucial m550 ssd: 400 iops synchronous write. Not sure what model of ssd you have? see this: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ what is your result of disk directly with #dd if
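
The test that post describes is, roughly, a small run of synchronous direct writes against the raw device. A sketch of that kind of check (the device name is a placeholder, and writing to it destroys data):

    # O_DIRECT + O_DSYNC 4k writes straight to the journal device
    dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync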

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Jan Schermer
The old FUA code has been backported for quite some time. RHEL/CentOS 6.5 and higher have it for sure. I have 12K IOPS in this test on the block device itself. But only 100 filesystem transactions (=IOPS) on filesystem on the same device because the “flush” (=FUA?) operation takes 10ms to finis

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Alexandre DERUMIER
>>I have 12K IOPS in this test on the block device itself. But only 100 >>filesystem transactions (=IOPS) on filesystem on the same device because the >>“flush” (=FUA?) operation takes 10ms to finish. I just can’t replicate the same “flush” operation with fio on the block device, unfortunate

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Jan Schermer
I tried everything: --write-barrier, --sync, --fsync, --fdatasync. I never get the same 10ms latency. Must be something the filesystem journal/log does that is special. Any ideas where to look? I was hoping blktrace would show what exactly is going on, but it just shows a synchronous write -> (10ms) -

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Alexandre DERUMIER
>>Any ideas where to look? I was hoping blktrace would show what exactly is >>going on, but it just shows a synchronous write -> (10ms) -> completed Which size is the write in this case? 4K? Or more? - Original Message - From: "Jan Schermer" To: "aderumier" Cc: "ceph-users" Sent: Thu

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Alexandre DERUMIER
I just tried on an intel s3700, on top of xfs. fio, with:
- sequential synchronous 4k write, iodepth=1: 60 iops
- sequential synchronous 4k write, iodepth=32: 2000 iops
- random synchronous 4k write, iodepth=1: 8000 iops
- random synchronous 4k write, iodepth=32: 18000 iops
- Original Message ---
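
A fio invocation along these lines should reproduce that kind of run (mount point and file name are placeholders, not the exact job used above):

    # 4k synchronous sequential writes at queue depth 1 against a file on the xfs mount
    fio --name=seqsync --filename=/mnt/ssd/fio.test --size=1G \
        --rw=write --bs=4k --iodepth=1 --direct=1 --sync=1 \
        --runtime=60 --time_based
    # for the other cases, switch to --rw=randwrite and/or --iodepth=32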

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Jan Schermer
This is blktrace of the problem, but this is the first time I've used it. It begins when "swapper" (probably because it is a dirty page and thus gets flushed?) shows up. +8 should mean 8 sectors = 4KiB?

    8,160  155925  2.692651182  1436712  Q  FWS  [fio]
    8,160  155926  2.692652285
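
For anyone wanting to capture a similar trace, the recording itself is just (device is a placeholder; major,minor 8,160 would normally correspond to /dev/sdk):

    # record ~30 seconds of block-layer events while the fio job runs, then decode them
    blktrace -d /dev/sdk -o fiotrace -w 30
    blkparse -i fiotrace | less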

Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-09 Thread Tony Harris
Sounds to me like you've put yourself at too much risk - *if* I'm reading your message right about your configuration, you have multiple hosts accessing OSDs that are stored on a single shared box - so if that single shared box (single point of failure for multiple nodes) goes down it's possible fo

Re: [ceph-users] Real world benefit from SSD Journals for a more read than write cluster

2015-07-09 Thread David Burley
If you can accept the failure domain, we find a 12:1 ratio of SATA spinners to a 400GB P3700 is reasonable. Benchmarks can saturate it, but it is entirely bored in our real-world workload and only 30-50% utilized during backfills. I am sure one could go even further than 12:1 if they wanted, but we h

Re: [ceph-users] Real world benefit from SSD Journals for a more read than write cluster

2015-07-09 Thread Wang, Warren
You'll take a noticeable hit on write latency. Whether or not it's tolerable will be up to you and the workload you have to capture. Large file operations are throughput efficient without an SSD journal, as long as you have enough spindles. About the Intel P3700, you will only need 1 to keep up

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Jan Schermer
Those are very strange numbers. Is the “60” figure right? Can you paste the full fio command and output? Thanks Jan > On 09 Jul 2015, at 15:58, Alexandre DERUMIER wrote: > > I just tried on an intel s3700, on top of xfs > > fio , with > - sequential syncronous 4k write iodepth=1 : 60 iops >

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Somnath Roy
I am not sure how increasing iodepth for a sync write is giving you better results.. the sync fio engine is supposed to always use iodepth=1. BTW, I faced similar issues some time back.. By running the following fio job file, I was getting very dismal performance on my SSD on top of XFS.. [random-wri
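
A representative job file of that shape, with placeholder path and size, might look like:

    [global]
    ioengine=sync
    direct=1
    bs=4k
    runtime=60
    time_based

    [random-write-sync]
    rw=randwrite
    fsync=1
    filename=/mnt/ssd/fio.test
    size=1G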

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Alexandre DERUMIER
Hi again, I totally forgot to check the io scheduler from my last tests; this was with cfq. With the noop scheduler, I have a huge difference.
cfq:
- sequential synchronous 4k write, iodepth=1: 60 iops
- sequential synchronous 4k write, iodepth=32: 2000 iops
noop:
- sequential synchronous 4k writ
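
For completeness, the scheduler can be checked and switched per device at runtime via sysfs (sdb is a placeholder):

    cat /sys/block/sdb/queue/scheduler          # the active scheduler is shown in brackets
    echo noop > /sys/block/sdb/queue/scheduler  # switch to noop for the test (as root)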

[ceph-users] Ceph Read Performance Issues

2015-07-09 Thread Garg, Pankaj
Hi, I'm experiencing READ performance issues in my cluster. I have 3 x86 servers, each with 2 SSDs and 9 OSDs. SSDs are being used for journaling. I seem to get erratic READ performance numbers when using the rados bench read test. I ran a test with just a single x86 server, with 2 SSDs and 9 OSDs. P
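
For a repeatable read test, the usual pattern is to write objects with --no-cleanup first and then read them back; a sketch with placeholder pool name, runtime and thread count:

    rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup
    rados bench -p testpool 60 seq -t 16    # sequential reads of the objects written above
    rados bench -p testpool 60 rand -t 16   # random reads
    rados -p testpool cleanup               # remove the benchmark objects afterwards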

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Jan Schermer
That’s very strange. Is nothing else using the disks? The difference between noop and cfq should be (and in my experience is) marginal for such a benchmark. Jan > On 09 Jul 2015, at 18:11, Alexandre DERUMIER wrote: > > Hi again, > > I totally forgot to check the io scheduler from my last tes

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Alexandre DERUMIER
>>That’s very strange. Is nothing else using the disks? no. only the fio benchmark. >>The difference between noop and cfq should be (and in my experience is) >>marginal for such a benchmark. maybe a bug in cfq (kernel 3.16, debian jessie)? also, the deadline scheduler gives me the same perf as noop.

Re: [ceph-users] replace OSD disk without removing the osd from crush

2015-07-09 Thread Wido den Hollander
On 07/09/2015 09:15 AM, Stefan Priebe - Profihost AG wrote: > > On 08.07.2015 at 23:33, Somnath Roy wrote: >> Yes, I am able to reproduce that too..Not sure if this is a bug or change. > > That's odd. Can someone from Inktank comment? > > Not from Inktank, but here we go. When you add an OSD

Re: [ceph-users] External XFS Filesystem Journal on OSD

2015-07-09 Thread David Burley
Converted a few of our OSDs (spinners) over to a config where the OSD journal and XFS journal both live on an NVMe drive (Intel P3700). The XFS journal might have provided some very minimal performance gains (3%, maybe). Given the low gains, we're going to reject this as something to dig into deep
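
For anyone wanting to test the same layout: the external XFS log is chosen at mkfs time and the matching option must also be passed on every mount. A sketch with placeholder devices and mount point:

    # create the filesystem with its log on an NVMe partition
    mkfs.xfs -f -l logdev=/dev/nvme0n1p2,size=128m /dev/sdb1
    # the same logdev must be given at mount time
    mount -o logdev=/dev/nvme0n1p2,noatime /dev/sdb1 /var/lib/ceph/osd/ceph-12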

Re: [ceph-users] replace OSD disk without removing the osd from crush

2015-07-09 Thread Stefan Priebe
On 09.07.2015 at 19:35, Wido den Hollander wrote: On 07/09/2015 09:15 AM, Stefan Priebe - Profihost AG wrote: On 08.07.2015 at 23:33, Somnath Roy wrote: Yes, I am able to reproduce that too..Not sure if this is a bug or change. That's odd. Can someone from Inktank comment? Not from In

[ceph-users] Monitor questions

2015-07-09 Thread Nate Curry
I have a question in regards to monitor nodes and network layout. It's my understanding that there should be two networks: a ceph-only network for comms between the various ceph nodes, and a separate storage network where other systems will interface with the ceph nodes. Are the monitor nodes supp

Re: [ceph-users] Monitor questions

2015-07-09 Thread Quentin Hartman
I have my mons sharing the ceph network, and while I currently do not run mds or rgw, I have run those on my mon hosts in the past with no perceptible ill effects. On Thu, Jul 9, 2015 at 3:20 PM, Nate Curry wrote: > I have a question in regards to monitor nodes and network layout. Its my > unde

Re: [ceph-users] External XFS Filesystem Journal on OSD

2015-07-09 Thread Quentin Hartman
Thanks for sharing this info. I've been toying with doing this very thing... How did you measure the performance? I'm specifically looking at reducing the IO load on my spinners and it seems the xfs journaling process is eating a lot of my IO. My queues on my OSD drives frequently get into the 500

Re: [ceph-users] Real world benefit from SSD Journals for a more read than write cluster

2015-07-09 Thread Götz Reinicke
Hi Warren, thanks for that feedback. regarding the 2 or 3 copies we had a lot of internal discussions and lots of pros and cons on 2 and 3 :) … and finally decided to give 2 copies in the first - now called evaluation cluster - a chance to prove. I bet in 2016 we will see, if that was a good de

Re: [ceph-users] Real world benefit from SSD Journals for a more read than write cluster

2015-07-09 Thread Quentin Hartman
So, I was running with size=2, until we had a network interface on an OSD node go faulty, and start corrupting data. Because ceph couldn't tell which copy was right it caused all sorts of trouble. I might have been able to recover more gracefully had I caught the problem sooner and been able to

Re: [ceph-users] External XFS Filesystem Journal on OSD

2015-07-09 Thread David Burley
On Thu, Jul 9, 2015 at 5:42 PM, Quentin Hartman < qhart...@direwolfdigital.com> wrote: > Thanks for sharing this info. I've been toying with doing this very > thing... How did you measure the performance? I'm specifically looking at > reducing the IO load on my spinners and it seems the xfs journa

[ceph-users] How to prefer faster disks in same pool

2015-07-09 Thread Christoph Adomeit
Hi Guys, I have a ceph pool that is mixed with 10k rpm disks and 7.2k rpm disks. There are 85 osds and 10 of them are 10k. Size is not an issue, the pool is filled only 20%. I want to somehow prefer the 10k rpm disks so that they get more i/o. What is the most intelligent way to prefer the faster

Re: [ceph-users] How to prefer faster disks in same pool

2015-07-09 Thread Alexandre DERUMIER
Hi, you need to create 2 crush roots and rules, 1 for the 10k disks and 1 for the 7.2k disks. Then create 2 pools, 1 pool using rule 1 and 1 pool using rule 2. see: http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ - Original Message - From: "Christoph Adomeit" To: "ceph-users"
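
Roughly, that boils down to a second CRUSH root plus a rule and a pool that use it. A sketch with placeholder names (the hosts/OSDs with 10k drives still have to be placed under the new root, e.g. with "ceph osd crush set" or by editing the decompiled crushmap):

    ceph osd crush add-bucket fast root
    ceph osd crush rule create-simple fast_rule fast host
    ceph osd pool create fast-pool 512 512 replicated fast_rule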

Re: [ceph-users] How to prefer faster disks in same pool

2015-07-09 Thread Robert LeBlanc
You could also create two roots and two rules and have the primary osd be the 10k drives so that the 7.2k are used primarily for writes. I believe that recipe is on the CRUSH page in the documentation. Robert LeBlanc Sent from a mobile device please excuse any typos. On Jul 9, 2015 10:03 PM, "Ale

[ceph-users] mds0: Client failing to respond to cache pressure

2015-07-09 Thread 谷枫
hi, I use CephFS in a production environment with 7 OSDs, 1 MDS, 3 MONs now. So far so good, but I have a problem with it today. The ceph status reports this:

    cluster ad3421a43-9fd4-4b7a-92ba-09asde3b1a228
     health HEALTH_WARN
            mds0: Client 34271 failing to respond to cache pressure
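
A couple of admin-socket commands that can help identify the client that is not releasing caps (the MDS name is a placeholder; mds_cache_size is the setting usually discussed alongside this warning):

    ceph daemon mds.<name> session ls   # list client sessions and the caps they hold
    ceph daemon mds.<name> perf dump    # inode and cap counters for the MDS cache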

[ceph-users] Nova with Ceph generate error

2015-07-09 Thread Mario Codeniera
Hi, It is my first time here. I am just having an issue regarding my configuration with OpenStack, which works perfectly for cinder and glance, based on the Kilo release on CentOS 7. I based my documentation on this rbd-openstack manu

Re: [ceph-users] Investigating my 100 IOPS limit

2015-07-09 Thread Andrew Thrift
We have seen similar poor performance with Intel S3700 and S3710 on LSI SAS3008 with CFQ on 3.13, 3.18 and 3.19 kernels. Switching to noop fixed the problems for us. On Fri, Jul 10, 2015 at 4:30 AM, Alexandre DERUMIER wrote: > >>That’s very strange. Is nothing else using the disks? > no. only