Re: [ceph-users] read performance VS network usage

2015-04-24 Thread SCHAER Frederic
Hi Nick, Thanks for your explanation. I have some doubts that this is what's happening, but I'm going to first check what happens with disk IO with a clean pool and clean bench data (discarding any existing cache...). I'm using the following commands for creating the bench data (and benching writes
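
For reference, a minimal sketch of the kind of rados bench invocations typically used for this sort of test (the pool name 'testpool' and the 60-second runtime are examples, not the commands from the original mail):

  # write benchmark; keep the objects so they can be read back afterwards
  rados bench -p testpool 60 write --no-cleanup
  # drop the page cache on the OSD nodes so reads aren't served from RAM
  echo 3 > /proc/sys/vm/drop_caches
  # sequential read benchmark against the data written above
  rados bench -p testpool 60 seq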

Re: [ceph-users] Erasure Coding : gf-Complete

2015-04-24 Thread Loic Dachary
Hi, On 24/04/2015 00:14, Garg, Pankaj wrote: > Thanks Loic. I was just looking at the source trees for gf-complete and saw > that the v2-ceph tag has the optimizations and that it's associated with Hammer. > > One more question: on Hammer, will the optimizations kick in automatically > for ARM? Yes.

Re: [ceph-users] read performance VS network usage

2015-04-24 Thread SCHAER Frederic
OK, I must learn how to read dstat... I took the recv column for the send column...
total-cpu-usage -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 15  22  43  16   0   4| 343M 7916k| 252M  659M|   0     0 | 78k  122k 1

Re: [ceph-users] read performance VS network usage

2015-04-24 Thread SCHAER Frederic
And to reply to myself... The apparent client network bandwidth is just the fact that dstat aggregates the bridge network interface and the physical interface, thus doubling the data... Ah ah ah. Regards From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of SCHAER Frederic
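
One way to avoid that double counting is to point dstat at the physical NIC only (the interface name below is an example):

  # same columns as dstat's default output, but net figures limited to eth0
  dstat -cdngy -N eth0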

Re: [ceph-users] SAS-Exp 9300-8i or Raid-Contr 9750-4i ?

2015-04-24 Thread Markus Goldberg
Hi Jacob, are you sure that megacli is correct for the 9750-4i (not 9750-4i4e)? The 9750-4i is not in the list of supported devices. By the way, what is the difference between JBOD and raid-single-disk? Thanks, Markus On 23.04.2015 at 13:32, Weeks, Jacob (RIS-BCT) wrote: The 9750-4i may suppor

Re: [ceph-users] long blocking with writes on rbds

2015-04-24 Thread Jeff Epstein
Hi JC, In answer to your question, iostat shows high wait times on the RBD, but not on the underlying medium. For example:
Device:  rrqm/s  wrqm/s  r/s   w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
rbd50    0.00    0.01    0.00  0.00
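
To reproduce that comparison, something along these lines works (device names are examples):

  # extended stats for the rbd device and the disk underneath it, every 5 seconds
  iostat -x rbd5 sdb 5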

Re: [ceph-users] ceph-disk activate hangs with external journal device

2015-04-24 Thread Daniel Piddock
On 23/04/15 15:18, Robert LeBlanc wrote: > > Sorry, reading too fast. That key isn't from a previous attempt, > correct? But I doubt that is the problem, as you would receive an > access denied message in the logs. > > Try running ceph-disk zap and recreate the OSD. Also remove the auth > key and th
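
For completeness, one possible sequence for that (an untested sketch; /dev/sdX, the journal partition /dev/sdY1 and the OSD id N are placeholders):

  # wipe the data disk and its partition table
  ceph-disk zap /dev/sdX
  # remove the stale OSD entry and its auth key
  ceph osd crush remove osd.N
  ceph auth del osd.N
  ceph osd rm N
  # recreate the OSD, pointing it at the external journal partition
  ceph-disk prepare /dev/sdX /dev/sdY1
  ceph-disk activate /dev/sdX1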

Re: [ceph-users] Having trouble getting good performance

2015-04-24 Thread Nick Fisk
Hi David, Thanks for posting those results. From the fio runs, I see you are getting around 200 iops at 128kb write IO size. I would imagine you should be getting somewhere around 200-300 iops for the cluster you posted in the initial post, so it looks like it's performing about right. 200 iops

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-24 Thread Irek Fasikhov
Hi, Alexandre! Have you tried changing the parameter vm.min_free_kbytes? 2015-04-23 19:24 GMT+03:00 Somnath Roy : > Alexandre, > You can configure with --with-jemalloc or ./do_autogen -J to build ceph > with jemalloc. > > Thanks & Regards > Somnath > > -Original Message- > From: ceph-users [m
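
If anyone wants to experiment with that sysctl, for example (the value is purely illustrative; size it to your RAM):

  # raise the kernel's free-memory reserve at runtime
  sysctl -w vm.min_free_kbytes=262144
  # persist it across reboots
  echo "vm.min_free_kbytes = 262144" >> /etc/sysctl.conf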

[ceph-users] rgw geo-replication

2015-04-24 Thread GuangYang
Hi cephers, Recently I have been investigating the geo-replication of rgw. From the example at [1], it looks like if we want to do data geo-replication between US East and US West, we will need to build *one* (super) RADOS cluster which crosses US East and West, and only deploy two different radosgw inst

Re: [ceph-users] rgw geo-replication

2015-04-24 Thread Vikhyat Umrao
On 04/24/2015 05:17 PM, GuangYang wrote: Hi cephers, Recently I have been investigating the geo-replication of rgw. From the example at [1], it looks like if we want to do data geo-replication between US East and US West, we will need to build *one* (super) RADOS cluster which crosses US East and West

Re: [ceph-users] ceph-fuse unable to run through "screen" ?

2015-04-24 Thread Steffen W Sørensen
> On 23/04/2015, at 12.48, Burkhard Linke > wrote: > > Hi, > > I had a similar problem during reboots. It was solved by adding '_netdev' to > the options for the fstab entry. Otherwise the system may try to mount the > cephfs mount point before the network is available. Didn't know of the _n
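
Roughly what such an fstab entry can look like for ceph-fuse (mount point and client id are examples):

  # _netdev makes the mount wait until the network is up
  id=admin,conf=/etc/ceph/ceph.conf  /mnt/cephfs  fuse.ceph  defaults,_netdev  0 0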

[ceph-users] fstrim does not shrink ceph OSD disk usage ?

2015-04-24 Thread Christoph Adomeit
Hi there, I have a ceph cluster running hammer-release. Recently I trimmed a lot of virtual disks and I can verify that the size of the images has decreased a lot. I checked this with: /usr/bin/rbd diff $IMG | grep -v zero |awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }' the output afte
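
A useful sanity check is to compare the logical usage reported by rbd diff (the command above) with what the cluster itself reports as used, for example:

  # per-pool and overall usage as seen by the cluster
  ceph df
  rados df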

[ceph-users] very different performance on two volumes in the same pool

2015-04-24 Thread Nikola Ciprich
Hello, I'm trying to solve a somewhat mysterious situation: I've got a 3-node Ceph cluster, and a pool made of 3 OSDs (one on each node); the OSDs are 1TB SSD drives. The pool has 3 replicas set. I'm measuring random IO performance using fio: fio --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 --name
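
For reference, a complete fio invocation of that shape typically looks something like this (pool, image and client names below are assumptions, not taken from the original mail):

  fio --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 \
      --name=rbd-randwrite --pool=ssdpool --rbdname=testimg --clientname=admin \
      --bs=4k --iodepth=32 --rw=randwrite --runtime=60 --time_based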

Re: [ceph-users] SAS-Exp 9300-8i or Raid-Contr 9750-4i ?

2015-04-24 Thread Weeks, Jacob (RIS-BCT)
I'm not sure if the megacli tools will work with your model; it would have to be tested. They tend to be able to control LSI RAID controllers to varying degrees. The advantage of using JBOD mode over raid-single-disk is that with single-disk RAID volumes you may run into issues when swapping drives; the drive letters may chan

Re: [ceph-users] Is CephFS ready for production?

2015-04-24 Thread Marc
On 22/04/2015 16:04, Gregory Farnum wrote: > On Tue, Apr 21, 2015 at 9:53 PM, Mohamed Pakkeer wrote: >> Hi sage, >> >> When can we expect the fully functional fsck for cephfs? Can we get it in the next >> major release? Is there any roadmap or time frame for the fully functional >> fsck release? > We'r

Re: [ceph-users] Having trouble getting good performance

2015-04-24 Thread J David
On Fri, Apr 24, 2015 at 6:39 AM, Nick Fisk wrote: > From the Fio runs, I see you are getting around 200 iops at 128kb write io > size. I would imagine you should be getting somewhere around 200-300 iops > for the cluster you posted in the initial post, so it looks like its > performing about right

Re: [ceph-users] rgw geo-replication

2015-04-24 Thread GuangYang
> Date: Fri, 24 Apr 2015 17:29:40 +0530 > From: vum...@redhat.com > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] rgw geo-replication > > > On 04/24/2015 05:17 PM, GuangYang wrote: > > Hi cephers, > Recently I am investigating the geo-replication

Re: [ceph-users] Having trouble getting good performance

2015-04-24 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > J David > Sent: 24 April 2015 15:40 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Having trouble getting good performance > > On Fri, Apr 24, 2015 at 6:39 AM,

[ceph-users] 3.18.11 - RBD triggered deadlock?

2015-04-24 Thread Nikola Ciprich
Hello once again, I seem to have hit one more problem today: a 3-node test cluster, nodes running a 3.18.11 kernel, ceph-0.94.1, 3-replica pool, backed by SSD osds. After mapping a volume using rbd and trying to zero it using dd: dd if=/dev/zero of=/dev/rbd0 bs=1M it was running fine for some time w

[ceph-users] Firefly to Hammer

2015-04-24 Thread Garg, Pankaj
Hi, Can I simply do apt-get upgrade on my Firefly cluster and move to Hammer? I'm assuming monitor nodes should be done first. Any particular sequence or any other procedures that I need to follow? Any information is appreciated. Thanks Pankaj
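
A rough sketch of the usual order, assuming Ubuntu with upstart and apt sources already pointing at the Hammer repository (verify against the release notes before relying on this):

  # on every node, pull in the Hammer packages
  apt-get update && apt-get dist-upgrade
  # restart monitors first, one at a time
  restart ceph-mon-all      # on each monitor node
  # then OSDs, node by node, waiting for HEALTH_OK in between
  restart ceph-osd-all      # on each OSD node
  # confirm every OSD reports the new version
  ceph tell osd.* version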

Re: [ceph-users] 3.18.11 - RBD triggered deadlock?

2015-04-24 Thread Ilya Dryomov
On Fri, Apr 24, 2015 at 6:41 PM, Nikola Ciprich wrote: > Hello once again, > > I seem to have hit one more problem today: > 3 nodes test cluster, nodes running 3.18.1 kernel, > ceph-0.94.1, 3-replicas pool, backed by SSD osds. Does this mean rbd device is mapped on a node that also runs one or mo

Re: [ceph-users] 3.18.11 - RBD triggered deadlock?

2015-04-24 Thread Nikola Ciprich
> > Does this mean rbd device is mapped on a node that also runs one or > more osds? yes.. I know it's not the best practice, but it's just a test cluster.. > > Can you watch osd sockets in netstat for a while and describe what you > are seeing or forward a few representative samples? sure, here i

Re: [ceph-users] Shadow Files

2015-04-24 Thread Yehuda Sadeh-Weinraub
What version are you running? There are two different issues that we were fixing this week, and we should have that upstream pretty soon. Yehuda - Original Message - > From: "Ben" > To: "ceph-users" > Cc: "Yehuda Sadeh-Weinraub" > Sent: Thursday, April 23, 2015 7:42:06 PM > Subject: [

Re: [ceph-users] very different performance on two volumes in the same pool

2015-04-24 Thread Somnath Roy
This could again be because of the tcmalloc issue I reported earlier. Two things to observe: 1. Does the performance improve if you stop IO on the other volume? If so, it could be a different issue. 2. Run perf top on the OSD node and see if tcmalloc traces are popping up. Thanks & Regards Somnath ---
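
For the second check, a minimal sketch:

  # profile the running OSDs; tcmalloc central-freelist / page-heap symbols near
  # the top of the list point at the tcmalloc issue
  perf top -p $(pidof ceph-osd | tr ' ' ',')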

Re: [ceph-users] 3.18.11 - RBD triggered deadlock?

2015-04-24 Thread Ilya Dryomov
On Fri, Apr 24, 2015 at 7:06 PM, Nikola Ciprich wrote: >> >> Does this mean rbd device is mapped on a node that also runs one or >> more osds? > yes.. I know it's not the best practice, but it's just test cluster.. >> >> Can you watch osd sockets in netstat for a while and describe what you >> are

Re: [ceph-users] Is CephFS ready for production?

2015-04-24 Thread Gregory Farnum
I think the VMWare plugin was going to be contracted out by the business people, and it was never going to be upstream anyway -- I've not heard anything since then but you'd need to ask them I think. -Greg On Fri, Apr 24, 2015 at 7:17 AM Marc wrote: > On 22/04/2015 16:04, Gregory Farnum wrote: >

Re: [ceph-users] Radosgw and mds hardware configuration

2015-04-24 Thread Gregory Farnum
The MDS will run in 1GB, but the more RAM it has the more of the metadata you can cache in memory. The faster single-threaded performance your CPU has, the more metadata IOPS you'll get. We haven't done much work characterizing it, though. -Greg On Wed, Apr 22, 2015 at 5:39 PM Francois Lafont wrot

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-24 Thread Alexandre DERUMIER
Hi, I have finished rebuilding ceph with jemalloc; all seems to be working fine. I got a constant 300k iops for the moment, so no speed regression. I'll do longer benchmarks next week. Regards, Alexandre - Original Message - From: "Irek Fasikhov" To: "Somnath Roy" Cc: "aderumier" , "Mark N

Re: [ceph-users] decrease pg number

2015-04-24 Thread Gregory Farnum
You can't migrate RBD objects via cppool right now as it doesn't handle snapshots at all. I think a few people have done it successfully by setting up existing pools as cache tiers on top of the target pool and then flushing them out, but I've not run through that. You can also just set the PG war
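
For anyone who wants to try the cache-tier approach Greg mentions, an untested sketch (pool names are placeholders; as he says, nobody here has run through it, so test on throwaway data first):

  # put the old pool in front of the new pool as a cache tier
  ceph osd tier add newpool oldpool --force-nonempty
  ceph osd tier cache-mode oldpool forward
  ceph osd tier set-overlay newpool oldpool
  # push everything down into the new pool
  rados -p oldpool cache-flush-evict-all
  # once the old pool is empty, tear the tiering down
  ceph osd tier remove-overlay newpool
  ceph osd tier remove newpool oldpool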

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-24 Thread Stefan Priebe - Profihost AG
Is jemalloc recommended in general? Does it also work for Firefly? Stefan Excuse my typo, sent from my mobile phone. > On 24.04.2015 at 18:38, Alexandre DERUMIER wrote: > > Hi, > > I have finished rebuilding ceph with jemalloc, > > all seems to be working fine. > > I got a constant 300k iops fo

Re: [ceph-users] Having trouble getting good performance

2015-04-24 Thread J David
On Fri, Apr 24, 2015 at 10:58 AM, Nick Fisk wrote: > 7.2k drives tend to do about 80 iops at 4kb IO sizes, as the IO size > increases the number of iops will start to fall. You will probably get > around 70 iops for 128kb. But please benchmark your raw disks to get some > accurate numbers if neede
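
A raw-disk fio run along these lines gives comparable numbers (the device name is an example; this writes directly to the disk and destroys its data):

  fio --name=raw-128k --filename=/dev/sdX --direct=1 --rw=randwrite \
      --bs=128k --iodepth=1 --runtime=60 --time_based --group_reporting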

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-24 Thread Milosz Tanski
On Fri, Apr 24, 2015 at 12:38 PM, Alexandre DERUMIER wrote: > > Hi, > > I have finished rebuilding ceph with jemalloc, > > all seems to be working fine. > > I got a constant 300k iops for the moment, so no speed regression. > > I'll do longer benchmarks next week. > > Regards, > > Alexandre In my

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-24 Thread Mark Nelson
We haven't done any kind of real testing on jemalloc, so use at your own peril. Having said that, we've also been very interested in hearing community feedback from folks trying it out, so please feel free to give it a shot. :D Mark On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-24 Thread Somnath Roy
<< Does it also work for firefly? No, it is integrated post Giant. Thanks & Regards Somnath From: Stefan Priebe - Profihost AG [mailto:s.pri...@profihost.ag] Sent: Friday, April 24, 2015 10:37 AM To: Alexandre DERUMIER Cc: ceph-users; ceph-devel; Somnath Roy; Mark Nelson; Milosz Tanski Subject: R

Re: [ceph-users] Having trouble getting good performance

2015-04-24 Thread Robert LeBlanc
The write is acknowledged to the client as soon as it is in the journal. I suspect that the primary OSD dispatches the write to all the secondary OSDs at the same time so that it happens in parallel, but I am not an authority on that. The journal writes data serially even if it comes in randomly. There is some

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-24 Thread Anthony Levesque
Hi Christian, We tested some DC S3500 300GB drives using dd if=randfile of=/dev/sda bs=4k count=10 oflag=direct,dsync and got 96 MB/s, which is far from the 315 MB/s from the website. Can I ask you, or anyone on the mailing list, how you are testing the write speed for journals? Thanks --- Anthony
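
One common way of testing this (not necessarily how others on the list do it; /dev/sdX is a placeholder and will be overwritten):

  # 4k O_DSYNC writes approximate the journal write pattern
  dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync
  # large sequential writes without dsync are closer to the vendor's quoted figure
  dd if=/dev/zero of=/dev/sdX bs=1M count=1000 oflag=direct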

Re: [ceph-users] Having trouble getting good performance

2015-04-24 Thread Michal Kozanecki
The ZFS recordsize does NOT equal the size of the write to disk; ZFS will write to disk whatever size it feels is optimal. During a sequential write, ZFS will easily write in 1MB blocks or greater. In a spinning-rust Ceph setup like yours, getting the most out of it will require higher io dept

Re: [ceph-users] Having trouble getting good performance

2015-04-24 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > J David > Sent: 24 April 2015 18:41 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Having trouble getting good performance > > On Fri, Apr 24, 2015 at 10:58 AM, Ni

Re: [ceph-users] Shadow Files

2015-04-24 Thread Ben Jackson
We were on Firefly, then we upgraded to Giant, and now we are on Hammer. What issues? On 25 Apr 2015 2:12 am, Yehuda Sadeh-Weinraub wrote: > > What version are you running? There are two different issues that we were > fixing this week, and we should have that upstream pretty soon. > > Yehuda > > --

Re: [ceph-users] Shadow Files

2015-04-24 Thread Yehuda Sadeh-Weinraub
These ones: http://tracker.ceph.com/issues/10295 http://tracker.ceph.com/issues/11447 - Original Message - > From: "Ben Jackson" > To: "Yehuda Sadeh-Weinraub" > Cc: "ceph-users" > Sent: Friday, April 24, 2015 3:06:02 PM > Subject: Re: [ceph-users] Shadow Files > > We were firefly, the

Re: [ceph-users] Shadow Files

2015-04-24 Thread Ben Jackson
They definitely sound like the issues we are experiencing. When do you think an update will be available? On 25 Apr 2015 8:10 am, Yehuda Sadeh-Weinraub wrote: > > These ones: > > http://tracker.ceph.com/issues/10295 > http://tracker.ceph.com/issues/11447 > > - Original Message - > >

Re: [ceph-users] Shadow Files

2015-04-24 Thread Ben Hines
When these are fixed it would be great to get good steps for listing / cleaning up any orphaned objects. I have suspicions this is affecting us. thanks- -Ben On Fri, Apr 24, 2015 at 3:10 PM, Yehuda Sadeh-Weinraub wrote: > These ones: > > http://tracker.ceph.com/issues/10295 > http://tracker.ce

Re: [ceph-users] Shadow Files

2015-04-24 Thread Ben
Definitely need something to help clear out these old shadow files. I'm sure our cluster has around 100TB of these shadow files. I've written a script that goes through known objects to collect the prefixes of objects that should exist, to compare against ones that shouldn't, but the time it takes to do this ov
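
A very rough sketch of that kind of comparison (the pool name and the marker/shadow naming are assumptions for a default setup; treat the output only as a list of candidates, never as something safe to delete automatically):

  # every object in the rgw data pool
  rados -p .rgw.buckets ls > all_objects.txt
  # bucket-marker prefixes of the shadow objects
  grep '__shadow_' all_objects.txt | sed 's/__shadow_.*//' | sort -u > shadow_prefixes.txt
  # markers of buckets that still exist
  radosgw-admin bucket stats | grep '"marker"' | awk -F'"' '{print $4}' | sort -u > live_markers.txt
  # prefixes with no matching live bucket are orphan candidates
  comm -23 shadow_prefixes.txt live_markers.txt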

Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-24 Thread Alexandre DERUMIER
>>We haven't done any kind of real testing on jemalloc, so use at your own >>peril. Having said that, we've also been very interested in hearing >>community feedback from folks trying it out, so please feel free to give >>it a shot. :D Some feedback: I have run benchmarks all night, no speed

Re: [ceph-users] 3.18.11 - RBD triggered deadlock?

2015-04-24 Thread Nikola Ciprich
> > It seems you just grepped for ceph-osd - that doesn't include sockets > opened by the kernel client, which is what I was after. Paste the > entire netstat? Ouch, bummer! Here are the full netstats, sorry about the delay.. http://nik.lbox.cz/download/ceph/ BR nik > > Thanks, > >
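
For picking those sockets out, one option is to match on the default Ceph port ranges instead of the process name, since kernel libceph sockets have no owning process (the ranges below are the defaults and only approximate):

  # 6789 = monitors, 6800-7300 = OSDs; kernel client sockets show '-' in the program column
  netstat -atnp | grep -E ':(6789|6[89][0-9]{2}|7[0-3][0-9]{2})'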