Re: [ceph-users] pool distribution quality report script

2015-03-04 Thread Blair Bethwaite
Hi Mark, Cool, that looks handy. Though it'd be even better if it could go a step further and recommend re-weighting values to balance things out (or increased PG counts where needed). Cheers, On 5 March 2015 at 15:11, Mark Nelson wrote: > Hi All, > > Recently some folks showed interest in gath
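For anyone wanting to act on such a report by hand in the meantime, the existing reweight commands are a rough stand-in (the threshold and OSD id below are purely illustrative):

    # reweight OSDs whose utilization is more than 20% above the cluster mean
    ceph osd reweight-by-utilization 120
    # or nudge a single over-full OSD's crush weight by hand
    ceph osd crush reweight osd.12 1.62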

Re: [ceph-users] Unexpected OSD down during deep-scrub

2015-03-04 Thread Italo Santos
New issue created - http://tracker.ceph.com/issues/11027 Regards. Italo Santos http://italosantos.com.br/ On Tuesday, March 3, 2015 at 9:23 PM, Loic Dachary wrote: > Hi Yann, > > That seems related to http://tracker.ceph.com/issues/10536 which seems to be > resolved. Could you create a ne

[ceph-users] pool distribution quality report script

2015-03-04 Thread Mark Nelson
Hi All, Recently some folks showed interest in gathering pool distribution statistics and I remembered I wrote a script to do that a while back. It was broken due to a change in the ceph pg dump output format that was committed a while back, so I cleaned the script up, added detection of head

Re: [ceph-users] Hammer sharded radosgw bucket indexes question

2015-03-04 Thread Ben Hines
Thanks. Can you provide or point me at the configurable & ceph.conf settings? I can't find this in the documentation. Would love to test out the feature. -Ben On Wed, Mar 4, 2015 at 1:08 PM, Yehuda Sadeh-Weinraub wrote: > > > - Original Message - >> From: "Ben Hines" >> To: "ceph-user
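For reference, the Hammer-era setting being asked about is believed to be rgw_override_bucket_index_max_shards (default 0, i.e. unsharded), set in the radosgw section of ceph.conf; it only affects buckets created after it is set. A sketch, with an assumed instance name:

    [client.radosgw.gateway]
    rgw override bucket index max shards = 8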

Re: [ceph-users] v0.93: Bucket removal with data purge

2015-03-04 Thread Ben Hines
Ah, never mind - I had to pass the --bucket= argument. You'd think the command would print an error if missing the critical argument. -Ben On Wed, Mar 4, 2015 at 6:06 PM, Ben Hines wrote: > One of the release notes says: rgw: fix bucket removal with data purge (Yehuda Sadeh) > > Just tried thi
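I.e. the working invocation, with the bucket name from the earlier test:

    radosgw-admin bucket rm --bucket=mike-cache2 --purge-objects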

[ceph-users] v0.93: Bucket removal with data purge

2015-03-04 Thread Ben Hines
One of the release notes says: rgw: fix bucket removal with data purge (Yehuda Sadeh) Just tried this and it didn't seem to work: bash-4.1$ time radosgw-admin bucket rm mike-cache2 --purge-objects real 0m7.711s user 0m0.109s sys 0m0.072s Yet the bucket was not deleted, nor purged: -b

Re: [ceph-users] qemu-kvm and cloned rbd image

2015-03-04 Thread Josh Durgin
On 03/04/2015 01:36 PM, koukou73gr wrote: On 03/03/2015 05:53 PM, Jason Dillaman wrote: Your procedure appears correct to me. Would you mind re-running your cloned image VM with the following ceph.conf properties: [client] rbd cache off debug rbd = 20 log file = /path/writeable/by/qemu.$pid.lo
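Spelled out as a ceph.conf fragment (the log path is only a placeholder that must be writable by the qemu process):

    [client]
    rbd cache = false
    debug rbd = 20
    log file = /path/writeable/by/qemu.$pid.log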

Re: [ceph-users] Persistent Write Back Cache

2015-03-04 Thread Christian Balzer
Hello Nick, On Wed, 4 Mar 2015 08:49:22 - Nick Fisk wrote: > Hi Christian, > > Yes that's correct, it's on the client side. I don't see this much > different to a battery backed Raid controller, if you lose power, the > data is in the cache until power resumes when it is flushed. > > If yo

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Don Doerner
I don’t know – I am playing with crush; someday I may fully comprehend it. Not today. I think you have to look at it like this: if your possible failure domain options are OSDs, hosts, racks, …, and you choose racks as your failure domain, and you have exactly as many racks as your pool size (

Re: [ceph-users] Ceph User Teething Problems

2015-03-04 Thread Lionel Bouton
On 03/04/15 22:50, Travis Rhoden wrote: > [...] > Thanks for this feedback. I share a lot of your sentiments, > especially that it is good to understand as much of the system as you > can. Everyone's skill level and use-case is different, and > ceph-deploy is targeted more towards PoC use-cases.

Re: [ceph-users] qemu-kvm and cloned rbd image

2015-03-04 Thread koukou73gr
Hi Josh, Thanks for taking a look at this. I'm answering your questions inline. On 03/04/2015 10:01 PM, Josh Durgin wrote: [...] And then proceeded to create a qemu-kvm guest with rbd/server as its backing store. The guest booted but as soon as it got to mount the root fs, things got weird:

Re: [ceph-users] Ceph User Teething Problems

2015-03-04 Thread Travis Rhoden
On Wed, Mar 4, 2015 at 4:43 PM, Lionel Bouton wrote: > On 03/04/15 22:18, John Spray wrote: >> On 04/03/2015 20:27, Datatone Lists wrote: >>> [...] [Please don't mention ceph-deploy] >> This kind of comment isn't very helpful unless there is a specific >> issue with ceph-deploy that is preventing

Re: [ceph-users] Ceph User Teething Problems

2015-03-04 Thread Lionel Bouton
On 03/04/15 22:18, John Spray wrote: > On 04/03/2015 20:27, Datatone Lists wrote: >> [...] [Please don't mention ceph-deploy] > This kind of comment isn't very helpful unless there is a specific > issue with ceph-deploy that is preventing you from using it, and > causing you to resort to manual ste

Re: [ceph-users] qemu-kvm and cloned rbd image

2015-03-04 Thread koukou73gr
On 03/03/2015 05:53 PM, Jason Dillaman wrote: Your procedure appears correct to me. Would you mind re-running your cloned image VM with the following ceph.conf properties: [client] rbd cache off debug rbd = 20 log file = /path/writeable/by/qemu.$pid.log If you recreate the issue, would you mi

Re: [ceph-users] Ceph User Teething Problems

2015-03-04 Thread John Spray
On 04/03/2015 20:27, Datatone Lists wrote: I have been following ceph for a long time. I have yet to put it into service, and I keep coming back as btrfs improves and ceph reaches higher version numbers. I am now trying ceph 0.93 and kernel 4.0-rc1. Q1) Is it still considered that btrfs is not

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Kyle Hutson
So it sounds like I should figure out at 'how many nodes' I need to increase pg_num to 4096, and again to 8192, and increase those incrementally as I add more hosts, correct? On Wed, Mar 4, 2015 at 3:04 PM, Don Doerner wrote: > Sorry, I missed your other questions, down at the bottom.
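For reference, the later bumps themselves are plain pool-set operations (pool name taken from elsewhere in the thread; note pg_num can only ever be increased, and pgp_num should follow it):

    ceph osd pool set ec44pool pg_num 4096
    ceph osd pool set ec44pool pgp_num 4096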

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Kyle Hutson
That did it. 'step set_choose_tries 200' fixed the problem right away. Thanks Yann! On Wed, Mar 4, 2015 at 2:59 PM, Yann Dupont wrote: > > On 04/03/2015 21:48, Don Doerner wrote: > > Hmmm, I just struggled through this myself. How many racks do you have? > If not more than 8, you might w
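For anyone else hitting the same undersized-PG symptom with erasure pools, the fix is an extra tunable step inside the erasure crush rule; a sketch of where it sits (rule body abridged and partly assumed, the actual rule should match your own hierarchy):

    rule ec44pool {
        ruleset 1
        type erasure
        min_size 3
        max_size 8
        step set_chooseleaf_tries 5
        step set_choose_tries 200
        step take default
        step chooseleaf indep 0 type host
        step emit
    }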

Re: [ceph-users] Hammer sharded radosgw bucket indexes question

2015-03-04 Thread Yehuda Sadeh-Weinraub
- Original Message - > From: "Ben Hines" > To: "ceph-users" > Sent: Wednesday, March 4, 2015 1:03:16 PM > Subject: [ceph-users] Hammer sharded radosgw bucket indexes question > > Hi, > > These questions were asked previously but perhaps lost: > > We have some large buckets. > > - Wh

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Kyle Hutson
My lowest level (other than OSD) is 'disktype' (based on the crushmaps at http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ ) since I have SSDs and HDDs on the same host. I just made that change (deleted the pool, deleted the profile, deleted the crush ruleset)

[ceph-users] Hammer sharded radosgw bucket indexes question

2015-03-04 Thread Ben Hines
Hi, These questions were asked previously but perhaps lost: We have some large buckets. - When upgrading to Hammer (0.93 or later), is it necessary to recreate the buckets to get a sharded index? - What parameters does the system use for deciding when to shard the index? thanks- -Ben

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Don Doerner
Sorry, I missed your other questions, down at the bottom. See here (look for “number of replicas for replicated pools or the K+M sum for erasure coded pools”) for the formula; 38400/8 probably implies 8192. The thing is, you’ve go

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Yann Dupont
On 04/03/2015 21:48, Don Doerner wrote: Hmmm, I just struggled through this myself. How many racks do you have? If not more than 8, you might want to make your failure domain smaller? I.e., maybe host? That, at least, would allow you to debug the situation… -don- Hello, I think I already

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Don Doerner
Hmmm, I just struggled through this myself. How many racks do you have? If not more than 8, you might want to make your failure domain smaller? I.e., maybe host? That, at least, would allow you to debug the situation… -don- From: Kyle Hutson [mailto:kylehut...@ksu.edu] Sent: 04 March, 2015

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Kyle Hutson
It wouldn't let me simply change the pg_num, giving Error EEXIST: specified pg_num 2048 <= current 8192 But that's not a big deal, I just deleted the pool and recreated with 'ceph osd pool create ec44pool 2048 2048 erasure ec44profile' ...and the result is quite similar: 'ceph status' is now ceph
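For completeness, a profile like the ec44profile referenced above would have been created along these lines (failure domain assumed):

    ceph osd erasure-code-profile set ec44profile k=4 m=4 ruleset-failure-domain=host
    ceph osd erasure-code-profile get ec44profile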

[ceph-users] rados load-gen

2015-03-04 Thread Deneau, Tom
I was looking for a benchmark that uses the librados API and allows a controlled mixing of reads and writes. I noticed rados -p poolname load-gen which seems to support that mixing but I don't see much in the way of documentation for this. Have people used this successfully for mixed read/w
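In the absence of documentation, the load-gen options listed in the rados man page of that era suggest a mixed run looks roughly like the following (sizes in bytes; worth double-checking against `rados --help` on your build):

    rados -p poolname load-gen \
        --num-objects 1024 \
        --min-object-size 4194304 --max-object-size 4194304 \
        --min-op-len 65536 --max-op-len 4194304 \
        --read-percent 70 \
        --max-ops 16 \
        --run-length 60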

Re: [ceph-users] Ceph User Teething Problems

2015-03-04 Thread Robert LeBlanc
I can't help much on the MDS front, but here are some answers and my view on some of it. On Wed, Mar 4, 2015 at 1:27 PM, Datatone Lists wrote: > I have been following ceph for a long time. I have yet to put it into > service, and I keep coming back as btrfs improves and ceph reaches > higher versi

[ceph-users] Ceph User Teething Problems

2015-03-04 Thread Datatone Lists
I have been following ceph for a long time. I have yet to put it into service, and I keep coming back as btrfs improves and ceph reaches higher version numbers. I am now trying ceph 0.93 and kernel 4.0-rc1. Q1) Is it still considered that btrfs is not robust enough, and that xfs should be used in

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Don Doerner
Oh duh… OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 2048. -don- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don Doerner Sent: 04 March, 2015 12:14 To: Kyle Hutson; Ceph Users Subject: Re: [ceph-users] New EC pool undersized In this case, th

Re: [ceph-users] New EC pool undersized

2015-03-04 Thread Don Doerner
In this case, that number means that there is not an OSD that can be assigned. What’s your k, m from your erasure coded pool? You’ll need approximately (14400/(k+m)) PGs, rounded up to the next power of 2… -don- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kyle Hut

[ceph-users] New EC pool undersized

2015-03-04 Thread Kyle Hutson
Last night I blew away my previous ceph configuration (this environment is pre-production) and have 0.87.1 installed. I've manually edited the crushmap so it now looks like https://dpaste.de/OLEa I currently have 144 OSDs on 8 nodes. After increasing pg_num and pgp_num to a more suitable 1024 (d

Re: [ceph-users] qemu-kvm and cloned rbd image

2015-03-04 Thread Josh Durgin
On 03/02/2015 04:16 AM, koukou73gr wrote: Hello, Today I thought I'd experiment with snapshots and cloning. So I did: rbd import --image-format=2 vm-proto.raw rbd/vm-proto; rbd snap create rbd/vm-proto@s1; rbd snap protect rbd/vm-proto@s1; rbd clone rbd/vm-proto@s1 rbd/server And then proceeded

Re: [ceph-users] v0.80.8 and librbd performance

2015-03-04 Thread Josh Durgin
On 03/03/2015 03:28 PM, Ken Dreyer wrote: On 03/03/2015 04:19 PM, Sage Weil wrote: Hi, This is just a heads up that we've identified a performance regression in v0.80.8 from previous firefly releases. A v0.80.9 is working its way through QA and should be out in a few days. If you haven't upg

[ceph-users] trouble running rest-bench

2015-03-04 Thread Deneau, Tom
With my radosgw set up I can successfully run s3cmd operations and swift command-line tool operations. But when I try rest-bench with the same user and keys that I used with s3cmd: rest-bench -t 1 --api-host=localhost:7480 --access-key=xxx --secret=yyy --bucket=buckA write I see the objects

Re: [ceph-users] The project of ceph client file system porting from Linux to AIX

2015-03-04 Thread McNamara, Bradley
I'd like to see a Solaris client. -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dennis Chen Sent: Wednesday, March 04, 2015 2:00 AM To: ceph-devel; ceph-users; Sage Weil; Loic Dachary Subject: [ceph-users] The project of ceph client file syste

Re: [ceph-users] CEPH hardware recommendations and cluster design questions

2015-03-04 Thread Stephen Mercier
To expand upon this, the very nature and existence of Ceph is to replace RAID. The FS itself replicates data and handles the HA functionality that you're looking for. If you're going to build a single server with all those disks, backed by a ZFS RAID setup, you're going to be much better suited

Re: [ceph-users] cephfs filesystem layouts : authentication gotchas ?

2015-03-04 Thread Gregory Farnum
Just to get more specific: the reason you can apparently write stuff to a file when you can't write to the pool it's stored in is because the file data is initially stored in cache. The flush out to RADOS, when it happens, will fail. It would definitely be preferable if there was some way to immed
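For checking where a file's data will actually land before the flush happens, the layout virtual xattrs are handy (paths and pool name hypothetical; the pool of a file can only be changed while it is still empty):

    getfattr -n ceph.file.layout /mnt/cephfs/somefile
    setfattr -n ceph.file.layout.pool -v otherpool /mnt/cephfs/newfile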

Re: [ceph-users] CEPH hardware recommendations and cluster design questions

2015-03-04 Thread Sage Weil
On Wed, 4 Mar 2015, Adrian Sevcenco wrote: > Hi! I've seen the documentation > http://ceph.com/docs/master/start/hardware-recommendations/ but those > minimum requirements without some recommendations don't tell me much ... > > So, from what I've seen, for mon and mds any cheap 6 core 16+ gb ram amd > wo

Re: [ceph-users] CEPH hardware recommendations and cluster design questions

2015-03-04 Thread Alexandre DERUMIER
Hi, for hardware, Inktank has good guides here: http://www.inktank.com/resource/inktank-hardware-selection-guide/ http://www.inktank.com/resource/inktank-hardware-configuration-guide/ ceph works well with multiple osd daemons (1 osd per disk), so you should not use raid. (xfs is the recommended fs

Re: [ceph-users] Ceph Cluster Address

2015-03-04 Thread Gregory Farnum
On Tue, Mar 3, 2015 at 9:26 AM, Garg, Pankaj wrote: > Hi, > > I have a ceph cluster that is contained within a rack (1 Monitor and 5 OSD > nodes). I kept the same public and private address for configuration. > > I do have 2 NICs and 2 valid IP addresses (one internal only and one > external) for ea

[ceph-users] CEPH hardware recommendations and cluster design questions

2015-03-04 Thread Adrian Sevcenco
Hi! I've seen the documentation http://ceph.com/docs/master/start/hardware-recommendations/ but those minimum requirements without some recommendations don't tell me much ... So, from what I've seen, for mon and mds any cheap 6 core 16+ gb ram amd would do ... what puzzles me is that "per daemon" constru

Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Robert LeBlanc
If the data have been replicated to new OSDs, it will be able to function properly even with them down or only on the public network. On Wed, Mar 4, 2015 at 9:49 AM, Andrija Panic wrote: > "I guess it doesn't matter, since my Crush Map will still reference old OSDs, > that are stopped (and cluster r

Re: [ceph-users] v0.93 Hammer release candidate released

2015-03-04 Thread Sage Weil
On Wed, 4 Mar 2015, Thomas Lemarchand wrote: > Thanks to all Ceph developers for the good work ! > > I see some love given to CephFS. When will you consider CephFS to be > production ready ? The key missing piece is fsck (check and repair). That's where our efforts are focused now. I think inf

Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
Thx again - I really appreciate the help guys! On 4 March 2015 at 17:51, Robert LeBlanc wrote: > If the data have been replicated to new OSDs, it will be able to > function properly even with them down or only on the public network. > > On Wed, Mar 4, 2015 at 9:49 AM, Andrija Panic > wrote: > >

Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-04 Thread Andrija Panic
Hi Robert, I already have this stuff set. Ceph is 0.87.0 now... Thanks, will schedule this for the weekend, 10G network and 36 OSDs - should move data in less than 8h per my last experience, which was around 8h, but some 1G OSDs were included... Thx! On 4 March 2015 at 17:49, Robert LeBlanc wrote:

Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
"I guess it doesnt matter, since my Crush Map will still refernce old OSDs, that are stoped (and cluster resynced after that) ?" I wanted to say: it doesnt matter (I guess?) that my Crush map is still referencing old OSD nodes that are already stoped. Tired, sorry... On 4 March 2015 at 17:48, And

Re: [ceph-users] Persistent Write Back Cache

2015-03-04 Thread Mark Nelson
On 03/04/2015 05:34 AM, John Spray wrote: On 04/03/2015 08:26, Nick Fisk wrote: To illustrate the difference a proper write back cache can make, I put a 1GB (512mb dirty threshold) flashcache in front of my RBD and tweaked the flush parameters to flush dirty blocks at a large queue depth. The

Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
Thx Wido, I needed this confirmation - thanks! On 4 March 2015 at 17:49, Wido den Hollander wrote: > On 03/04/2015 05:44 PM, Robert LeBlanc wrote: > > If I remember right, someone has done this on a live cluster without > > any issues. I seem to remember that it had a fallback mechanism if the

Re: [ceph-users] Persistent Write Back Cache

2015-03-04 Thread Nick Fisk
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John Spray Sent: 04 March 2015 11:34 To: Nick Fisk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Persistent Write Back Cache On 04/03/2015 08:26, Nick Fisk wrote: To illustrate the difference a prope

Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-04 Thread Robert LeBlanc
You will most likely have a very high relocation percentage. Backfills always are more impactful on smaller clusters, but "osd max backfills" should be what you need to help reduce the impact. The default is 10, you will want to use 1. I didn't catch which version of Ceph you are running, but I th
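On a running cluster those can be injected without restarting OSDs, e.g. (osd_recovery_max_active is added here as a common companion setting, not something stated in the thread):

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'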

Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Wido den Hollander
On 03/04/2015 05:44 PM, Robert LeBlanc wrote: > If I remember right, someone has done this on a live cluster without > any issues. I seem to remember that it had a fallback mechanism if the > OSDs couldn't be reached on the cluster network to contact them on the > public network. You could test it

Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
That was my thought, yes - I found this blog that confirms what you are saying, I guess: http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/ I will do that... Thx. I guess it doesn't matter, since my Crush Map will still reference old OSDs, that are stopped

Re: [ceph-users] Persistent Write Back Cache

2015-03-04 Thread Sage Weil
Hi Nick, Christian, This is something we've discussed a bit but it hasn't made it to the top of the list. I think having a single persistent copy on the client has *some* value, although it's limited because it's a single point of failure. The simplest scenario would be to use it as a write-throu

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Alexandre DERUMIER
>>Only writes ;) ok, so maybe some background operations (snap trimming, scrubbing...). maybe debug_osd=20 could give you more logs? - Original Message - From: "Olivier Bonvalet" To: "aderumier" Cc: "ceph-users" Sent: Wednesday, 4 March 2015 16:42:13 Subject: Re: [ceph-users] Perf problem a
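That can be switched on temporarily on a single suspect OSD without a restart (OSD id hypothetical; remember to drop the level back afterwards, the logs grow quickly):

    ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1'
    # ...reproduce the load spike, then revert...
    ceph tell osd.12 injectargs '--debug-osd 0/5 --debug-ms 0'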

Re: [ceph-users] Implement replication network with live cluster

2015-03-04 Thread Robert LeBlanc
If I remember right, someone has done this on a live cluster without any issues. I seem to remember that it had a fallback mechanism if the OSDs couldn't be reached on the cluster network to contact them on the public network. You could test it pretty easily without much impact. Take one OSD that h

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Yes, good idea. I was looking at the «WBThrottle» feature, but will go for logging instead. On Wednesday, 4 March 2015 at 17:10 +0100, Alexandre DERUMIER wrote: > >>Only writes ;) > > ok, so maybe some background operations (snap trimming, scrubbing...). > > maybe debug_osd=20 could give you more log

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Only writes ;) On Wednesday, 4 March 2015 at 16:19 +0100, Alexandre DERUMIER wrote: > >>The change is only on OSD (and not on OSD journal). > > do you see twice the iops for read and write? > > if only read, maybe a read-ahead bug could explain this. > > - Original Message - > From: "Olivier

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Alexandre DERUMIER
>>The change is only on OSD (and not on OSD journal). do you see twice the iops for read and write? if only read, maybe a read-ahead bug could explain this. - Original Message - From: "Olivier Bonvalet" To: "aderumier" Cc: "ceph-users" Sent: Wednesday, 4 March 2015 15:13:30 Subject: Re: [ceph-u

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Ceph health is OK, yes. The «firefly-upgrade-cluster-IO.png» graph is about IO stats seen by ceph: there is no change between dumpling and firefly. The change is only on OSD (and not on OSD journal). On Wednesday, 4 March 2015 at 15:05 +0100, Alexandre DERUMIER wrote: > >>The load problem is per

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Alexandre DERUMIER
>>The load problem is permanent: I have twice the IO/s on HDD since firefly. Oh, permanent, that's strange. (If you don't see more traffic coming from clients, I don't understand...) do you also see twice the ios/ops in "ceph -w" stats? is the ceph health ok? - Original Message - From: "Oliv

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Thanks Alexandre. The load problem is permanent: I have twice the IO/s on HDD since firefly. And yes, the problem hangs production at night during snap trimming. I suppose there is a new OSD parameter which changes the behavior of the journal, or something like that. But I didn't find anything about tha

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Alexandre DERUMIER
Hi, maybe this is related ?: http://tracker.ceph.com/issues/9503 "Dumpling: removing many snapshots in a short time makes OSDs go berserk" http://tracker.ceph.com/issues/9487 "dumpling: snaptrimmer causes slow requests while backfilling. osd_snap_trim_sleep not helping" http://lists.opennebula

[ceph-users] Firefly, cephfs issues: different unix rights depending on the client and ls are slow

2015-03-04 Thread Francois Lafont
Hi, I'm trying cephfs and I have some problems. Here is the context: All the nodes (in the cluster and the clients) are Ubuntu 14.04 with a 3.16 kernel (after apt-get install linux-generic-lts-utopic && reboot). The cluster: - one server with just one monitor daemon (RAM 2GB) - 2 servers (RAM 24GB) w

Re: [ceph-users] Rbd image's data deletion

2015-03-04 Thread Jason Dillaman
An RBD image is split up into (by default 4MB) objects within the OSDs. When you delete an RBD image, all the objects associated with the image are removed from the OSDs. The objects are not securely erased from the OSDs if that is what you are asking. -- Jason Dillaman Red Hat dilla...@r
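The image-to-object mapping is visible via the image's block_name_prefix, e.g. (image name and prefix hypothetical):

    rbd info rbd/myimage                                  # note the block_name_prefix, e.g. rb.0.1234.5678
    rados -p rbd ls | grep '^rb\.0\.1234\.5678' | wc -l   # objects currently backing the image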

Re: [ceph-users] cephfs filesystem layouts : authentication gotchas ?

2015-03-04 Thread SCHAER Frederic
Hi, Many thanks for the explanations. I haven't "used" the "nodcache" option when mounting cephfs, it actually got there by default. My mount command is/was: # mount -t ceph 1.2.3.4:6789:/ /mnt -o name=puppet,secretfile=./puppet.secret I don't know what causes this option to be the default, maybe

Re: [ceph-users] Persistent Write Back Cache

2015-03-04 Thread John Spray
On 04/03/2015 08:26, Nick Fisk wrote: To illustrate the difference a proper write back cache can make, I put a 1GB (512mb dirty threshold) flashcache in front of my RBD and tweaked the flush parameters to flush dirty blocks at a large queue depth. The same fio test (128k iodepth=1) now runs a

[ceph-users] Implement replication network with live cluster

2015-03-04 Thread Andrija Panic
Hi, I have a live cluster with only a public network (so no explicit network configuration in the ceph.conf file). I'm wondering what the procedure is to implement a dedicated Replication/Private and Public network. I've read the manual and know how to do it in ceph.conf, but I'm wondering since this
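For reference, the end state in ceph.conf is just the two options below (subnets purely illustrative); the real work is rolling it out and restarting the OSDs so they bind to the cluster network:

    [global]
    public network  = 192.168.10.0/24
    cluster network = 10.10.10.0/24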

[ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Hi, last Saturday I upgraded my production cluster from dumpling to emperor (since we were successfully using it on a test cluster). A couple of hours later, we had falling OSDs: some of them were marked as down by Ceph, probably because of IO starvation. I marked the cluster «noout», start dow

Re: [ceph-users] Fail to bring OSD back to cluster

2015-03-04 Thread Sahana
Hi Luke, Maybe you can set these flags: ceph osd set nodown; ceph osd set noout. Regards, Sahana On Wed, Mar 4, 2015 at 2:32 PM, Luke Kao wrote: > Hello ceph community, > We need some immediate help: our cluster is in a very strange and bad > status after unexpected reboot of many OSD n

[ceph-users] Inkscope packages and blog

2015-03-04 Thread alain.dechorgnat
Hi everyone, I'm proud to announce that DEB and RPM packages for Inkscope V1.1 are available on github (https://github.com/inkscope/inkscope-packaging). Inkscope has also its blog : http://inkscope.blogspot.fr. You will find there how to install Inkscope on debian servers (http://inkscope.blog

[ceph-users] The project of ceph client file system porting from Linux to AIX

2015-03-04 Thread Dennis Chen
Hello, The ceph cluster can currently only be used by Linux systems AFAICT, so I planned to port the ceph client file system from Linux to AIX as a tiered storage solution on that platform. Below is the source code repository I've been working on, which is still in progress. 3 important modules: 1. aixker: maintain

Re: [ceph-users] v0.93 Hammer release candidate released

2015-03-04 Thread Thomas Lemarchand
Thanks to all Ceph developers for the good work ! I see some love given to CephFS. When will you consider CephFS to be production ready ? I use CephFS in production since Giant, and apart from the "cache pressure health warning" bug, now resolved, I didn't have a single problem. -- Thomas Lemar

[ceph-users] Fail to bring OSD back to cluster

2015-03-04 Thread Luke Kao
Hello ceph community, We need some immediate help: our cluster is in a very strange and bad status after an unexpected reboot of many OSD nodes in a very short time frame. We have a cluster with 195 osds configured on 9 different OSD nodes, original version 0.80.5. After some issue of the datace

Re: [ceph-users] Persistent Write Back Cache

2015-03-04 Thread Nick Fisk
Hi Christian, Yes that's correct, it's on the client side. I don't see this much different to a battery backed Raid controller, if you lose power, the data is in the cache until power resumes when it is flushed. If you are going to have the same RBD accessed by multiple servers/clients then you n

Re: [ceph-users] Persistent Write Back Cache

2015-03-04 Thread Christian Balzer
Hello, If I understand you correctly, you're talking about the rbd cache on the client side. So assume that host, or the cache SSD in it, fails terminally. The client thinks its sync'ed writes are on the permanent storage (the actual ceph storage cluster), while they are only present locally. So restart

Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-04 Thread Andrija Panic
Thank you Robert - I'm wondering, when I remove a total of 7 OSDs from the crush map, whether that will cause more than 37% of data to be moved (80% or whatever). I'm also wondering if the throttling that I applied is fine or not - I will introduce the osd_recovery_delay_start 10sec as Irek said. I'm just wo

[ceph-users] Persistent Write Back Cache

2015-03-04 Thread Nick Fisk
Hi All, Is there anything in the pipeline to add the ability to write the librbd cache to ssd so that it can safely ignore sync requests? I have seen a thread a few years back where Sage was discussing something similar, but I can't find anything more recent discussing it. I've been running