Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-18 Thread SCHAER Frederic
Wow. Thanks. Not very operations-friendly though… Wouldn't it be OK to just pull the disk we think is the bad one, check the serial number, and if it isn't the right one, replug it and let the udev rules do their job and re-insert the disk into the ceph cluster? (provided XFS doesn't freeze for good when we d
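
For reference, a minimal sketch of how a failing JBOD disk is usually identified before pulling it, assuming the default OSD mount layout; the OSD id, /dev/sdX and the ledmon package are placeholders, not details from the thread:

  # Find the block device backing the suspect OSD (default mount layout)
  df -h /var/lib/ceph/osd/ceph-12
  # Read the drive's serial number so it can be matched to the tray label
  smartctl -i /dev/sdX | grep -i serial
  # If the enclosure supports it, blink the slot LED (ledmon package)
  ledctl locate=/dev/sdX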

[ceph-users] Giant upgrade - stability issues

2014-11-18 Thread Andrei Mikhailovsky
Hello cephers, I need your help and suggestions on what is going on with my cluster. A few weeks ago I upgraded from Firefly to Giant. I've previously written about having issues with Giant where, in a two-week period, the cluster's IO froze three times after ceph marked two osds down. I have in to

Re: [ceph-users] CephFS unresponsive at scale (2M files,

2014-11-18 Thread Thomas Lemarchand
Hi Kevin, Every MDS tunable is (I think) listed on this page with a short description: http://ceph.com/docs/master/cephfs/mds-config-ref/ Can you tell us how your cluster behaves after the mds-cache-size change? What is your MDS RAM consumption, before and after? Thanks! -- Thomas Le
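
A sketch of the change being discussed, assuming a single active MDS with rank 0; the persistent ceph.conf entry is shown as a comment:

  # Raise the MDS cache from the default 100000 inodes to 1 million at runtime
  ceph tell mds.0 injectargs '--mds-cache-size 1000000'
  # To make it persistent, add to the [mds] section of ceph.conf:
  #   mds cache size = 1000000
  # Rough check of MDS memory before and after the change
  ps -o rss,vsz,cmd -C ceph-mds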

Re: [ceph-users] incorrect pool size, wrong ruleset?

2014-11-18 Thread houmles
Nobody knows where the problem could be? On Wed, Nov 12, 2014 at 10:41:36PM +0100, houmles wrote: > Hi, > > I have 2 hosts with 8 2TB drives in each. > I want to have 2 replicas across the two hosts and then 2 replicas between osds > on each host. That way even when I lose one host I still have 2 rep
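
One way to express "2 hosts, 2 copies per host" is a rule along these lines, combined with size 4 on the pool; this is a sketch, not the poster's actual ruleset, and the rule name and id are made up:

  rule two_hosts_two_osds {
      ruleset 1
      type replicated
      min_size 2
      max_size 4
      step take default
      step choose firstn 2 type host
      step chooseleaf firstn 2 type osd
      step emit
  }
  # then: ceph osd pool set <pool> size 4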

Re: [ceph-users] incorrect pool size, wrong ruleset?

2014-11-18 Thread houmles
What do you mean by osd level? The pool has size 4 and min_size 1. On Tue, Nov 18, 2014 at 10:32:11AM +, Anand Bhat wrote: > What are the settings for min_size and size at the OSD level in your Ceph > configuration? Looks like size is set to 2, which halves your total storage > as two copies of th

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-18 Thread Nick Fisk
Has anyone tried applying this fix to see if it makes any difference? https://github.com/ceph/ceph/pull/2374 I might be in a position in a few days to build a test cluster to test myself, but was wondering if anyone else has had any luck with it? Nick -Original Message- From: ceph-user

[ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-11-18 Thread Massimiliano Cuttini
Dear all, I try to install ceph but I get errors: # ceph-deploy install node1 [] [ceph_deploy.install][DEBUG ] Installing stable version *firefly* on cluster ceph hosts node1 [ceph_deploy.install][DEBUG ] Detecting platform for host node1 ... []

[ceph-users] Rados Gateway Replication - Containers not accessible via slave zone !

2014-11-18 Thread Vinod H I
Hi, I am trying to test disaster recovery of rados gateways. I set up a federated architecture for the rados gateway as explained in the docs. I am using ceph version 0.80.7. I have set up only one region, "us", with two zones: "us-west" slave zone having user "us-east" "us-east" master zone ha

[ceph-users] Fwd: Rados Gateway Replication - Containers not accessible via slave zone !

2014-11-18 Thread Vinod H I
Hi, I am trying to test disaster recovery of rados gateways. I set up a federated architecture for the rados gateway as explained in the docs. I am using ceph version 0.80.7. I have set up only one region, "us", with two zones: "us-west" slave zone having user "us-east" "us-east" master zone ha

Re: [ceph-users] OSD commits suicide

2014-11-18 Thread Craig Lewis
That would probably have helped. The XFS deadlocks would only occur when there was relatively little free memory. Kernel 3.18 is supposed to have a fix for that, but I haven't tried it yet. Looking at my actual usage, I don't even need 64k inodes. 64k inodes should make things a bit faster when

Re: [ceph-users] osd crashed while there was no space

2014-11-18 Thread Craig Lewis
You shouldn't let the cluster get so full that losing a few OSDs will make you go toofull. Letting the cluster get to 100% full is such a bad idea that you should make sure it doesn't happen. Ceph is supposed to stop moving data to an OSD once that OSD hits osd_backfill_full_ratio, which default
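
The relevant thresholds, for anyone following along; the values shown are the shipped defaults as far as I recall, so adjust to taste:

  # Stop backfilling into an OSD above 85% used (per-OSD, runtime change)
  ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.85'
  # Cluster-wide nearfull / full flags
  ceph pg set_nearfull_ratio 0.85
  ceph pg set_full_ratio 0.95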

Re: [ceph-users] OSD commits suicide

2014-11-18 Thread Andrey Korolyov
On Tue, Nov 18, 2014 at 10:04 PM, Craig Lewis wrote: > That would probably have helped. The XFS deadlocks would only occur when > there was relatively little free memory. Kernel 3.18 is supposed to have a > fix for that, but I haven't tried it yet. > > Looking at my actual usage, I don't even ne

Re: [ceph-users] Giant upgrade - stability issues

2014-11-18 Thread Samuel Just
Ok, why is ceph marking osds down? Post your ceph.log from one of the problematic periods. -Sam On Tue, Nov 18, 2014 at 1:35 AM, Andrei Mikhailovsky wrote: > Hello cephers, > > I need your help and suggestion on what is going on with my cluster. A few > weeks ago i've upgraded from Firefly to Gi

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-18 Thread David Moreau Simard
Thanks guys. I looked at http://tracker.ceph.com/issues/8818 and chatted with "dis" on #ceph-devel. I ran a LOT of tests on a LOT of combinations of kernels (sometimes with tunables legacy). I haven't found a magical combination in which the following test does not hang: fio --name=writefile --
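
Since the thread mentions retrying "with tunables legacy", this is presumably the knob in question; a sketch only, and note that dropping to legacy tunables triggers data movement:

  # Show which CRUSH tunables profile the cluster currently uses
  ceph osd crush show-tunables
  # Fall back to legacy tunables so older krbd kernel clients can map images
  ceph osd crush tunables legacy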

[ceph-users] Bonding woes

2014-11-18 Thread Roland Giesler
Hi people, I have two identical servers (both Sun X2100 M2's) that form part of a cluster of 3 machines (other machines will be added later). I want to bond two gigabit ethernet ports on each of them, which works perfectly on one, but not on the other. How can this be? The one machine (named S2) detect
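
When comparing the two machines, the bonding driver's own status output is usually the quickest check; the interface names here (bond0, eth0/eth1) are assumptions:

  # Bond mode, slave list and link state as the kernel sees it
  cat /proc/net/bonding/bond0
  # Per-NIC negotiation on each slave
  ethtool eth0 | grep -E 'Speed|Duplex|Link detected'
  ethtool eth1 | grep -E 'Speed|Duplex|Link detected'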

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-18 Thread Nick Fisk
Hi David, Have you tried on a normal replicated pool with no cache? I've seen a number of threads recently where caching is causing various things to block/hang. It would be interesting to see if this still happens without the caching layer, at least it would rule it out. Also is there any sign t

Re: [ceph-users] Stackforge Puppet Module

2014-11-18 Thread Nick Fisk
Hi David, Just to let you know I finally managed to get to the bottom of this. In repo.pp one of the authors has a non-ASCII character in his name; for whatever reason this was tripping up my puppet environment. After removing the following line: # Author: François Charlier The module pro
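
A quick way to spot the offending byte (or any other non-ASCII character); a sketch assuming GNU grep with PCRE support, and the manifests/ path is a guess at the module layout:

  # Report file, line number and content for anything outside 7-bit ASCII
  LC_ALL=C grep -rnP '[^\x00-\x7F]' manifests/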

Re: [ceph-users] Giant upgrade - stability issues

2014-11-18 Thread Andrei Mikhailovsky
Sam, the logs are rather large in size. Where should I post them? Thanks - Original Message - From: "Samuel Just" To: "Andrei Mikhailovsky" Cc: ceph-users@lists.ceph.com Sent: Tuesday, 18 November, 2014 7:54:56 PM Subject: Re: [ceph-users] Giant upgrade - stability issues Ok, w

Re: [ceph-users] Stackforge Puppet Module

2014-11-18 Thread David Moreau Simard
Great find Nick. I've discussed it on IRC and it does look like a real issue: https://github.com/enovance/edeploy-roles/blob/master/puppet-master.install#L48-L52 I've pushed the fix for review: https://review.openstack.org/#/c/135421/ -- David Moreau Simard > On Nov 18, 2014, at 3:32 PM, Nick

Re: [ceph-users] Giant upgrade - stability issues

2014-11-18 Thread Samuel Just
pastebin or something, probably. -Sam On Tue, Nov 18, 2014 at 12:34 PM, Andrei Mikhailovsky wrote: > Sam, the logs are rather large in size. Where should I post it to? > > Thanks > > From: "Samuel Just" > To: "Andrei Mikhailovsky" > Cc: ceph-users@lists.ceph.com

Re: [ceph-users] CephFS unresponsive at scale (2M files,

2014-11-18 Thread Kevin Sumner
Hi Thomas, I looked over the mds config reference a bit yesterday, but mds cache size seems to be the most relevant tunable. As suggested, I upped mds-cache-size to 1 million yesterday and started the load generator. During load generation, we’re seeing similar behavior on the filesystem and

[ceph-users] Concurrency in ceph

2014-11-18 Thread hp cre
Hello everyone, I'm new to ceph but have been working with proprietary clustered filesystems for quite some time. I mostly understand how ceph works, but I have a couple of questions which have been asked here before, though I didn't understand the answers. In the closed-source world, we use clustered fi

Re: [ceph-users] Concurrency in ceph

2014-11-18 Thread Gregory Farnum
On Tue, Nov 18, 2014 at 1:26 PM, hp cre wrote: > Hello everyone, > > I'm new to ceph but been working with proprietary clustered filesystem for > quite some time. > > I almost understand how ceph works, but have a couple of questions which > have been asked before here, but i didn't understand t

Re: [ceph-users] rados mkpool fails, but not ceph osd pool create

2014-11-18 Thread Gregory Farnum
On Tue, Nov 11, 2014 at 11:43 PM, Gauvain Pocentek wrote: > Hi all, > > I'm facing a problem on a ceph deployment. rados mkpool always fails: > > # rados -n client.admin mkpool test > error creating pool test: (2) No such file or directory > > rados lspool and rmpool commands work just fine, and t
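
For comparison, the two ways of creating a pool that the thread contrasts; pool name and pg counts here are arbitrary:

  # The librados helper that is failing for the poster
  rados -n client.admin mkpool test
  # The mon-side command that works, with explicit pg_num / pgp_num
  ceph osd pool create test 128 128
  ceph osd lspools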

Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-11-18 Thread Massimiliano Cuttini
I solved it by installing the EPEL repo with yum. I think somebody should note in the documentation that EPEL is mandatory. On 18/11/2014 14:29, Massimiliano Cuttini wrote: Dear all, I try to install ceph but I get errors: # ceph-deploy install node1 [] [ce
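
The manual workaround amounts to something like this on CentOS 7 (epel-release is available from the Extras repo there), before re-running the install:

  yum install -y epel-release
  yum clean all && yum makecache
  ceph-deploy install node1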

Re: [ceph-users] Concurrency in ceph

2014-11-18 Thread hp cre
Ok thanks Greg. But what OpenStack does, AFAIU, is use rbd devices directly, one for each VM instance, right? And that's how it supports live migration on KVM, etc. Right? OpenStack and similar cloud frameworks don't need to create VM instances on filesystems, am I correct? On 18 Nov 2014 23

Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-11-18 Thread Travis Rhoden
Hi Massimiliano, I just recreated this bug myself. Ceph-deploy is supposed to install EPEL automatically on the platforms that need it. I just confirmed that it is not doing so, and will be opening up a bug in the Ceph tracker. I'll paste it here when I do so you can follow it. Thanks for the

Re: [ceph-users] Concurrency in ceph

2014-11-18 Thread Campbell, Bill
I can't speak for OpenStack, but OpenNebula uses Libvirt/QEMU/KVM to access an RBD directly for each virtual instance deployed, live migration included (as each RBD is in and of itself a separate block device, not a file system). I would imagine OpenStack works in a similar fashion. - Origin
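
A minimal illustration of the per-guest RBD layout being described; pool and image names are invented for illustration:

  # One RBD image per virtual machine; every hypervisor reaches it over the
  # network, which is what makes live migration straightforward
  rbd create vms/vm01-disk0 --size 20480
  rbd info vms/vm01-disk0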

Re: [ceph-users] Concurrency in ceph

2014-11-18 Thread Gregory Farnum
On Tue, Nov 18, 2014 at 1:43 PM, hp cre wrote: > Ok thanks Greg. > But what openstack does, AFAIU, is use rbd devices directly, one for each > Vm instance, right? And that's how it supports live migrations on KVM, > etc.. Right? Openstack and similar cloud frameworks don't need to create vm >

Re: [ceph-users] Concurrency in ceph

2014-11-18 Thread hp cre
Yes, OpenStack also uses libvirt/qemu/kvm, thanks. On 18 Nov 2014 23:50, "Campbell, Bill" wrote: > I can't speak for OpenStack, but OpenNebula uses Libvirt/QEMU/KVM to > access an RBD directly for each virtual instance deployed, live-migration > included (as each RBD is in and of itself a separate

Re: [ceph-users] mds continuously crashing on Firefly

2014-11-18 Thread Gregory Farnum
On Thu, Nov 13, 2014 at 9:34 AM, Lincoln Bryant wrote: > Hi all, > > Just providing an update to this -- I started the mds daemon on a new server > and rebooted a box with a hung CephFS mount (from the first crash) and the > problem seems to have gone away. > > I'm still not sure why the mds was

Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-11-18 Thread Travis Rhoden
I've captured this at http://tracker.ceph.com/issues/10133 On Tue, Nov 18, 2014 at 4:48 PM, Travis Rhoden wrote: > Hi Massimiliano, > > I just recreated this bug myself. Ceph-deploy is supposed to install EPEL > automatically on the platforms that need it. I just confirmed that it is > not doi

Re: [ceph-users] Dependency issues in fresh ceph/CentOS 7 install

2014-11-18 Thread Massimiliano Cuttini
Then... very good! :) Ok, the next bad thing is that I have installed GIANT on the admin node. However, ceph-deploy ignores the admin-node installation and installs FIREFLY. Now I have the Giant ceph-deploy on my admin node and my first OSD node on FIREFLY. That seems odd to me. Is it fine or should I
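
If the goal is a uniform Giant install, pinning the release explicitly in ceph-deploy avoids the mismatch; a sketch, where node1 comes from the earlier message and admin-node is a placeholder hostname:

  # Install the same release everywhere, admin node included
  ceph-deploy install --release giant admin-node node1
  # Then confirm on each host
  ceph --version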

[ceph-users] Replacing Ceph mons & understanding initial members

2014-11-18 Thread Scottix
We currently have a 3-node system with 3 monitor nodes. I created them in the initial setup, and ceph.conf has: mon initial members = Ceph200, Ceph201, Ceph202 mon host = 10.10.5.31,10.10.5.32,10.10.5.33 We are in the process of expanding and installing dedicated mon servers. I know I can run: cep
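
The usual sequence for swapping in dedicated monitors looks roughly like this; the new hostnames and addresses below are placeholders. mon initial members only matters for forming the very first quorum, but mon host should list the new set on every node:

  # Add the new dedicated monitors one at a time and wait for quorum
  ceph-deploy mon create newmon1
  ceph quorum_status --format json-pretty
  # Once the new mons are in quorum, retire an old one
  ceph mon remove Ceph200
  # Update ceph.conf on all nodes, e.g.:
  #   mon initial members = newmon1, newmon2, newmon3
  #   mon host = 10.10.5.41,10.10.5.42,10.10.5.43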

Re: [ceph-users] Log reading/how do I tell what an OSD is trying to connect to

2014-11-18 Thread Gregory Farnum
It's a little strange, but with just the one-sided log it looks as though the OSD is setting up a bunch of connections and then deliberately tearing them down again within a second or two (i.e., this is not a direct messenger bug, but it might be an OSD one, or it might be something else). Is it pos

Re: [ceph-users] Unclear about CRUSH map and more than one "step emit" in rule

2014-11-18 Thread Gregory Farnum
On Sun, Nov 16, 2014 at 4:17 PM, Anthony Alba wrote: > The step emit documentation states > > "Outputs the current value and empties the stack. Typically used at > the end of a rule, but may also be used to pick from different trees > in the same rule." > > What use case is there for more than one

Re: [ceph-users] mds cluster degraded

2014-11-18 Thread Gregory Farnum
Hmm, last time we saw this it meant that the MDS log had gotten corrupted somehow and was a little short (in that case due to the OSDs filling up). What do you mean by "rebuilt the OSDs"? -Greg On Mon, Nov 17, 2014 at 12:52 PM, JIten Shah wrote: > After I rebuilt the OSDs, the MDS went into the

Re: [ceph-users] Giant upgrade - stability issues

2014-11-18 Thread Andrei Mikhailovsky
Sam, Pastebin or similar will not take tens of megabytes worth of logs. If we are talking about the debug_ms 10 setting, I've got about 7GB worth of logs generated every half hour or so. Not really sure what to do with that much data. Anything more constructive? Thanks - Original Message
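
One option for logs that size is ceph-post-file, which ships with ceph and uploads straight to ceph.com, printing a tag that can be quoted on the list; the file name below is just an example:

  # Compress first, then upload; quote the returned tag in the thread
  xz -9 /var/log/ceph/ceph-osd.12.log
  ceph-post-file /var/log/ceph/ceph-osd.12.log.xz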

[ceph-users] Bug or by design?

2014-11-18 Thread Robert LeBlanc
I was going to submit this as a bug, but thought I would put it here for discussion first. I have a feeling that it could be behavior by design. ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578) I'm using a cache pool and was playing around with the size and min_size on the pool to see
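
For context, the kind of commands involved in that experiment; the pool name is hypothetical:

  ceph osd pool set hot-cache size 3
  ceph osd pool set hot-cache min_size 2
  # Check what the cluster thinks the pool's settings are
  ceph osd dump | grep hot-cache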

Re: [ceph-users] Cache tiering and cephfs

2014-11-18 Thread Gregory Farnum
I believe the reason we don't allow you to do this right now is that there was not a good way of coordinating the transition (so that everybody starts routing traffic through the cache pool at the same time), which could lead to data inconsistencies. Looks like the OSDs handle this appropriately no
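
For reference, attaching a cache tier to an existing data pool normally goes something like this; pool names are assumed:

  ceph osd tier add cephfs_data cache_pool
  ceph osd tier cache-mode cache_pool writeback
  ceph osd tier set-overlay cephfs_data cache_pool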

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-18 Thread David Moreau Simard
Testing without cache tiering is the next test I want to do when I have time. When it's hanging, there is no activity at all on the cluster. Nothing in "ceph -w", nothing in "ceph osd pool stats". I'll provide an update when I have a chance to test without tiering. -- David Moreau Simard
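
When everything looks idle, the per-OSD admin socket is one more place to look; the osd id here is arbitrary, and the daemon command has to be run on the host carrying that OSD:

  ceph health detail | grep -i 'blocked\|slow'
  # On the host carrying the OSD in question
  ceph daemon osd.3 dump_ops_in_flight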

Re: [ceph-users] Bug or by design?

2014-11-18 Thread Gregory Farnum
On Tue, Nov 18, 2014 at 3:38 PM, Robert LeBlanc wrote: > I was going to submit this as a bug, but thought I would put it here for > discussion first. I have a feeling that it could be behavior by design. > > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578) > > I'm using a cache pool an

Re: [ceph-users] incorrect pool size, wrong ruleset?

2014-11-18 Thread Gregory Farnum
On Wed, Nov 12, 2014 at 1:41 PM, houmles wrote: > Hi, > > I have 2 hosts with 8 2TB drive in each. > I want to have 2 replicas between both hosts and then 2 replicas between osds > on each host. That way even when I lost one host I still have 2 replicas. > > Currently I have this ruleset: > > rul

Re: [ceph-users] Bug or by design?

2014-11-18 Thread Robert LeBlanc
On Nov 18, 2014 4:48 PM, "Gregory Farnum" wrote: > > On Tue, Nov 18, 2014 at 3:38 PM, Robert LeBlanc wrote: > > I was going to submit this as a bug, but thought I would put it here for > > discussion first. I have a feeling that it could be behavior by design. > > > > ceph version 0.87 (c51c8f9d8

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-18 Thread Ramakrishna Nishtala (rnishtal)
Hi Dave, did you say iSCSI only? The tracker issue doesn't say, though. I am on Giant, with both client and ceph on RHEL 7, and it seems to work OK, unless I am missing something here. RBD on bare metal with kmod-rbd and caching disabled. [root@compute4 ~]# time fio --name=writefile --size=100G --

Re: [ceph-users] osd crashed while there was no space

2014-11-18 Thread han vincent
Hmm, the problem is I had not modified any config; everything is at the defaults. As you said, all the IO should be stopped by the "mon_osd_full_ratio" or "osd_failsafe_full_ratio" settings. In my test, when the osd was near full, the IO from "rest-bench" stopped, but the backfill IO did not stop.

Re: [ceph-users] Log reading/how do I tell what an OSD is trying to connect to

2014-11-18 Thread Scott Laird
I think I just solved at least part of the problem. Because of the somewhat peculiar way that I have Docker configured, docker instances on another system were being assigned my OSD's IP address, running for a couple seconds, and then failing (for unrelated reasons). Effectively, there was somethi
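
A duplicate-address check and a look at what address the OSD actually registered are quick ways to confirm this; the interface, address and grep pattern below are placeholders:

  # Duplicate address detection from another host on the same segment
  arping -D -I eth0 -c 3 10.1.2.3
  # Which IP:port each OSD has registered with the monitors
  ceph osd dump | grep '^osd\.'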