Wow. Thanks
Not very operations friendly though…
Wouldn't it be OK to just pull the disk that we think is the bad one, check the
serial number, and if it isn't the bad one, just replug it and let the udev rules do
their job and re-insert the disk into the Ceph cluster?
(provided XFS doesn’t freeze for good when we d
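(For reference, a quick way to confirm a pulled drive's serial before deciding what to do with it; just an illustrative sketch, /dev/sdX is a placeholder and smartmontools is assumed to be installed:)

    # read the drive identity, including the serial number
    smartctl -i /dev/sdX | grep -i serial
    # or, without smartmontools, via udev
    udevadm info --query=property --name=/dev/sdX | grep ID_SERIAL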
Hello cephers,
I need your help and suggestions on what is going on with my cluster. A few
weeks ago I upgraded from Firefly to Giant. I've previously written about
having issues with Giant where, over a two-week period, the cluster's IO froze
three times after Ceph marked two OSDs down. I have in to
Hi Kevin,
Every MDS tunable is (I think) listed on this page with a short
description: http://ceph.com/docs/master/cephfs/mds-config-ref/
Can you tell us how your cluster behaves after the mds-cache-size
change? What is your MDS RAM consumption, before and after?
Thanks !
--
Thomas Le
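(For anyone following along, a minimal sketch of how the MDS cache size is usually changed; the daemon name mds.0 is an assumption:)

    # in ceph.conf, under the [mds] section
    [mds]
    mds cache size = 1000000

    # or injected at runtime on a running MDS
    ceph tell mds.0 injectargs '--mds-cache-size 1000000'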
Does nobody know where the problem could be?
On Wed, Nov 12, 2014 at 10:41:36PM +0100, houmles wrote:
> Hi,
>
> I have 2 hosts with 8 2TB drives in each.
> I want to have 2 replicas between both hosts and then 2 replicas between OSDs
> on each host. That way, even when I lose one host, I still have 2 rep
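(A CRUSH rule along these lines is the usual way to get two hosts times two copies each, i.e. pool size 4; this is only a sketch, the rule name and ruleset number are assumptions:)

    rule replicated_2x2 {
            ruleset 1
            type replicated
            min_size 4
            max_size 4
            step take default
            step choose firstn 2 type host
            step chooseleaf firstn 2 type osd
            step emit
    }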
What do you mean by OSD level? The pool has size 4 and min_size 1.
On Tue, Nov 18, 2014 at 10:32:11AM +, Anand Bhat wrote:
> What are the settings for min_size and size at the OSD level in your Ceph
> configuration? Looks like size is set to 2, which halves your total storage
> as two copies of th
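(For what it's worth, the pool-level values can be checked and changed like this; the pool name is a placeholder:)

    ceph osd pool get <pool> size
    ceph osd pool get <pool> min_size
    ceph osd pool set <pool> size 4
    ceph osd pool set <pool> min_size 2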
Has anyone tried applying this fix to see if it makes any difference?
https://github.com/ceph/ceph/pull/2374
I might be in a position in a few days to build a test cluster to test myself,
but was wondering if anyone else has had any luck with it?
Nick
-Original Message-
From: ceph-user
Dear all,
I try to install Ceph but I get errors:
#ceph-deploy install node1
[]
[ceph_deploy.install][DEBUG ] Installing stable version *firefly *on
cluster ceph hosts node1
[ceph_deploy.install][DEBUG ] Detecting platform for host node1 ...
[]
Hi,
I am trying to test disaster recovery of rados gateways.
I set up a federated architecture for the rados gateway as explained in the docs.
I am using ceph version - 0.80.7
I have setup only one region, "us", with two zones.
"us-west" slave zone having user "us-east"
"us-east" master zone ha
That would probably have helped. The XFS deadlocks would only occur when
there was relatively little free memory. Kernel 3.18 is supposed to have a
fix for that, but I haven't tried it yet.
Looking at my actual usage, I don't even need 64k inodes. 64k inodes
should make things a bit faster when
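(If "64k inodes" refers to the 64k directory block size from the old formatting recommendations, which is an assumption on my part, that is the -n option at mkfs time; a hypothetical example:)

    # format with a 64k directory block size (the setting in question, if I read it right)
    mkfs.xfs -n size=65536 /dev/sdX1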
You shouldn't let the cluster get so full that losing a few OSDs will make
you go toofull. Letting the cluster get to 100% full is such a bad idea
that you should make sure it doesn't happen.
Ceph is supposed to stop moving data to an OSD once that OSD hits
osd_backfill_full_ratio, which default
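(The relevant knobs, for anyone following along; the defaults shown are from memory, so treat them as approximate:)

    [osd]
    osd backfill full ratio = 0.85   # stop backfilling to an OSD above this
    [mon]
    mon osd nearfull ratio = 0.85    # HEALTH_WARN threshold
    mon osd full ratio = 0.95        # writes blocked cluster-wide above this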
On Tue, Nov 18, 2014 at 10:04 PM, Craig Lewis wrote:
> That would probably have helped. The XFS deadlocks would only occur when
> there was relatively little free memory. Kernel 3.18 is supposed to have a
> fix for that, but I haven't tried it yet.
>
> Looking at my actual usage, I don't even ne
Ok, why is ceph marking osds down? Post your ceph.log from one of the
problematic periods.
-Sam
On Tue, Nov 18, 2014 at 1:35 AM, Andrei Mikhailovsky wrote:
> Hello cephers,
>
> I need your help and suggestion on what is going on with my cluster. A few
> weeks ago i've upgraded from Firefly to Gi
Thanks guys. I looked at http://tracker.ceph.com/issues/8818 and chatted with
"dis" on #ceph-devel.
I ran a LOT of tests on a LOT of combinations of kernels (sometimes with
tunables legacy). I haven't found a magical combination in which the following
test does not hang:
fio --name=writefile --
Hi people, I have two identical servers (both Sun X2100 M2's) that form
part of a cluster of 3 machines (other machines will be added later). I
want to bond two gigabit Ethernet ports on these, which works perfectly on the
one, but not on the other.
How can this be?
The one machine (named S2) detect
Hi David,
Have you tried on a normal replicated pool with no cache? I've seen a number
of threads recently where caching is causing various things to block/hang.
It would be interesting to see if this still happens without the caching
layer, at least it would rule it out.
Also is there any sign t
Hi David,
Just to let you know I finally managed to get to the bottom of this.
In repo.pp one of the authors has a non-ASCII character in his name; for
whatever reason this was tripping up my Puppet environment. After removing
the following line:
# Author: François Charlier
The module pro
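(A quick way to hunt for stray non-ASCII bytes in a manifest, in case anyone hits the same thing; just a sketch, assuming GNU grep:)

    # list any lines containing non-ASCII characters
    grep -nP '[^\x00-\x7F]' repo.pp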
Sam, the logs are rather large in size. Where should I post it to?
Thanks
- Original Message -
From: "Samuel Just"
To: "Andrei Mikhailovsky"
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, 18 November, 2014 7:54:56 PM
Subject: Re: [ceph-users] Giant upgrade - stability issues
Ok, w
Great find Nick.
I've discussed it on IRC and it does look like a real issue:
https://github.com/enovance/edeploy-roles/blob/master/puppet-master.install#L48-L52
I've pushed the fix for review: https://review.openstack.org/#/c/135421/
--
David Moreau Simard
> On Nov 18, 2014, at 3:32 PM, Nick
pastebin or something, probably.
-Sam
On Tue, Nov 18, 2014 at 12:34 PM, Andrei Mikhailovsky wrote:
> Sam, the logs are rather large in size. Where should I post it to?
>
> Thanks
>
> From: "Samuel Just"
> To: "Andrei Mikhailovsky"
> Cc: ceph-users@lists.ceph.com
Hi Thomas,
I looked over the mds config reference a bit yesterday, but mds cache size
seems to be the most relevant tunable.
As suggested, I upped mds-cache-size to 1 million yesterday and started the
load generator. During load generation, we’re seeing similar behavior on the
filesystem and
Hello everyone,
I'm new to Ceph but have been working with proprietary clustered filesystems for
quite some time.
I almost understand how Ceph works, but I have a couple of questions which
have been asked here before; I didn't understand the answers.
In the closed source world, we use clustered fi
On Tue, Nov 18, 2014 at 1:26 PM, hp cre wrote:
> Hello everyone,
>
> I'm new to ceph but been working with proprietary clustered filesystem for
> quite some time.
>
> I almost understand how ceph works, but have a couple of questions which
> have been asked before here, but i didn't understand t
On Tue, Nov 11, 2014 at 11:43 PM, Gauvain Pocentek
wrote:
> Hi all,
>
> I'm facing a problem on a ceph deployment. rados mkpool always fails:
>
> # rados -n client.admin mkpool test
> error creating pool test: (2) No such file or directory
>
> rados lspool and rmpool commands work just fine, and t
I solved it by installing the EPEL repo in yum.
I think somebody should note in the documentation that EPEL
is mandatory.
On 18/11/2014 14:29, Massimiliano Cuttini wrote:
Dear all,
I try to install Ceph but I get errors:
#ceph-deploy install node1
[]
[ce
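(On CentOS/RHEL the workaround mentioned above is a one-liner, assuming the distro's epel-release package is available:)

    yum install -y epel-release
    # then retry
    ceph-deploy install node1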
Ok, thanks Greg.
But what OpenStack does, AFAIU, is use RBD devices directly, one for each
VM instance, right? And that's how it supports live migration on KVM,
etc., right? OpenStack and similar cloud frameworks don't need to create VM
instances on filesystems, am I correct?
On 18 Nov 2014 23
Hi Massimiliano,
I just recreated this bug myself. Ceph-deploy is supposed to install EPEL
automatically on the platforms that need it. I just confirmed that it is
not doing so, and will be opening up a bug in the Ceph tracker. I'll paste
it here when I do so you can follow it. Thanks for the
I can't speak for OpenStack, but OpenNebula uses Libvirt/QEMU/KVM to access an
RBD directly for each virtual instance deployed, live-migration included (as
each RBD is in and of itself a separate block device, not file system). I would
imagine OpenStack works in a similar fashion.
- Origin
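(For illustration, this is roughly what the libvirt disk definition looks like when QEMU/KVM attaches an RBD directly; the pool, image, monitor and secret values are placeholders:)

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <source protocol='rbd' name='rbd/vm-disk-1'>
        <host name='mon1.example.com' port='6789'/>
      </source>
      <auth username='libvirt'>
        <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
      </auth>
      <target dev='vda' bus='virtio'/>
    </disk>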
On Tue, Nov 18, 2014 at 1:43 PM, hp cre wrote:
> Ok thanks Greg.
> But what openstack does, AFAIU, is use rbd devices directly, one for each
> Vm instance, right? And that's how it supports live migrations on KVM,
> etc.. Right? Openstack and similar cloud frameworks don't need to create vm
>
Yes, OpenStack also uses libvirt/QEMU/KVM, thanks.
On 18 Nov 2014 23:50, "Campbell, Bill"
wrote:
> I can't speak for OpenStack, but OpenNebula uses Libvirt/QEMU/KVM to
> access an RBD directly for each virtual instance deployed, live-migration
> included (as each RBD is in and of itself a separate
On Thu, Nov 13, 2014 at 9:34 AM, Lincoln Bryant wrote:
> Hi all,
>
> Just providing an update to this -- I started the mds daemon on a new server
> and rebooted a box with a hung CephFS mount (from the first crash) and the
> problem seems to have gone away.
>
> I'm still not sure why the mds was
I've captured this at http://tracker.ceph.com/issues/10133
On Tue, Nov 18, 2014 at 4:48 PM, Travis Rhoden wrote:
> Hi Massimiliano,
>
> I just recreated this bug myself. Ceph-deploy is supposed to install EPEL
> automatically on the platforms that need it. I just confirmed that it is
> not doi
Then...
very good! :)
OK, the next bad thing is that I have installed Giant on the admin node.
However, ceph-deploy ignores the admin node installation and installs Firefly.
Now I have ceph-deploy from Giant on my admin node and my first OSD node
with Firefly.
It seems odd to me. Is it fine, or should I
We currently have a 3-node system with 3 monitor nodes. I created them in
the initial setup, and ceph.conf has:
mon initial members = Ceph200, Ceph201, Ceph202
mon host = 10.10.5.31,10.10.5.32,10.10.5.33
We are in the process of expanding and installing dedicated mon servers.
I know I can run:
cep
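(The command got cut off above; a sketch of one way to add new dedicated monitors with ceph-deploy, with the hostname and address as placeholders:)

    ceph-deploy mon add Ceph203
    # then update ceph.conf on all nodes, e.g.
    #   mon initial members = Ceph200, Ceph201, Ceph202, Ceph203
    #   mon host = 10.10.5.31,10.10.5.32,10.10.5.33,10.10.5.34
    ceph-deploy --overwrite-conf config push Ceph200 Ceph201 Ceph202 Ceph203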
It's a little strange, but with just the one-sided log it looks as
though the OSD is setting up a bunch of connections and then
deliberately tearing them down again within a second or two (i.e., this
is not a direct messenger bug, but it might be an OSD one, or it might
be something else).
Is it pos
On Sun, Nov 16, 2014 at 4:17 PM, Anthony Alba wrote:
> The step emit documentation states
>
> "Outputs the current value and empties the stack. Typically used at
> the end of a rule, but may also be used to pick from different trees
> in the same rule."
>
> What use case is there for more than one
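(One common use case for a second take/emit pair, sketched here with assumed root names, is placing the primary copy on one tree and the remaining replicas on another:)

    rule ssd_primary_hdd_replicas {
            ruleset 5
            type replicated
            min_size 1
            max_size 10
            step take ssd
            step chooseleaf firstn 1 type host
            step emit
            step take hdd
            step chooseleaf firstn -1 type host
            step emit
    }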
Hmm, last time we saw this it meant that the MDS log had gotten
corrupted somehow and was a little short (in that case due to the OSDs
filling up). What do you mean by "rebuilt the OSDs"?
-Greg
On Mon, Nov 17, 2014 at 12:52 PM, JIten Shah wrote:
> After i rebuilt the OSD’s, the MDS went into the
Sam,
Pastebin or similar will not take tens of megabytes worth of logs. If we are
talking about the debug_ms 10 setting, I've got about 7 GB worth of logs generated
every half an hour or so. Not really sure what to do with that much data.
Anything more constructive?
Thanks
- Original Message
I was going to submit this as a bug, but thought I would put it here for
discussion first. I have a feeling that it could be behavior by design.
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
I'm using a cache pool and was playing around with the size and min_size on
the pool to see
I believe the reason we don't allow you to do this right now is that
there was not a good way of coordinating the transition (so that
everybody starts routing traffic through the cache pool at the same
time), which could lead to data inconsistencies. Looks like the OSDs
handle this appropriately no
Testing without the cache tiering is the next test I want to do when I have
time.
When it's hanging, there is no activity at all on the cluster.
Nothing in "ceph -w", nothing in "ceph osd pool stats".
I'll provide an update when I have a chance to test without tiering.
--
David Moreau Simard
On Tue, Nov 18, 2014 at 3:38 PM, Robert LeBlanc wrote:
> I was going to submit this as a bug, but thought I would put it here for
> discussion first. I have a feeling that it could be behavior by design.
>
> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>
> I'm using a cache pool an
On Wed, Nov 12, 2014 at 1:41 PM, houmles wrote:
> Hi,
>
> I have 2 hosts with 8 2TB drive in each.
> I want to have 2 replicas between both hosts and then 2 replicas between osds
> on each host. That way even when I lost one host I still have 2 replicas.
>
> Currently I have this ruleset:
>
> rul
On Nov 18, 2014 4:48 PM, "Gregory Farnum" wrote:
>
> On Tue, Nov 18, 2014 at 3:38 PM, Robert LeBlanc
wrote:
> > I was going to submit this as a bug, but thought I would put it here for
> > discussion first. I have a feeling that it could be behavior by design.
> >
> > ceph version 0.87 (c51c8f9d8
Hi Dave,
Did you say iSCSI only? The tracker issue doesn't say, though.
I am on Giant, with both client and Ceph on RHEL 7, and it seems to work OK, unless
I am missing something here. RBD on bare metal with kmod-rbd and caching
disabled.
[root@compute4 ~]# time fio --name=writefile --size=100G --
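(The full command got cut off above; a hypothetical fio invocation of this general shape, with made-up values and a placeholder device, would be:)

    fio --name=writefile --size=100G --filename=/dev/rbd0 --bs=1M \
        --direct=1 --rw=write --ioengine=libaio --iodepth=32 --end_fsync=1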
Hmm, the problem is I had not modified any config; everything
is at its defaults.
As you said, all the IO should be stopped by the
"mon_osd_full_ratio" or "osd_failsafe_full_ratio" settings. In my test, when
the OSD was near full, the IO from "rest bench" stopped, but the backfill
IO did not stop.
I think I just solved at least part of the problem.
Because of the somewhat peculiar way that I have Docker configured, docker
instances on another system were being assigned my OSD's IP address,
running for a couple of seconds, and then failing (for unrelated reasons).
Effectively, there was somethi