Hi Folks!
For those of you who are using ceph-dash
(https://github.com/Crapworks/ceph-dash), I've created a Nagios plugin
that uses the JSON endpoint to monitor your cluster remotely:
* https://github.com/Crapworks/check_ceph_dash
I think this can be easily adapted to use the ceph-rest-api as
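For anyone who wants to roll their own check along the same lines, a minimal sketch of the idea could look roughly like the following. It assumes the ceph-dash JSON endpoint returns the cluster status JSON with a health/overall_status field (HEALTH_OK / HEALTH_WARN / HEALTH_ERR); the URL and field names here are illustrative, see the repository above for the real plugin.

#!/usr/bin/env python
"""Minimal Nagios-style check against a ceph-dash JSON endpoint (sketch)."""
import json
import sys

try:  # Python 2 / 3
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen

EXIT_CODES = {'HEALTH_OK': 0, 'HEALTH_WARN': 1, 'HEALTH_ERR': 2}
PREFIXES = {0: 'OK', 1: 'WARNING', 2: 'CRITICAL', 3: 'UNKNOWN'}


def check(url):
    try:
        status = json.loads(urlopen(url, timeout=10).read().decode())
    except Exception as err:
        print('UNKNOWN: could not fetch %s (%s)' % (url, err))
        return 3
    # Assumption: the endpoint mirrors "ceph status" JSON of that era.
    health = status.get('health', {}).get('overall_status', 'unknown')
    code = EXIT_CODES.get(health, 3)
    print('%s: cluster reports %s' % (PREFIXES[code], health))
    return code


if __name__ == '__main__':
    sys.exit(check(sys.argv[1]))

Invoked with the URL of your ceph-dash instance (hypothetical example: check_ceph_dash.py http://cephdash.example:5000/), it maps the cluster health to the usual Nagios exit codes.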
Hi all,
after coming back from a long weekend, I found my production cluster in
an error state, mentioning 6 scrub errors and 6 pg's in
active+clean+inconsistent state.
Strangely, my pre-live cluster, running on different hardware, is
also showing 1 scrub error and 1 inconsistent pg...
pg d
Hi again,
just found the ceph pg repair command :) Now both clusters are OK again.
Anyway, I'm really interested in the cause of the problem.
Regards,
Christian
On 10.06.2014 10:28, Christian Eichelmann wrote:
> Hi all,
>
> after coming back from a long weekend, I found my prod
Hi ceph users,
since our cluster has had a few inconsistent pgs recently, I was
wondering what ceph pg repair does, depending on the replication level.
So I just wanted to check if my assumptions are correct:
Replication 2x
Since the cluster cannot decide which version is the correct one, it wou
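Regardless of the replication level, the mechanics of finding inconsistent PGs and triggering a repair can be sketched as below. This simply wraps the ceph CLI; the parsing of "ceph health detail" output (lines like "pg 2.37 is active+clean+inconsistent ...") is an assumption about the format of that era, and the script only prints what it would do unless the dry-run flag is flipped.

#!/usr/bin/env python
"""Sketch: list inconsistent PGs and (optionally) issue "ceph pg repair"."""
import re
import subprocess


def inconsistent_pgs():
    # Caveat from this thread: repair works from the primary copy,
    # so check the data before letting it loose.
    detail = subprocess.check_output(['ceph', 'health', 'detail']).decode()
    return re.findall(r'pg (\S+) is [^\n]*inconsistent', detail)


def repair(pgid, dry_run=True):
    cmd = ['ceph', 'pg', 'repair', pgid]
    print('%s: %s' % ('would run' if dry_run else 'running', ' '.join(cmd)))
    if not dry_run:
        subprocess.check_call(cmd)


if __name__ == '__main__':
    for pgid in inconsistent_pgs():
        repair(pgid, dry_run=True)   # set dry_run=False to actually repair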
>>> Is there any other tool which can also be used to monitor ceph
>>>> especially for object storage?
>>>>
>>>> Regards
>>>> Pragya Jain
I can also confirm that after upgrading to Firefly both of our clusters (test
and live) went from 0 scrub errors each for about 6 months to about 9-12
per week...
This also makes me kind of nervous, since as far as I know everything "ceph pg
repair" does is copy the primary object to al
Hi Ceph-Users,
I have absolutely no idea what is going on on my systems...
Hardware:
45 x 4TB Harddisks
2 x 6 Core CPUs
256GB Memory
When initializing all disks and joining them to the cluster, other OSDs
start crashing after approximately 30 OSDs. When I try to start them
again, I see different kind
Hi,
I am running all commands as root, so there are no limits for the processes.
Regards,
Christian
___
From: Mariusz Gronczewski [mariusz.gronczew...@efigence.com]
Sent: Friday, 12 September 2014 15:33
To: Christian Eichelmann
Cc: ceph-users
u're hitting,
> but that is the most likely one
>
> Also 45 OSDs with 12 (24 with HT, bleah) CPU cores is pretty ballsy.
> I personally would rather do 4 RAID6 (10 disks, with OSD SSD journals)
> with that kind of case and enjoy the fact that my OSDs never fail. ^o^
>
>
re's an issue here http://tracker.ceph.com/issues/6142 , although it
> doesn't seem to have gotten much traction in terms of informing users.
>
> Regards
> Nathan
>
> On 15/09/2014 7:13 PM, Christian Eichelmann wrote:
>> Hi all,
>>
>> I have no idea why runni
Hi all,
during some failover and configuration tests, we are currently
observing a strange phenomenon:
Restarting one of our monitors (5 in total) triggers about 300 of the
following events:
osd.669 10.76.28.58:6935/149172 failed (20 reports from 20 peers after
22.005858 >= grace 20.00)
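To get an overview of how many of these failure events each OSD accumulates, a small log-parsing sketch like the following can help. The line layout is taken from the event quoted above; treating it as stable across Ceph versions is an assumption.

#!/usr/bin/env python
"""Sketch: count "osd.N <addr> failed (x reports from y peers after t >= grace g)"
events per OSD in a cluster log file."""
import collections
import re
import sys

FAIL_RE = re.compile(
    r'(osd\.\d+) \S+ failed \((\d+) reports from (\d+) peers '
    r'after ([\d.]+) >= grace ([\d.]+)\)')


def summarize(path):
    counts = collections.Counter()
    with open(path) as log:
        for line in log:
            match = FAIL_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


if __name__ == '__main__':
    for osd, hits in summarize(sys.argv[1]).most_common():
        print('%-10s %d failure events' % (osd, hits))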
s and something is happening when one of them goes down. If I
can provide any more information to clarify the issue, just tell me
what you need.
Regards,
Christian
On 03.02.2015 18:10, Gregory Farnum wrote:
> On Tue, Feb 3, 2015 at 3:38 AM, Christian Eichelmann
> wrote:
>>
d.1202 128.142.23.104:6801/98353 59 :
> [WRN] map e132056 wrongly marked me down
> 2015-01-29 11:29:35.441922 osd.1164 128.142.23.102:6850/22486 25 :
> [WRN] map e132056 wrongly marked me down
The behaviour is exactly the same on our system, so it looks like the
same issue.
We are currently runni
where in the docs we can put this to catch more
users? Or maybe a warning issued by the osds themselves or something if
they see limits that are low?
sage
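As a rough illustration of what such a warning could look like on an OSD host, here is a sketch that checks a few common limits and flags them when they are low. The thresholds are illustrative assumptions, not official Ceph recommendations.

#!/usr/bin/env python
"""Sketch: warn when common kernel/process limits look low for a host
running many OSDs."""
import resource


def checks():
    nofile, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    with open('/proc/sys/kernel/pid_max') as f:
        pid_max = int(f.read())
    # (description, current soft limit, illustrative minimum)
    return [
        ('max open files (ulimit -n)', nofile, 32768),
        ('max processes/threads (ulimit -u)', nproc, 32768),
        ('kernel.pid_max', pid_max, 65536),
    ]


if __name__ == '__main__':
    for name, value, minimum in checks():
        if value == resource.RLIM_INFINITY:
            print('%-35s unlimited  OK' % name)
            continue
        state = 'OK' if value >= minimum else 'LOW'
        print('%-35s %8d (suggested >= %d)  %s' % (name, value, minimum, state))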
- Karan -
On 09 Mar 2015, at 14:48, Christian Eichelmann
wrote:
Hi Karan,
as you are actually writing in your own book, the p
Hi Ceph-Users!
We currently have a problem where I am not sure if it has its cause
in Ceph or something else. First, some information about our Ceph setup:
* ceph version 0.87.1
* 5 MON
* 12 OSD servers with 60x2TB disks each
* 2 RSYNC Gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
Wheezy)
e may be a fix which might stop this from
> happening.
>
> Nick
>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Christian Eichelmann
>> Sent: 20 April 2015 08:29
>> To: ceph-users@lists.ce
ble. The
>>> RBD client is Kernel based and so there may be a fix which might stop
>>> this from happening.
>>>
>>> Nick
>>>
>>>> -Original Message-
>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
&
mes and never
>> really
>> got to the bottom of it, whereas the same volumes formatted with EXT4 have
>> been running for years without a problem.
>>
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>&
g, was there anything printed to dmesg?
> Cheers, Dan
>
> On Mon, Apr 20, 2015 at 9:29 AM, Christian Eichelmann
> wrote:
>> Hi Ceph-Users!
>>
>> We currently have a problem where I am not sure if it has its cause
>> in Ceph or something else. First, some
cluster. Is iptables getting in the way?
>
> Cheers, Dan
>
> On Tue, Apr 21, 2015 at 9:13 AM, Christian Eichelmann
> wrote:
>> Hi Dan,
>>
>> we are already back on the kernel module since the same problems were
>> happening with fuse. I had no special ulimit setting
Hi all!
We are experiencing approximately 1 scrub error / inconsistent pg every
two days. As far as I know, to fix this you can issue a "ceph pg
repair", which works fine for us. I have a few qestions regarding the
behavior of the ceph cluster in such a case:
1. After ceph detects the scrub error
've only tested this on an idle cluster, so I don't know how well it
>> will work on an active cluster. Since we issue a deep-scrub, if the PGs
>> of the replicas change during the rsync, it should come up with an
>> error. The idea is to keep rsyncing unti
Hi all,
I am trying to remove several rbd images from the cluster.
Unfortunately, that doesn't work:
$ rbd info foo
rbd image 'foo':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.919443.238e1f29
format: 1
$ rbd rm foo
2015-07-2
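While the rm itself hangs, one way to see how many backing objects of the image are still in the pool is a short python-rados sketch like the one below. It uses the block_name_prefix from the rbd info output above and assumes the standard python-rados bindings and that the image lives in the default "rbd" pool.

#!/usr/bin/env python
"""Sketch: count the backing RADOS objects of a format-1 RBD image by its
block_name_prefix."""
import rados

PREFIX = 'rb.0.919443.238e1f29'   # block_name_prefix from "rbd info foo"

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')
    try:
        # Walk the pool and count objects belonging to this image.
        remaining = sum(1 for obj in ioctx.list_objects()
                        if obj.key.startswith(PREFIX))
        print('%d backing objects still present for prefix %s'
              % (remaining, PREFIX))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()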
at 11:30 AM, Christian Eichelmann
> wrote:
>> Hi all,
>>
>> I am trying to remove several rbd images from the cluster.
>> Unfortunately, that doesn't work:
>>
>> $ rbd info foo
>> rbd image 'foo':
>>
Hi all,
we have a ceph cluster with currently 360 OSDs in 11 systems. Last week
we replaced one OSD system with a new one. During that, we had a
lot of problems with OSDs crashing on all of our systems. But that is
not our current problem.
After we got everything up and running again, we s
nd data to ceph again.
To tell the truth, I guess that will result in the end of our ceph
project (which has already been running for 9 months).
Regards,
Christian
On 29.12.2014 15:59, Nico Schottelius wrote:
> Hey Christian,
>
> Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
>>
think
> (I'm still learning Ceph) that this will make different pgs for each pool,
> also different OSDs; maybe this way you can overcome the issue.
>
> Cheers
> Eneko
>
> On 30/12/14 12:17, Christian Eichelmann wrote:
>> Hi Nico and all others who answered,
&g
able in ceph logs in the new pools image
> format?
>
> On 30/12/14 12:31, Christian Eichelmann wrote:
>> Hi Eneko,
>>
>> I was trying an rbd cp before, but that was hanging as well. But I
>> couldn't find out if the source image was causing the hang or the
>
ong Ceph will take to fully recover from
> a disk or host failure by testing it with load. Your setup might not be
> robust if it doesn't have the disk space or the speed needed to
> recover quickly from such a failure.
>
> Lionel
Hi all,
as mentioned last year, our ceph cluster is still broken and unusable.
We are still investigating what has happened and I am taking a deeper
look into the output of ceph pg query.
The problem is that I can find some information about what some of the
sections mean, but mostly I can on
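For reference, a small sketch that pulls out the recovery_state section of "ceph pg <pgid> query" (usually the most telling part for a stuck or incomplete PG) could look like this. The command already prints JSON; the exact layout of the output is an assumption based on the versions of that era.

#!/usr/bin/env python
"""Sketch: print the state and recovery_state entries of a PG."""
import json
import subprocess
import sys


def pg_query(pgid):
    out = subprocess.check_output(['ceph', 'pg', pgid, 'query'])
    return json.loads(out.decode())


if __name__ == '__main__':
    info = pg_query(sys.argv[1])          # e.g. pg_state.py 2.37
    print('state: %s' % info.get('state'))
    for entry in info.get('recovery_state', []):
        print('  %s (since %s)' % (entry.get('name'), entry.get('enter_time')))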
Hi all,
after our cluster problems with incomplete placement groups, we've
decided to remove our pools and create new ones. This was going fine in
the beginning. After adding an additional OSD server, we now have 2 PGs
that are stuck in the peering state:
HEALTH_WARN 2 pgs peering; 2 pgs stuck ina
Hi all,
I want to understand what Ceph does if several OSDs are down. First of
all, some words about our setup:
We have 5 monitors and 12 OSD servers, each with 60x2TB disks. These
servers are spread across 4 racks in our datacenter. Every rack holds 3
OSD servers. We have a replication factor of
Sam
On Tue, Jan 20, 2015 at 9:45 AM, Gregory Farnum wrote:
On Tue, Jan 20, 2015 at 2:40 AM, Christian Eichelmann
wrote:
Hi all,
I want to understand what Ceph does if several OSDs are down. First of all,
some words about our setup:
We have 5 monitors and 12 OSD servers, each with 60x2TB disks.
Hi Ceph User!
I had a look at the "official" collectd fork for Ceph, which is quite
outdated and not compatible with the upstream version.
Since this was not an option for us, I've written a Python plugin for
collectd that gets all the precious information out of the admin
sockets' "perf dump" com
I have written a small and lightweight GUI, which can also act as a JSON REST
API (for non-interactive monitoring):
https://github.com/Crapworks/ceph-dash
Maybe that's what you're searching for.
Regards,
Christian
From: ceph-users [ceph-users-boun...@lists.ceph.com]