Re: [ceph-users] Ceph Health Check error ( havent seen before ) [EXT]

2019-07-30 Thread Brent Kennedy
sets of cluster monitors were upgraded to 16.04 though. We plan to move them to CentOS once Nautilus is fully in place. -Brent -Original Message- From: Matthew Vernon Sent: Tuesday, July 30, 2019 5:01 AM To: Brent Kennedy ; 'ceph-users' Subject: Re: [ceph-users] C

Re: [ceph-users] Ceph Health Check error ( havent seen before ) [EXT]

2019-07-30 Thread Matthew Vernon
On 29/07/2019 23:24, Brent Kennedy wrote: Apparently sent my email too quickly. I had to install python-pip on the mgr nodes and run “pip install requests==2.6.0” to fix the missing module and then reboot all three monitors. Now the dashboard enables with no issue. I'm a bit confused as to why i

Re: [ceph-users] Ceph Health Check error ( havent seen before )

2019-07-29 Thread Brent Kennedy
Apparently sent my email too quickly. I had to install python-pip on the mgr nodes and run "pip install requests==2.6.0" to fix the missing module and then reboot all three monitors. Now the dashboard enables with no issue. This is apparently a known Ubuntu issue. (yet another reason to move t
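
For anyone hitting the same thing, a sketch of the workaround described above on an Ubuntu 16.04 mgr node; restarting only the mgr daemons (rather than rebooting the monitors, as Brent did) is an assumption that may or may not be sufficient:

    # Workaround sketch for the missing 'requests' module on Ubuntu mgr nodes.
    # Pinning requests==2.6.0 follows the thread; restarting ceph-mgr instead of
    # rebooting the whole node is an untested assumption.
    sudo apt-get install -y python-pip
    sudo pip install requests==2.6.0
    sudo systemctl restart ceph-mgr.target
    # Re-enable the dashboard and confirm the cluster reports it as running:
    ceph mgr module enable dashboard
    ceph -s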

Re: [ceph-users] ceph health JSON format has changed

2019-01-08 Thread Gregory Farnum
On Fri, Jan 4, 2019 at 1:19 PM Jan Kasprzak wrote: > > Gregory Farnum wrote: > : On Wed, Jan 2, 2019 at 5:12 AM Jan Kasprzak wrote: > : > : > Thomas Byrne - UKRI STFC wrote: > : > : I recently spent some time looking at this, I believe the 'summary' and > : > : 'overall_status' sections are now d

Re: [ceph-users] ceph health JSON format has changed

2019-01-04 Thread Jan Kasprzak
Gregory Farnum wrote: : On Wed, Jan 2, 2019 at 5:12 AM Jan Kasprzak wrote: : : > Thomas Byrne - UKRI STFC wrote: : > : I recently spent some time looking at this, I believe the 'summary' and : > : 'overall_status' sections are now deprecated. The 'status' and 'checks' : > : fields are the ones to

Re: [ceph-users] ceph health JSON format has changed

2019-01-04 Thread Gregory Farnum
On Wed, Jan 2, 2019 at 5:12 AM Jan Kasprzak wrote: > Thomas Byrne - UKRI STFC wrote: > : I recently spent some time looking at this, I believe the 'summary' and > : 'overall_status' sections are now deprecated. The 'status' and 'checks' > : fields are the ones to use now. > > OK, thanks.

Re: [ceph-users] ceph health JSON format has changed

2019-01-02 Thread Thomas Byrne - UKRI STFC
> In previous versions of Ceph, I was able to determine which PGs had > scrub errors, and then a cron.hourly script ran "ceph pg repair" for them, > provided that they were not already being scrubbed. In Luminous, the bad > PG is not visible in "ceph --status" anywhere. Should I use something
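
One approach (not necessarily the one settled on in the thread) is to query the inconsistent-PG list directly instead of scraping "ceph --status"; a minimal sketch, assuming jq is installed and that repairing unconditionally is acceptable for your cluster:

    #!/bin/bash
    # List PGs with scrub inconsistencies per pool and ask Ceph to repair them.
    # Skipping PGs that are currently being scrubbed (as the original cron
    # script did) is left out for brevity.
    for pool in $(ceph osd pool ls); do
        for pg in $(rados list-inconsistent-pg "$pool" | jq -r '.[]'); do
            ceph pg repair "$pg"
        done
    done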

Re: [ceph-users] ceph health JSON format has changed sync?

2019-01-02 Thread Konstantin Shalygin
Hello, Ceph users, I am afraid the following question is a FAQ, but I still was not able to find the answer: I use ceph --status --format=json-pretty as a source of CEPH status for my Nagios monitoring. After upgrading to Luminous, I see the following in the JSON output when the cluster

Re: [ceph-users] ceph health JSON format has changed

2019-01-02 Thread Jan Kasprzak
Thomas Byrne - UKRI STFC wrote: : I recently spent some time looking at this, I believe the 'summary' and : 'overall_status' sections are now deprecated. The 'status' and 'checks' : fields are the ones to use now. OK, thanks. : The 'status' field gives you the OK/WARN/ERR, but returning t

Re: [ceph-users] ceph health JSON format has changed sync?

2019-01-02 Thread Thomas Byrne - UKRI STFC
I recently spent some time looking at this, I believe the 'summary' and 'overall_status' sections are now deprecated. The 'status' and 'checks' fields are the ones to use now. The 'status' field gives you the OK/WARN/ERR, but returning the most severe error condition from the 'checks' section i
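
For anyone scripting against this, a minimal sketch of pulling those two fields out of the new-style JSON, assuming jq is available:

    # Overall status (HEALTH_OK / HEALTH_WARN / HEALTH_ERR):
    ceph health --format=json | jq -r '.status'
    # Individual check codes and their severities:
    ceph health --format=json | jq -r '.checks | to_entries[] | "\(.key) \(.value.severity)"'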

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-12-20 Thread Paul Emmerich
Oh, I've seen this bug twice on different clusters with Luminous on EC pools with lots of snapshots in the last few months. Seen it on 12.2.5 and 12.2.10 on CentOS. It's basically a broken object somewhere that kills an OSD and then gets recovered to another OSD which then also dies. For us the

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-12-20 Thread Daniel K
Did you ever get anywhere with this? I have 6 OSDs out of 36 continuously flapping with this error in the logs. Thanks, Dan On Fri, Jun 8, 2018 at 11:10 AM Caspar Smit wrote: > Hi all, > > Maybe this will help: > > The issue is with shards 3,4 and 5 of PG 6.3f: > > LOG's of OSD's 16, 17 & 36

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-06-08 Thread Caspar Smit
Hi all, Maybe this will help: The issue is with shards 3, 4 and 5 of PG 6.3f: Logs of OSDs 16, 17 & 36 (the ones crashing on startup). *Log OSD.16 (shard 4):* 2018-06-08 08:35:01.727261 7f4c585e3700 -1 bluestore(/var/lib/ceph/osd/ceph-16) _txc_add_transaction error (2) No such file or direct

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-06-08 Thread Caspar Smit
Hi all, I seem to be hitting these tracker issues: https://tracker.ceph.com/issues/23145 http://tracker.ceph.com/issues/24422 PGs 6.1 and 6.3f are having the issues. When I list all PGs of a down OSD with: ceph-objectstore-tool --dry-run --type bluestore --data-path /var/lib/ceph/osd/ceph-17/
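
For reference, the usual offline invocations for enumerating what a stopped OSD holds look roughly like the following; the op names and the shard-suffixed pgid are assumptions to verify against your ceph-objectstore-tool version:

    # Run only with the OSD stopped. --dry-run keeps it read-only where supported.
    ceph-objectstore-tool --type bluestore \
        --data-path /var/lib/ceph/osd/ceph-17 \
        --op list-pgs
    # Objects of one PG (EC shards carry an "sN" suffix, e.g. shard 4 of 6.3f):
    ceph-objectstore-tool --type bluestore \
        --data-path /var/lib/ceph/osd/ceph-17 \
        --pgid 6.3fs4 --op list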

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-06-07 Thread Caspar Smit
Update: I've unset nodown to let it continue but now 4 OSDs are down and cannot be brought up again; here's what the logfile reads: 2018-06-08 08:35:01.716245 7f4c58de4700 0 log_channel(cluster) log [INF] : 6.e3s0 continuing backfill to osd.37(4) from (10864'911406,11124'921472] 6:c7d71bbd:::r

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-06-07 Thread Caspar Smit
Well, I let it run with the nodown flag and it looked like it would finish, BUT it all went wrong somewhere. This is now the state: health: HEALTH_ERR nodown flag(s) set 5602396/94833780 objects misplaced (5.908%) Reduced data availability: 143 pgs inactive, 142

Re: [ceph-users] Ceph health warn MDS failing to respond to cache pressure

2017-05-12 Thread Webert de Souza Lima
On Wed, May 10, 2017 at 4:09 AM, gjprabu wrote: > Hi Webert, > > Thanks for your reply, can you please suggest a ceph pg value for data and > metadata. I have set 128 for data and 128 for metadata, is this correct? > Well, I think this has nothing to do with your current problem, but the PG number d

Re: [ceph-users] Ceph health warn MDS failing to respond to cache pressure

2017-05-12 Thread José M . Martín
Hi, I'm having the same issues running MDS version 11.2.0 and kernel clients 4.10. Regards Jose On 10/05/17 at 09:11, gjprabu wrote: > Hi John, > > Thanks for your reply, we are using the below version for client and > MDS (ceph version 10.2.2) > > Regards > Prabu GJ > > > On W

Re: [ceph-users] Ceph health warn MDS failing to respond to cache pressure

2017-05-10 Thread gjprabu
Hi John, Thanks for your reply, we are using the below version for client and MDS (ceph version 10.2.2). Regards Prabu GJ On Wed, 10 May 2017 12:29:06 +0530 John Spray wrote: On Thu, May 4, 2017 at 7:28 AM, gjprabu wrote: >

Re: [ceph-users] Ceph health warn MDS failing to respond to cache pressure

2017-05-10 Thread gjprabu
Hi Webert, Thanks for your reply, can you please suggest a ceph pg value for data and metadata. I have set 128 for data and 128 for metadata, is this correct? Regards Prabu GJ On Thu, 04 May 2017 17:04:38 +0530 Webert de Souza Lima wrote: I have fa

Re: [ceph-users] Ceph health warn MDS failing to respond to cache pressure

2017-05-10 Thread John Spray
On Thu, May 4, 2017 at 7:28 AM, gjprabu wrote: > Hi Team, > > We are running cephfs with 5 OSDs, 3 mons and 1 MDS. There is a > Health Warn "failing to respond to cache pressure". Kindly advise how to fix > this issue. This is usually due to buggy old clients, and occasionally due to a buggy

Re: [ceph-users] Ceph health warn MDS failing to respond to cache pressure

2017-05-04 Thread Webert de Souza Lima
I have faced the same problem many times. Usually it doesn't cause anything bad, but I had a 30 min system outage twice because of this. It might be because of the number of inodes on your ceph filesystem. Go to the MDS server and do (supposing your mds server id is intcfs-osd1): ceph daemon mds.
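
A sketch of the kind of check being described, using the MDS id from the thread (intcfs-osd1) and the Jewel-era mds_cache_size limit; the exact counter names are worth confirming against your release:

    # Inode and cap counters from the running MDS (run on the MDS host):
    ceph daemon mds.intcfs-osd1 perf dump mds | grep -E 'inode|caps'
    # Compare against the configured cache limit (default 100000 inodes in Jewel):
    ceph daemon mds.intcfs-osd1 config get mds_cache_size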

Re: [ceph-users] `ceph health` == HEALTH_GOOD_ENOUGH?

2017-02-20 Thread John Spray
On Mon, Feb 20, 2017 at 6:37 AM, Tim Serong wrote: > Hi All, > > Pretend I'm about to upgrade from one Ceph release to another. I want > to know that the cluster is healthy enough to sanely upgrade (MONs > quorate, no OSDs actually on fire), but don't care about HEALTH_WARN > issues like "too man
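
A rough sketch of such a "good enough" gate, written against the Luminous-style health JSON discussed in the "ceph health JSON format has changed" thread above; the whitelisted check codes are purely illustrative and jq is assumed to be installed:

    #!/bin/bash
    # Treat HEALTH_ERR as fatal; tolerate only whitelisted WARN checks.
    IGNORE='TOO_MANY_PGS|POOL_APP_NOT_ENABLED'   # illustrative codes only
    health=$(ceph health --format=json)
    status=$(echo "$health" | jq -r '.status')
    [ "$status" = "HEALTH_ERR" ] && { echo "HEALTH_ERR -- do not upgrade"; exit 1; }
    blocking=$(echo "$health" | jq -r '.checks | keys[]' | grep -Ev "^($IGNORE)$")
    [ -n "$blocking" ] && { echo "blocking checks: $blocking"; exit 1; }
    echo "healthy enough to upgrade"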

Re: [ceph-users] Ceph - Health and Monitoring

2017-01-04 Thread jiajia zhong
Actually, what you need is the ceph-common package (Ubuntu), which contains /usr/bin/ceph. You have to be sure which host the commands are going to be executed on. Make sure the keys and ceph.conf are correctly configured on that host. You could just run the commands to make sure the configuration is OK, e.g.
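
Something like the following, run on the host that will execute the monitoring commands (the client name 'nagios' and the keyring path are only illustrations):

    # Install the CLI only (Ubuntu) and confirm the host can reach the cluster.
    sudo apt-get install -y ceph-common
    # ceph.conf and a keyring for the chosen client must exist on this host:
    ls /etc/ceph/ceph.conf /etc/ceph/ceph.client.nagios.keyring
    # If this prints the health string, the configuration is fine:
    ceph --id nagios health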

Re: [ceph-users] Ceph - Health and Monitoring

2017-01-04 Thread Jeffrey Ollie
I can definitely recommend Prometheus but I prefer the exporter for Ceph that I wrote :) https://github.com/jcollie/ceph_exporter On Mon, Jan 2, 2017 at 7:55 PM, Craig Chi wrote: > Hello, > > I suggest Prometheus with ceph_exporter > and Grafana

Re: [ceph-users] Ceph - Health and Monitoring

2017-01-04 Thread Andre Forigato
e Forigato" , ceph-users@lists.ceph.com > Enviadas: Segunda-feira, 2 de janeiro de 2017 23:55:21 > Assunto: Re: [ceph-users] Ceph - Health and Monitoring > Hello, > I suggest Prometheus with ceph_exporter and Grafana (UI). It can also monitor > the node's health and any oth

Re: [ceph-users] Ceph - Health and Monitoring

2017-01-02 Thread Craig Chi
Hello, I suggest Prometheus with ceph_exporter (https://github.com/digitalocean/ceph_exporter) and Grafana (UI). It can also monitor the node's health and any other services you want. And it has a beautiful UI. Sincerely, Craig Chi On 2017-01-02 21:32, ulem...@polarzone.de wrote: > Hi Andre, I us
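
A hypothetical quick start for that exporter; the image name and the 9128 default port are assumptions taken from memory, so check the project README:

    # Run the exporter with access to the local cluster config and keyrings.
    docker run -d --name ceph_exporter \
        -v /etc/ceph:/etc/ceph:ro \
        -p 9128:9128 \
        digitalocean/ceph_exporter
    # Verify metrics are exposed before pointing Prometheus at this host:
    curl -s http://localhost:9128/metrics | head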

Re: [ceph-users] Ceph - Health and Monitoring

2017-01-02 Thread ulembke
Hi Andre, I use check_ceph_dash on top of ceph-dash for this (it is a Nagios/Icinga plugin). https://github.com/Crapworks/ceph-dash https://github.com/Crapworks/check_ceph_dash ceph-dash provides a simple, clear overview as a web dashboard. Udo On 2017-01-02 12:42, Andre Forigato wrote: Hello,

Re: [ceph-users] ceph health

2016-07-18 Thread Martin Palma
I assume you installed Ceph using 'ceph-deploy'. I noticed the same thing on CentOS when deploying a cluster for testing... As Wido already noted the OSDs are marked as down & out. From each OSD node you can do a "ceph-disk activate-all" to start the OSDs. On Mon, Jul 18, 2016 at 12:59 PM, Wido d
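
A sketch of that sequence, assuming a ceph-deploy/ceph-disk based install as noted above:

    # On every OSD node, start any prepared-but-inactive OSDs:
    sudo ceph-disk activate-all
    # Then, from a node with the admin keyring, confirm they come up and in:
    ceph osd tree
    ceph -s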

Re: [ceph-users] ceph health

2016-07-18 Thread Wido den Hollander
> On 18 July 2016 at 11:49, Ivan Koortzen wrote: > > > Hi All, > > I'm quite new to Ceph but did an initial setup on these virtual machines: > > 1x Ceph admin > 3x Ceph mons > 3x Ceph OSDs > > each OSD has 3x 100GB drives, and 3x 20GB journals > > After initial setup of Ceph and runnin

Re: [ceph-users] ceph health

2016-07-18 Thread Oliver Dzombic
Hi, please show the output of: ceph osd pool ls detail and also: ceph health detail, please. -- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...@ip-interactive.de Anschrift: IP Interactive UG (haftungsbeschraenkt) Zum Sonnenberg 1-3 63571 Gelnhausen HRB 934

Re: [ceph-users] CEPH health issues

2016-02-06 Thread Tyler Bishop
You need to get your OSD back online. From: "Jeffrey McDonald" To: ceph-users@lists.ceph.com Sent: Saturday, February 6, 2016 8:18:06 AM Subject: [ceph-users] CEPH health issues Hi, I'm seeing lots of issues with my CEPH installation. The health of the system is degraded and many of th

Re: [ceph-users] ceph health related message

2014-09-22 Thread Sean Sullivan
I had this happen to me as well. Turned out to be a connlimit thing for me. I would check dmesg/the kernel log and see if you see any "conntrack limit reached, connection dropped" messages, then increase connlimit. Odd, as I connected over ssh for this, but I can't deny syslog.
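
For anyone hitting the same symptom, a sketch of where to look; the sysctl value and the rule check are illustrative, not values from this thread:

    # Look for dropped-connection messages from the kernel:
    dmesg | grep -iE 'conntrack|connlimit'
    # If the conntrack table is overflowing, raise the limit (example value only):
    sudo sysctl -w net.netfilter.nf_conntrack_max=262144
    # If an iptables connlimit rule is the cause, locate it and raise its threshold:
    sudo iptables -S | grep -i connlimit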

Re: [ceph-users] ceph health related message

2014-09-19 Thread BG
I think you may be hitting a firewall issue on port 6789; I had a similar issue recently. The quick start preflight guide was updated very recently with information on opening the required ports for firewalld or iptables, see the link below: http://ceph.com/docs/master/start/quick-start-preflight/
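
Per that preflight guide, monitors listen on 6789/tcp and the OSD/MDS daemons on 6800-7300/tcp; a sketch for both firewalld and plain iptables:

    # firewalld (e.g. CentOS 7):
    sudo firewall-cmd --zone=public --add-port=6789/tcp --permanent
    sudo firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
    sudo firewall-cmd --reload
    # plain iptables:
    sudo iptables -A INPUT -p tcp --dport 6789 -j ACCEPT
    sudo iptables -A INPUT -p tcp -m multiport --dports 6800:7300 -j ACCEPT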

Re: [ceph-users] Ceph health checkup

2013-10-31 Thread Mike Dawson
Narendra, This is an issue. You really want your cluster to be HEALTH_OK with all PGs active+clean. Some exceptions apply (like scrub / deep-scrub). What do 'ceph health detail' and 'ceph osd tree' show? Thanks, Mike Dawson Co-Founder & Director of Cloud Architecture Cloudapt LLC 6330 East 7

Re: [ceph-users] 'ceph health' Nagios plugin

2013-08-20 Thread Sage Weil
On Tue, 20 Aug 2013, Joao Eduardo Luis wrote: > On 08/20/2013 10:24 AM, Valery Tschopp wrote: > > Hi, > > > > For the ones using Nagios to monitor their ceph cluster, I've written a > > 'ceph health' Nagios plugin: > > > > https://github.com/valerytschopp/ceph-nagios-plugins > > > > The plug

Re: [ceph-users] 'ceph health' Nagios plugin

2013-08-20 Thread Joao Eduardo Luis
On 08/20/2013 10:24 AM, Valery Tschopp wrote: Hi, For the ones using Nagios to monitor their ceph cluster, I've written a 'ceph health' Nagios plugin: https://github.com/valerytschopp/ceph-nagios-plugins The plugin is written in Python, and allows you to specify a client user id and keyring to
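
A hypothetical invocation based on the options described above (client user id and keyring); the exact flag names should be checked against the plugin's --help output:

    # Flag names are assumptions -- verify with ./check_ceph_health --help
    ./check_ceph_health --id nagios \
        --keyring /etc/ceph/ceph.client.nagios.keyring
    # Standard Nagios exit codes are expected: 0 = OK, 1 = WARNING, 2 = CRITICAL.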

Re: [ceph-users] 'ceph health' Nagios plugin

2013-08-20 Thread Loic Dachary
Hi Valery, Thank you for taking the time to write this plugin :-) Did you consider publishing it in http://exchange.nagios.org/directory/Plugins ? Cheers On 20/08/2013 11:24, Valery Tschopp wrote: > Hi, > > For the ones using Nagios to monitor their ceph cluster, I've written a 'ceph > health