Re: [ceph-users] cephfs metadata damage and scrub error

2017-05-30 Thread James Eckersall
1ms later. Any further help is greatly appreciated. On 17 May 2017 at 10:58, James Eckersall wrote: > An update to this. The cluster has been upgraded to Kraken, but I've > still got the same PG reporting inconsistent and the same error message > about mds metadata damaged. > C
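
A rough sketch of re-checking the reported damage on a Jewel/Kraken cluster (MDS rank 0 is an assumption; pg 2.9 is the one named in this thread):

    # list any metadata damage the active MDS has recorded
    ceph tell mds.0 damage ls
    # re-run a deep scrub on the inconsistent PG and compare replicas again
    ceph pg deep-scrub 2.9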

Re: [ceph-users] cephfs metadata damage and scrub error

2017-05-17 Thread James Eckersall
o use it? I haven't been able to find any docs that explain. Thanks J On 3 May 2017 at 14:35, James Eckersall wrote: > Hi David, > > Thanks for the reply, it's appreciated. > We're going to upgrade the cluster to Kraken and see if that fixes the > metadata issu

Re: [ceph-users] mds slow requests

2017-05-12 Thread James Eckersall
Hi, no I have not seen any log entries related to scrubs. I see slow requests for various operations including readdir, unlink. Sometimes rdlock, sometimes wrlock. On 12 May 2017 at 16:10, Peter Maloney wrote: > On 05/12/17 16:54, James Eckersall wrote: > > Hi, > > > > We

[ceph-users] mds slow requests

2017-05-12 Thread James Eckersall
Hi, We have an 11 node ceph cluster: 8 OSD nodes with 5 disks each and 3 MDS servers. Since upgrading from Jewel to Kraken last week, we are seeing the active MDS constantly reporting a number of slow requests > 30 seconds. The load on the Ceph servers is not excessive. None of the OSD disks appea
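
One way to see what the slow requests are actually blocked on is the admin socket on the active MDS (the daemon name mds-01 below is a placeholder):

    # operations currently in flight, with their state and age
    ceph daemon mds.mds-01 dump_ops_in_flight
    # recently completed slow operations, if the history is still available
    ceph daemon mds.mds-01 dump_historic_ops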

Re: [ceph-users] cephfs metadata damage and scrub error

2017-05-03 Thread James Eckersall
etion typically > in a large directory which corresponds to an individual unlink in cephfs. > > If you can build a branch in github to get the newer ceph-osdomap-tool you > could try to use it to repair the omaps. > > David > > > On 5/2/17 5:05 AM, James Eckersall wrote:

[ceph-users] cephfs metadata damage and scrub error

2017-05-02 Thread James Eckersall
Hi, I'm having some issues with a ceph cluster. It's an 8 node cluster running Jewel ceph-10.2.7-0.el7.x86_64 on CentOS 7. This cluster provides RBDs and a CephFS filesystem to a number of clients. ceph health detail is showing the following errors: pg 2.9 is active+clean+inconsistent, acting [3
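
On Jewel the inconsistent PG can usually be narrowed down and repaired along these lines (pg 2.9 is the one from the health output above):

    # show which object copies disagree and why (available since Jewel)
    rados list-inconsistent-obj 2.9 --format=json-pretty
    # once the bad replica is identified, ask the primary OSD to repair the PG
    ceph pg repair 2.9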

Re: [ceph-users] data balancing/crush map issue

2015-11-13 Thread James Eckersall
I've just discovered the hashpspool setting and found that it is set to false on all of my pools. I can't really work out what this setting does though. Can anyone please explain what this setting does and whether it would improve my situation? Thanks J On 11 November 2015 at 14
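
For reference, the flag is visible in the OSD map and can be toggled per pool; a sketch, with "rbd" standing in for a real pool name:

    # pools with the flag set show "hashpspool" in their flags field
    ceph osd dump | grep '^pool'
    # enabling it changes PG placement for that pool, so expect data movement
    ceph osd pool set rbd hashpspool true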

[ceph-users] data balancing/crush map issue

2015-11-11 Thread James Eckersall
Hi, I have a Ceph cluster running on 0.80.10 and I'm having problems with the data balancing on two new nodes that were recently added. The cluster nodes look as follows: 6x OSD servers with 32 4TB SAS drives. The drives are configured with RAID0 in pairs, so 16 8TB OSD's per node. New ser

Re: [ceph-users] No auto-mount of OSDs after server reboot

2015-01-30 Thread James Eckersall
I'm running Ubuntu 14.04 servers with Firefly and I don't have a sysvinit file, but I do have an upstart file. "touch /var/lib/ceph/osd/ceph-XX/upstart" should be all you need to do. That way, the OSD's should be mounted automatically on boot. On 30 January 2015 at 10:25, Alexis KOALLA wrote: >
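
A minimal sketch of marking every OSD on a node at once (default data directory layout assumed):

    # the upstart job only mounts and starts OSDs whose data dir carries this marker
    for d in /var/lib/ceph/osd/ceph-*; do
        touch "$d/upstart"
    done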

Re: [ceph-users] monitor quorum

2014-09-19 Thread James Eckersall
}, { "rank": 2, "name": "ceph-mon-03", "addr": "10.1.1.66:6789\/0"}]}} I'm really struggling to know what to do now, since even removing this monitor and re-creating it didn't seem to fix the proble
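
For completeness, the remove/re-create sequence referred to here usually looks roughly like this on a 14.04/firefly setup (paths and the upstart invocation are assumptions):

    # drop the broken monitor from the map
    ceph mon remove ceph-mon-03
    # rebuild its store from the current monmap and mon keyring
    ceph mon getmap -o /tmp/monmap
    ceph auth get mon. -o /tmp/mon.keyring
    ceph-mon -i ceph-mon-03 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    # start it again and let it sync
    start ceph-mon id=ceph-mon-03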

Re: [ceph-users] monitor quorum

2014-09-18 Thread James Eckersall
on.ceph-mon-03@2(electing).elector(947) election timer expired J On 17 September 2014 17:05, James Eckersall wrote: > Hi, > > Now I feel dumb for jumping to the conclusion that it was a simple > networking issue - it isn't. > I've just checked connectivity properly and I can

Re: [ceph-users] monitor quorum

2014-09-17 Thread James Eckersall
her there is something else that can be done to fix this. With hindsight, I would have stopped the mon service before relocating the nic cable, but I expected the mon to survive a short network outage which it doesn't seem to have done :( On 17 September 2014 16:21, James Eckersall wrote

Re: [ceph-users] monitor quorum

2014-09-17 Thread James Eckersall
Hi, Thanks for the advice. I feel pretty dumb as it does indeed look like a simple networking issue. You know how you check things 5 times and miss the most obvious one... J On 17 September 2014 16:04, Florian Haas wrote: > On Wed, Sep 17, 2014 at 1:58 PM, James Eckersall > wrote:

[ceph-users] monitor quorum

2014-09-17 Thread James Eckersall
Hi, I have a ceph cluster running 0.80.1 on Ubuntu 14.04. I have 3 monitors and 4 OSD nodes currently. Everything has been running great up until today where I've got an issue with the monitors. I moved mon03 to a different switchport so it would have temporarily lost connectivity. Since then, t
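
The usual first checks for a monitor that has dropped out of quorum (monitor name taken from later in the thread; default admin socket path assumed):

    # quorum as the cluster currently sees it, if it still answers
    ceph quorum_status --format json-pretty
    # the affected monitor's own view, straight from its admin socket
    ceph daemon mon.ceph-mon-03 mon_status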

Re: [ceph-users] ceph cluster expansion

2014-08-13 Thread James Eckersall
13 August 2014 14:06, Christian Balzer wrote: > On Wed, 13 Aug 2014 12:47:22 +0100 James Eckersall wrote: > > > Hi Christian, > > > > We're actually using the following chassis: > > http://rnt.de/en/bf_xxlarge.html > > > Ah yes, one of the Backblaze heritage.

Re: [ceph-users] ceph cluster expansion

2014-08-13 Thread James Eckersall
00 servers backing up mostly web content (millions of small files). J On 13 August 2014 10:28, Christian Balzer wrote: > > Hello, > > On Wed, 13 Aug 2014 09:15:34 +0100 James Eckersall wrote: > > > Hi, > > > > I'm looking for some advice on my ceph cluster.

[ceph-users] ceph cluster expansion

2014-08-13 Thread James Eckersall
Hi, I'm looking for some advice on my ceph cluster. The current setup is as follows: 3 mon servers 4 storage servers with the following spec: 1x Intel Xeon E5-2640 @2.50GHz 6 core (12 with hyperthreading). 64GB DDR3 RAM 2x SSDSC2BB080G4 for OS LSI MegaRAID 9260-16i with the following drives:
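
When new OSDs go in, one common way to limit the rebalancing impact is to bring them up at a fraction of their final CRUSH weight and ramp up in steps (osd.80 and the weights below are placeholders):

    # start the new OSD at a low weight...
    ceph osd crush reweight osd.80 0.5
    # ...then raise it in steps as recovery settles, until it reaches full weight
    ceph osd crush reweight osd.80 1.0
    # keep an eye on recovery while ramping
    ceph -s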

Re: [ceph-users] GPF kernel panics

2014-08-04 Thread James Eckersall
=hosting_windows_sharedweb, allow rwx pool=infra_systems, allow rwx pool=hosting_linux_sharedweb, allow rwx pool=test Thanks J On 1 August 2014 01:17, Brad Hubbard wrote: > On 07/31/2014 06:37 PM, James Eckersall wrote: > >> Hi, >> >> The stacktraces are very similar. Here is a

Re: [ceph-users] GPF kernel panics

2014-07-31 Thread James Eckersall
yet - 20 hours ish and counting). Now to figure out the best way to get a 3.14 kernel in Ubuntu Trusty :) On 31 July 2014 10:23, Christian Balzer wrote: > On Thu, 31 Jul 2014 10:13:11 +0100 James Eckersall wrote: > > > Hi, > > > > I thought the limit was in relation to c

Re: [ceph-users] GPF kernel panics

2014-07-31 Thread James Eckersall
roaching the maximum amount of kernel mappings, > which is somewhat shy of 250 in any kernel below 3.14? > > If you can easily upgrade to 3.14 see if that fixes it. > > Christian > > On Thu, 31 Jul 2014 09:37:05 +0100 James Eckersall wrote: > > > Hi, > > > >

Re: [ceph-users] GPF kernel panics

2014-07-31 Thread James Eckersall
cking. Thanks J On 31 July 2014 09:12, Ilya Dryomov wrote: > On Thu, Jul 31, 2014 at 11:44 AM, James Eckersall > wrote: > > Hi, > > > > I've had a fun time with ceph this week. > > We have a cluster with 4 OSD (20 OSD's per) servers, 3 mons and a server >

[ceph-users] GPF kernel panics

2014-07-31 Thread James Eckersall
Hi, I've had a fun time with ceph this week. We have a cluster with 4 OSD (20 OSD's per) servers, 3 mons and a server mapping ~200 rbd's and presenting cifs shares. We're using cephx and the export node has its own cephx auth key. I made a change to the key last week, adding rwx access to anothe
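
Two quick things worth checking on the export node, given the pre-3.14 mapping limit discussed in the replies:

    # how many rbd devices are currently mapped on this node
    rbd showmapped | tail -n +2 | wc -l
    # the running kernel, since the roughly-250-mapping ceiling was lifted in 3.14
    uname -r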

[ceph-users] ceph metrics

2014-07-28 Thread James Eckersall
Hi, I'm trying to understand what a lot of the values mean that are reported by "perf dump" on the ceph admin socket. I have a collectd plugin which sends all of these values to graphite. Does anyone have a cross-reference list that explains what they are in more detail? You can glean so much f
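
The counters come straight off the admin socket; "perf schema" at least reports the type of each counter, which helps when graphing (osd.0 is a placeholder):

    # raw counter values as exposed by the daemon
    ceph daemon osd.0 perf dump
    # per-counter type information (gauge vs. cumulative, time vs. count)
    ceph daemon osd.0 perf schema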

Re: [ceph-users] health_err on osd full

2014-07-18 Thread James Eckersall
Thanks Greg. I appreciate the advice, and very quick replies too :) On 18 July 2014 23:35, Gregory Farnum wrote: > On Fri, Jul 18, 2014 at 3:29 PM, James Eckersall > wrote: > > Thanks Greg. > > > > Can I suggest that the documentation makes this much clearer? It m

Re: [ceph-users] health_err on osd full

2014-07-18 Thread James Eckersall
ily rectified. J -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gregory Farnum Sent: 18 July 2014 23:25 To: James Eckersall Cc: ceph-users Subject: Re: [ceph-users] health_err on osd full Yes, that's expected behavior. Since the cluster ca

[ceph-users] health_err on osd full

2014-07-18 Thread James Eckersall
Hi, I have a ceph cluster running on 0.80.1 with 80 OSD's. I've had fairly uneven distribution of the data and have been keeping it ticking along with "ceph osd reweight XX 0.x" commands on a few OSD's while I try and increase the pg count of the pools to hopefully better balance the data. Tonig
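
Alongside per-OSD reweighting, there is a bulk variant that targets only the OSDs furthest above the average utilisation; a sketch (the 110% threshold is just an example):

    # which OSDs are near-full or full right now
    ceph health detail
    # reweight OSDs that sit more than 10% above the cluster average
    ceph osd reweight-by-utilization 110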

Re: [ceph-users] logrotate

2014-07-11 Thread James Eckersall
J On 11 July 2014 15:04, Sage Weil wrote: > On Fri, 11 Jul 2014, James Eckersall wrote: > > Upon further investigation, it looks like this part of the ceph logrotate > > script is causing me the problem: > > > > if [ -e "/var/lib/ceph/$daemon/$f/done"

Re: [ceph-users] logrotate

2014-07-11 Thread James Eckersall
hen I don't have a "done" file in the mounted directory for any of my osd's. My mon's all have the done file and logrotate is working fine for those. So my question is, what is the purpose of the "done" file and should I just create one for each of my osd's ?
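
If creating the markers by hand turns out to be the answer, a one-liner per OSD node along these lines would do it (default data dir layout assumed):

    # give every local OSD the "done" marker that the logrotate script checks for
    for d in /var/lib/ceph/osd/ceph-*; do
        [ -e "$d/done" ] || touch "$d/done"
    done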

[ceph-users] logrotate

2014-07-10 Thread James Eckersall
Hi, I've just upgraded a ceph cluster from Ubuntu 12.04 with 0.73.1 to Ubuntu 14.04 with 0.80.1. I've noticed that the log rotation doesn't appear to work correctly. The OSD's are just not logging to the current ceph-osd-X.log file. If I restart the OSD's, they start logging, but then overnight,

[ceph-users] logrotate

2014-07-10 Thread James Eckersall
Hi, I've just upgraded a ceph cluster from Ubuntu 12.04 with 0.72.1 to Ubuntu 14.04 with 0.80.1. I've noticed that the log rotation doesn't appear to work correctly. The OSD's are just not logging to the current ceph-osd-X.log file. If I restart the OSD's or run "service ceph-osd reload id=X", th

Re: [ceph-users] rbd watchers

2014-05-22 Thread James Eckersall
deleted yet. You can see > the snapshots with "rbd snap list <image>". > > On Tue, May 20, 2014 at 4:26 AM, James Eckersall > wrote: > > Hi, > > > > > > > > I'm having some trouble with an rbd image. I want to rename the current > rbd > > a

[ceph-users] rbd watchers

2014-05-20 Thread James Eckersall
Hi, I'm having some trouble with an rbd image. I want to rename the current rbd and create a new rbd with the same name. I renamed the rbd with rbd mv, but it was still mapped on another node, so rbd mv gave me an error that it was unable to remove the source. I then unmapped the original rb
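
The leftover mapping can usually be tracked down via the watch on the image's header object; a sketch assuming pool "rbd", image "myimage", an image-format-1 header object and the device /dev/rbd0:

    # list clients still watching the header (format 1 keeps it in <name>.rbd)
    rados -p rbd listwatchers myimage.rbd
    # unmap the device on whichever node shows up, then the rename can complete
    rbd unmap /dev/rbd0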