[ceph-users] A question about HEALTH_WARN and monitors holding onto cluster maps

2018-05-17 Thread Thomas Byrne - UKRI STFC
Hi all, As far as I understand, the monitor stores will grow while the cluster is not HEALTH_OK, as they hold onto all cluster maps. Is this true for all HEALTH_WARN reasons? Our cluster recently went into HEALTH_WARN due to a few weeks of backfilling onto new hardware pushing the monitors' data stores over the
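For reference, a rough way to watch this on a live cluster (the mon data path below is the default, and the field names are from the 'ceph report' output I've seen on Luminous, so treat it as a sketch):

    # size of the mon store on a monitor host (default mon data path)
    du -sh /var/lib/ceph/mon/*/store.db

    # oldest and newest osdmap epochs the mons are still holding; a large gap
    # means old maps are not being trimmed while the cluster is unhealthy
    ceph report 2>/dev/null | grep -E '"osdmap_(first|last)_committed"'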

Re: [ceph-users] A question about HEALTH_WARN and monitors holding onto cluster maps

2018-05-17 Thread Thomas Byrne - UKRI STFC
On 05/17/2018 04:37 PM, Thomas Byrne - UKRI STFC wrote: > Hi all, as far as I understand, the monitor stores will grow while not HEALTH_OK as they hold onto all cluster maps.

Re: [ceph-users] A question about HEALTH_WARN and monitors holding onto cluster maps

2018-05-21 Thread Thomas Byrne - UKRI STFC
On May 17, 2018 at 12:56 PM Thomas Byrne - UKRI STFC wrote: That seems like a sane way to do it, thanks for the clarification Wido. As a follow-up, do you have any feeling as to whether the trimming is a particularly intensive task? We just had a fun afternoon where t
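If the trimming and resulting store compaction does turn out to be the heavy part, it can also be triggered deliberately during a quiet period rather than waiting for it to happen on its own. A minimal sketch, assuming a monitor with the placeholder ID 'mon0':

    # ask one monitor to compact its store while it is running ('mon0' is a placeholder)
    ceph tell mon.mon0 compact

It is worth doing one monitor at a time and watching store size and mon responsiveness before moving on.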

Re: [ceph-users] Balancing cluster with large disks - 10TB HDD

2019-01-02 Thread Thomas Byrne - UKRI STFC
Assuming I understand it correctly: "pg_upmap_items 6.0 [40,20]" refers to replacing (upmapping?) osd.40 with osd.20 in the acting set of the placement group '6.0'. Assuming it's a 3-replica PG, the other two OSDs in the set remain unchanged from the CRUSH calculation. "pg_upmap_items 6.6 [45,
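For anyone reading along, the commands behind those entries look roughly like this (using the same PG and OSD IDs as the example above):

    # map PG 6.0 so osd.20 is used in place of the CRUSH-chosen osd.40
    ceph osd pg-upmap-items 6.0 40 20

    # list the upmap exceptions currently stored in the osdmap
    ceph osd dump | grep pg_upmap_items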

Re: [ceph-users] ceph health JSON format has changed sync?

2019-01-02 Thread Thomas Byrne - UKRI STFC
I recently spent some time looking at this; I believe the 'summary' and 'overall_status' sections are now deprecated. The 'status' and 'checks' fields are the ones to use now. The 'status' field gives you the OK/WARN/ERR, but returning the most severe error condition from the 'checks' section i
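A quick illustration of pulling those fields out with jq, assuming a Luminous-or-later cluster (the JSON paths are from the layout I've seen, so treat this as a sketch):

    # overall status: HEALTH_OK / HEALTH_WARN / HEALTH_ERR
    ceph health -f json | jq -r '.status'

    # names and severities of the individual health checks
    ceph health detail -f json | jq -r '.checks | to_entries[] | "\(.key) \(.value.severity)"'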

Re: [ceph-users] ceph health JSON format has changed

2019-01-02 Thread Thomas Byrne - UKRI STFC
> In previous versions of Ceph, I was able to determine which PGs had scrub errors, and then a cron.hourly script ran "ceph pg repair" for them, provided that they were not already being scrubbed. In Luminous, the bad PG is not visible in "ceph --status" anywhere. Should I use something
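In Luminous the inconsistent PGs are still discoverable, just not from the status output. Something along these lines could replace the old cron job (a sketch only; 'mypool' is a placeholder, the list command returns a JSON array of PG IDs on the versions I've used, and this doesn't check whether a scrub is already running):

    # list PGs with scrub inconsistencies in a pool, then ask Ceph to repair each one
    for pg in $(rados list-inconsistent-pg mypool | jq -r '.[]'); do
        ceph pg repair "$pg"
    done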

Re: [ceph-users] Is it possible to increase Ceph Mon store?

2019-01-08 Thread Thomas Byrne - UKRI STFC
For what it's worth, I think the behaviour Pardhiv and Bryan are describing is not quite normal, and sounds similar to something we see on our large Luminous cluster with elderly (created as Jewel?) monitors. After large operations which result in the mon stores growing to 20GB+, leaving the clu
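Two things I'd check in that situation (a sketch; 'mon0' is a placeholder ID, and the kv_backend file may not exist on stores created before Luminous, which is itself a hint the store is still leveldb):

    # threshold at which a large mon store starts producing a health warning
    ceph daemon mon.mon0 config get mon_data_size_warn

    # whether the store is leveldb or rocksdb; jewel-era stores are typically leveldb
    cat /var/lib/ceph/mon/ceph-mon0/kv_backend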

[ceph-users] OSDs taking a long time to boot due to 'clear_temp_objects', even with fresh PGs

2019-06-24 Thread Thomas Byrne - UKRI STFC
Hi all, Some BlueStore OSDs in our Luminous test cluster have started becoming unresponsive and booting very slowly. These OSDs have been used for stress testing hardware destined for our production cluster, so have had a number of pools on them with many, many objects in the past. All
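When an OSD sits like this for a long time, a couple of quick checks can confirm whether it is still grinding through startup rather than hung (a sketch; osd.12 is a placeholder, and these assume the daemon is at least responsive on its admin socket):

    # ask the daemon what state it thinks it is in (e.g. still booting vs active)
    ceph daemon osd.12 status

    # temporarily raise OSD debug logging to see where boot time is being spent
    ceph daemon osd.12 config set debug_osd 10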

Re: [ceph-users] OSDs taking a long time to boot due to 'clear_temp_objects', even with fresh PGs

2019-06-25 Thread Thomas Byrne - UKRI STFC
From: Gregory Farnum Sent: 24 June 2019 17:30 To: Byrne, Thomas (STFC,RAL,SC) Cc: ceph-users Subject: Re: [ceph-users] OSDs taking a long time to boot due to 'clear_temp_objects', even with fresh PGs. On Mon, Jun 24, 2019 at 9:06 AM Thomas Byrne - UKRI STFC wrote: > Hi all,

Re: [ceph-users] How to add 100 new OSDs...

2019-07-25 Thread Thomas Byrne - UKRI STFC
As a counterpoint, adding large amounts of new hardware gradually (or more specifically, in a few steps) has a few benefits IMO. - Being able to pause the operation and confirm the new hardware (and cluster) is operating as expected. You can identify problems with hardware with OSDs at 10% we
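In practice the 'few steps' approach might look something like this (a sketch only; the OSD ID, the intermediate weights and the final weight are placeholders for whatever matches your drive size):

    # bring a new OSD in at a fraction of its final CRUSH weight...
    ceph osd crush reweight osd.100 0.8

    # ...then step it up once each round of backfill settles and the hardware checks out
    ceph osd crush reweight osd.100 3.6
    ceph osd crush reweight osd.100 7.2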

Re: [ceph-users] Scrub start-time and end-time

2019-08-14 Thread Thomas Byrne - UKRI STFC
Hi Torben, > Is it allowed to have the scrub period cross midnight? E.g. have start time at 22:00 and end time 07:00 the next morning. Yes, I think that's the way it is mostly used, primarily to reduce the scrub impact during waking/working hours. > I assume that if you only configure the on
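For completeness, these are the options involved, set here at runtime via injectargs (a sketch; persist them in ceph.conf for anything permanent):

    # only start scrubs between 22:00 and 07:00; the window is allowed to wrap past midnight
    ceph tell 'osd.*' injectargs '--osd_scrub_begin_hour 22 --osd_scrub_end_hour 7'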

[ceph-users] Help understanding EC object reads

2019-08-29 Thread Thomas Byrne - UKRI STFC
Hi all, I'm investigating an issue with the (non-Ceph) caching layers of our large EC cluster. It seems to be turning users' requests for whole objects into lots of small byte-range requests reaching the OSDs, but I'm not sure how inefficient this behaviour is in reality. My limited understandi
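How costly those small ranged reads are depends largely on how object data is striped across the EC shards, which can be read from the pool's erasure-code profile (a sketch; 'ecpool' and 'myprofile' are placeholders):

    # which erasure-code profile the pool uses
    ceph osd pool get ecpool erasure_code_profile

    # k, m and (if set) stripe_unit determine how a byte range maps onto shard reads
    ceph osd erasure-code-profile get myprofile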

Re: [ceph-users] Help understanding EC object reads

2019-09-16 Thread Thomas Byrne - UKRI STFC
From: Gregory Farnum Sent: 09 September 2019 23:25 To: Byrne, Thomas (STFC,RAL,SC) Cc: ceph-users Subject: Re: [ceph-users] Help understanding EC object reads. On Thu, Aug 29, 2019 at 4:57 AM Thomas Byrne - UKRI STFC wrote: > Hi all, I’m investiga