Hi all,
As far as I understand, the monitor stores will grow while not HEALTH_OK as
they hold onto all cluster maps. Is this true for all HEALTH_WARN reasons? Our
cluster recently went into HEALTH_WARN due to a few weeks of backfilling onto
new hardware pushing the monitors' data stores over the
> holding onto cluster maps
>
>
>
> On 05/17/2018 04:37 PM, Thomas Byrne - UKRI STFC wrote:
> > Hi all,
> >
> >
> >
> > As far as I understand, the monitor stores will grow while not
> > HEALTH_OK as they hold onto all cluster maps.
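For anyone keeping an eye on their own mon store sizes, the warning is driven by the mon_data_size_warn threshold (15GB by default, I believe), and the store size is easy to check directly. A rough sketch, assuming the default data path and that the mon name matches the short hostname:

  # size of this monitor's store (adjust the path for your cluster/mon name)
  du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db

  # the threshold the size warning is checked against
  ceph daemon mon.$(hostname -s) config get mon_data_size_warn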
On May 17, 2018 at 12:56 PM Thomas Byrne - UKRI STFC
<tom.by...@stfc.ac.uk> wrote:
That seems like a sane way to do it, thanks for the clarification Wido.
As a follow-up, do you have any feeling as to whether the trimming is a
particularly intensive task? We just had a fun afternoon where t
Assuming I understand it correctly:
"pg_upmap_items 6.0 [40,20]" refers to replacing (upmapping?) osd.40 with
osd.20 in the acting set of the placement group '6.0'. Assuming it's a 3
replica PG, the other two OSDs in the set remain unchanged from the CRUSH
calculation.
"pg_upmap_items 6.6 [45,
I recently spent some time looking at this; I believe the 'summary' and
'overall_status' sections are now deprecated. The 'status' and 'checks' fields
are the ones to use now.
The 'status' field gives you the OK/WARN/ERR, but returning the most severe
error condition from the 'checks' section i
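As a quick example of pulling those fields out of the JSON (assuming jq is available):

  # overall level: HEALTH_OK / HEALTH_WARN / HEALTH_ERR
  ceph health --format json | jq -r '.status'

  # names of the currently active checks
  ceph health --format json | jq -r '.checks | keys[]'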
> In previous versions of Ceph, I was able to determine which PGs had
> scrub errors, and then a cron.hourly script ran "ceph pg repair" for them,
> provided that they were not already being scrubbed. In Luminous, the bad
> PG is not visible in "ceph --status" anywhere. Should I use something
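For what it's worth, in Luminous the inconsistent PGs show up via the health checks rather than in the plain status output, so something along these lines should find them (pool and PG names are placeholders):

  # inconsistent PGs are listed under the PG_DAMAGED health check
  ceph health detail | grep inconsistent

  # or per pool
  rados list-inconsistent-pg <poolname>

  # then repair a specific PG
  ceph pg repair <pgid>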
For what it's worth, I think the behaviour Pardhiv and Bryan are describing is
not quite normal, and sounds similar to something we see on our large luminous
cluster with elderly (created as jewel?) monitors. After large operations which
result in the mon stores growing to 20GB+, leaving the clu
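If the stores don't trim back down on their own once the cluster is healthy again, manual compaction is worth a try (the mon ID is a placeholder):

  # compact a single monitor's store on demand
  ceph tell mon.<id> compact

  # or compact on every monitor (re)start, in ceph.conf:
  [mon]
  mon compact on start = true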
Hi all,
Some bluestore OSDs in our Luminous test cluster have started becoming
unresponsive and booting very slowly.
These OSDs have been used for stress testing hardware destined for our
production cluster, so they have had a number of pools on them with many, many
objects in the past. All
From: Gregory Farnum
Sent: 24 June 2019 17:30
To: Byrne, Thomas (STFC,RAL,SC)
Cc: ceph-users
Subject: Re: [ceph-users] OSDs taking a long time to boot due to
'clear_temp_objects', even with fresh PGs
On Mon, Jun 24, 2019 at 9:06 AM Thomas Byrne - UKRI STFC
wrote:
>
> Hi all,
>
>
As a counterpoint, adding large amounts of new hardware gradually (or more
specifically, in a few steps) has a few benefits IMO (rough sketch of the staging below).
- Being able to pause the operation and confirm the new hardware (and cluster)
is operating as expected. You can identify hardware problems while the OSDs are at
10% we
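The kind of staging I mean, roughly (OSD ID and weights are just examples):

  # have new OSDs come up with zero CRUSH weight (ceph.conf on the new hosts)
  [osd]
  osd crush initial weight = 0

  # then bring them in a step at a time
  ceph osd crush reweight osd.123 0.5

  # pausing/resuming the resulting data movement between steps if needed
  ceph osd set norebalance
  ceph osd unset norebalance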
Hi Torben,
> Is it allowed to have the scrub period cross midnight ? eg have start time at
> 22:00 and end time 07:00 next morning.
Yes, I think that's the way it is mostly used, primarily to reduce the
scrub impact during waking/working hours.
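For reference, a window that crosses midnight is just a begin hour later than the end hour; something like this (option names as of Luminous, in ceph.conf):

  [osd]
  osd scrub begin hour = 22
  osd scrub end hour = 7

  # or injected at runtime:
  ceph tell osd.* injectargs '--osd_scrub_begin_hour 22 --osd_scrub_end_hour 7'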
> I assume that if you only configure the on
Hi all,
I'm investigating an issue with the (non-Ceph) caching layers in front of our large EC
cluster. They seem to be turning users' requests for whole objects into lots of
small byte-range requests reaching the OSDs, but I'm not sure how inefficient
this behaviour is in reality.
My limited understandi
> From: Gregory Farnum
> Sent: 09 September 2019 23:25
> To: Byrne, Thomas (STFC,RAL,SC)
> Cc: ceph-users
> Subject: Re: [ceph-users] Help understanding EC object reads
>
> On Thu, Aug 29, 2019 at 4:57 AM Thomas Byrne - UKRI STFC
> wrote:
> >
> > Hi all,
> >
> > I’m investiga