Hi Chris,
Assuming that the scrape period for prom is set to 1 minute, you could
simply be racing against the scrape. Usually it's not a good idea to
create range vectors with the same time range as the scrape period.
Given that you're using irate(), you could increase that to [2m] or
higher and s
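For example, widening the window gives irate() at least two samples to
work with even if a scrape is late (the metric name below is just a
placeholder, not necessarily one from your setup):

    irate(ceph_osd_op_w[2m])

irate() only uses the last two samples in the range anyway, so the wider
window mostly just protects against scrape jitter.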
Hello,
Any chance that these OSDs were deployed with different
bluestore_min_alloc_size settings?
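As a rough starting point for comparison, you can at least see what each
OSD currently reports for that option (keeping in mind that
min_alloc_size is baked in at OSD creation time, so an OSD deployed
under an older default keeps that value regardless of today's config):

    ceph config show osd.0 bluestore_min_alloc_size
    ceph config show osd.1 bluestore_min_alloc_size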
Josh
On Mon, Jul 7, 2025 at 2:39 PM mhnx wrote:
>
> Hello Stefan!
>
> All of my nodes and clients = Octopus 15.2.14
>
> I have 1x RBD pool and 2000x rbd volumes with 100Gb / each
>
>
> This is upma
Hey Brian,
The setting you're looking for is bluefs_buffered_io. This is very
much a YMMV setting, so it's best to test with both modes, but I
usually recommend turning it off for all but omap-intensive workloads
(e.g. RGW index), because it tends to cause writes to be split into
smaller pieces.
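If you want to experiment, flipping it cluster-wide is just a config
change (a sketch; depending on release it may only take effect after an
OSD restart):

    ceph config set osd bluefs_buffered_io false

and set it back to true to compare. Watch latency and throughput for
your actual workload in both modes before settling on one.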
Hey Andre,
Clients actually have access to more information than just the
crushmap, including temporary PG mappings generated when a backfill
is pending, as well as upmap items that override CRUSH's placement
decisions. You can see these in "ceph osd dump", for example.
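A rough way to pull those entries out of the plain-text dump:

    ceph osd dump | grep -E 'pg_temp|pg_upmap_items'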
Josh
On Mon, Jan 27,
Hey Reid,
This sounds similar to what we saw in
https://tracker.ceph.com/issues/62256, in case that helps with your
investigation.
Josh
On Mon, Jan 27, 2025 at 8:07 AM Reid Guyett wrote:
>
> Hello,
>
> We are experiencing slowdowns on one of our radosgw clusters. We restart
> the radosgw daemon
Hey Istvan,
> Quick update on this topic: it seems the solution for us is to offline
> compact all osds.
> After that all snaptrimming can finish in an hour rather than a day.
Ah, this might be tombstone accumulation, then. You'd probably benefit
from going to at least latest Pacific, enabling
r
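For anyone following along, the offline compaction mentioned above is
typically done per OSD roughly like this (IDs and paths are placeholders
and assume a non-containerized systemd deployment):

    systemctl stop ceph-osd@12
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
    systemctl start ceph-osd@12

There's also an online variant, 'ceph tell osd.12 compact', which runs
while the OSD is serving I/O and thus competes with client traffic.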
Note that 'norebalance' disables the balancer but doesn't prevent
backfill; you'll want to set 'nobackfill' as well.
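i.e. something like:

    ceph osd set norebalance
    ceph osd set nobackfill

and the matching 'ceph osd unset ...' once you're done.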
Josh
On Sun, Jan 12, 2025 at 1:49 PM Anthony D'Atri wrote:
>
> [ ed: snag during moderation (somehow a newline was interpolated in the
> Subject), so I’m sending this on behalf o
> > FWIW, having encountered these long-startup issues many times in the
> > past on both HDD and QLC OSDs, I can pretty confidently say that
> > throwing flash at the problem doesn't make it go away. Fewer issues
> > with DB IOs contending with client IOs, but flapping can still occur
> > during P
> I'm wondering about the influence of WAL/DBs collocated on HDDs on OSD
> creation time, OSD startup time, peering and osdmap updates, and the role it
> might play regarding flapping, when DB IOs compete with client IOs, even with
> 100% active+clean PGs.
FWIW, having encountered these long-st
I think it was mentioned elsewhere in this thread that there are
limitations to what upmap can do, especially when the crush map
changes significantly. It can't violate crush rules (the mons enforce
this), and if the same OSD shows up multiple times in a backfill,
upmap can't deal with it.
Creeping b
Hey Janek,
Ah, yes, we ran into that invalid json output in
https://github.com/digitalocean/ceph_exporter as well. I have a patch
I wrote for ceph_exporter that I can port over to pgremapper (it
does something similar to what your patch does).
Josh
On Tue, Dec 17, 2024 at 9:38 AM Janek Bevendorff
wrote
Hi Frank,
> Does this setting affect PG removal only or is it affecting other operations
> as well? Essentially: can I leave it at its current value or should I reset
> it to default?
Only PG removal, which is why we set it high enough that it
effectively disables that process.
Josh
__
Ah yes, if you see disk read IOPS going up and up on those draining
OSDs then you might be having issues with older PG deletion logic
interacting poorly with rocksdb tombstones.
Josh
On Thu, Oct 17, 2024 at 8:13 AM Eugen Block wrote:
>
> Hi Frank,
>
> how high is the disk utilization? We see thi
Is this a high-object-count application (S3 or small files in cephfs)?
My guess is that they're going down at the end of PG deletions, where
a rocksdb scan needs to happen. This scan can be really slow and can
exceed heartbeat timeouts, among other things. Some improvements have
been made over majo
We saw this a fair bit in Nautilus, and I also suspected that there
was something up with GC'd and/or deleted objects, but we never
determined the cause. Notably it seemed to happen on PGs ending in
'ff' or 'fff', which was extra suspicious. We haven't seen it since
Pacific.
Josh
On Fri, Oct 4, 2
Ah, yes, that's a good point - if there's backfill going on then
buildup like this can happen.
On Thu, Sep 19, 2024 at 10:08 AM Konstantin Shalygin wrote:
>
> Hi,
>
> On 19 Sep 2024, at 18:26, Joshua Baergen wrote:
>
> Whenever we've seen osdmaps not being tr
Whenever we've seen osdmaps not being trimmed, we've made sure that
any down OSDs are out+destroyed, and then have rolled a restart
through the mons. As of recent Pacific at least this seems to have
reliably gotten us out of this situation.
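Roughly, the sequence has been (IDs and hostnames are placeholders;
make sure the OSDs really are gone for good before destroying them):

    ceph osd out <id>
    ceph osd destroy <id> --yes-i-really-mean-it
    # then, one mon host at a time:
    systemctl restart ceph-mon@<hostname>

pausing between mon restarts so each one rejoins quorum.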
Josh
On Thu, Sep 19, 2024 at 9:14 AM Igor Fedotov wrote
Hey Frédéric,
> Can I ask what symptoms made you interested in tombstones?
Mostly poor index performance due to slow rocksdb iterators (the cause
being excessive tombstone accumulation).
> Do you think this phenomenon could be related to tombstones? And that
> enabling rocksdb_cf_compact_on_del
Hey Aleksandr,
> In Pacific we have RocksDB column families. In the case of many
> tombstones, would it be helpful to reshard our old OSDs?
> Do you think it can help without rocksdb_cf_compact_on_deletion?
> Or maybe it can help much more with rocksdb_cf_compact_on_deletion?
Ah, I'm
> And my question is: we have regular compaction that does some work. Why
> doesn't it help with tombstones?
> Why does only offline compaction help in our case?
Regular compaction will take care of any tombstones in the files that
end up being compacted, and compaction, when triggered, may even f
enerated in RGW scenario?
>
> We have another option in our version: rocksdb_delete_range_threshold
>
> Do you think it can be helpful?
>
> I think our problem arises from the massive deletions generated by the
> lifecycle rule of a big bucket.
> On 16.07.2024, 19:25, "Josh
Hello Aleksandr,
What you're probably experiencing is tombstone accumulation, a known
issue for Ceph's use of rocksdb.
> 1. Why can't automatic compaction manage this on its own?
rocksdb compaction is normally triggered by level fullness and not
tombstone counts. However, there is a feature in r
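Assuming that feature is the deletion-triggered compaction option named
elsewhere in this thread, enabling it would look roughly like this
(available in sufficiently recent point releases; it likely only takes
effect after an OSD restart, and the trigger/window thresholds are
tunable as well):

    ceph config set osd rocksdb_cf_compact_on_deletion true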
> I don't think the change took effect even with
> updating ceph.conf, restart and a direct asok config set. target memory
> value is confirmed to be set via asok config get
>
> Nothing has helped. I still cannot break the 21 MiB/s barrier.
>
> Does anyone have any more idea
It requires an OSD restart, unfortunately.
Josh
On Fri, May 24, 2024 at 11:03 AM Mazzystr wrote:
>
> Is that a setting that can be applied runtime or does it req osd restart?
>
> On Fri, May 24, 2024 at 9:59 AM Joshua Baergen
> wrote:
>
> > Hey Chris,
> >
Hey Chris,
A number of users have been reporting issues with recovery on Reef
with mClock. Most folks have had success reverting to
osd_op_queue=wpq. AIUI 18.2.3 should have some mClock improvements but
I haven't looked at the list myself yet.
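For reference, the revert is along the lines of:

    ceph config set osd osd_op_queue wpq

followed by restarting the OSDs, since the op queue is only selected at
startup.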
Josh
On Fri, May 24, 2024 at 10:55 AM Mazzystr wrot
> Might appropriate values
> vary by pool type and/or media?
>
>
>
> > On Apr 3, 2024, at 13:38, Joshua Baergen wrote:
> >
> > We've had success using osd_async_recovery_min_cost=0 to drastically
> > reduce slow ops during index recovery.
> >
> >
We've had success using osd_async_recovery_min_cost=0 to drastically
reduce slow ops during index recovery.
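i.e. roughly:

    ceph config set osd osd_async_recovery_min_cost 0

It should be adjustable at runtime, but it's worth confirming with
'ceph config show' on an affected OSD.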
Josh
On Wed, Apr 3, 2024 at 11:29 AM Wesley Dillingham
wrote:
>
> I am fighting an issue on an 18.2.0 cluster where a restart of an OSD which
> supports the RGW index pool causes cripplin
I think it depends on what you mean by rados objects and s3 objects here. If
you're talking about an object that was uploaded via MPU, and thus may
comprise many rados objects, I don't think there's a difference in read
behaviors based on pool type. If you're talking about reading a subset byte
range
Personally, I don't think the compaction is actually required. Reef
has compact-on-iteration enabled, which should take care of this
automatically. We see this sort of delay pretty often at the end of PG
cleaning, when the PG being cleaned has a high count of objects,
whether or not OS
Hi Jaemin,
It is normal for PGs to become degraded during a host reboot, since a
copy of the data was taken offline and needs to be resynchronized
after the host comes back. Normally this is quick, as the recovery
mechanism only needs to modify those objects that have changed while
the host is dow
The balancer will operate on all pools unless otherwise specified.
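To check which pools it's currently restricted to (empty output means no
restriction), and to restrict it if you want to:

    ceph balancer pool ls
    ceph balancer pool add <pool>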
Josh
On Mon, Mar 4, 2024 at 1:12 PM Cedric wrote:
>
> Does the balancer have enabled pools? "ceph balancer pool ls"
>
> Actually I am wondering if the balancer does anything when no pools are
> added.
>
>
>
> On Mon, Mar 4, 2024, 1
Periodic discard was actually attempted in the past:
https://github.com/ceph/ceph/pull/20723
A proper implementation would probably need appropriate
scheduling/throttling that can be tuned so as to balance against
client I/O impact.
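(For what it's worth, the non-periodic, inline discard path that exists
today can be toggled with something like

    ceph config set osd bdev_enable_discard true

but that's a different mechanism than the periodic approach in that PR,
and it has its own performance caveats.)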
Josh
On Sat, Mar 2, 2024 at 6:20 AM David C. wrote:
>
> Could