On 2025/06/23 0:21, Anthony D'Atri wrote:
> DIMMs are cheap.
No DIMMs on Apple Macs.
> You’re running virtualized in VMs or containers, with OSDs, mons,
> mgr, and the constellation of other daemons with resources
> dramatically below recommendations. I’ll speculate that at least
> the HDDs are USB-attached, or perhaps you’re on an old cheese-grater?
No, I'm running on bare metal. It's kind of the whole point of the
project I started a few years ago: https://asahilinux.org/
Yes, the HDDs are USB-attached, and me running this Ceph workload has
led directly to finding and fixing many years-old Linux kernel USB
driver bugs (affecting all platforms, not just funny ones like this
one), and even to discovering others that haven't been tracked down yet
but that we're currently debugging. If I hadn't run this "strange"
workload, those bugs would not have been found and fixed.
I've also helped track down and fix broken Ceph stuff in Fedora as part
of all this, but I'm sure you'll say Fedora is also an unsupported
distribution.
In fact, I even found a *GCC 13 regression* that miscompiled Ceph with
this whole experiment:
https://tracker.ceph.com/issues/63867
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359
Should I give up running my "unsupported configuration" and stop finding
and fixing all these bugs that affect lots of other configurations and
deployments, of Ceph and non-Ceph things alike?
> You might experiment with the values described here:
> https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/
> E.g. bluestore_cache_size_*
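(For anyone following along in the archives: these tunables can be
adjusted at runtime with `ceph config set`. The OSD ID and byte values
below are placeholders for illustration, not recommendations for any
particular setup.)

```shell
# Inspect the current effective memory target for one OSD
# (osd.0 is a placeholder ID; substitute one of your own OSDs)
ceph config get osd.0 osd_memory_target

# Set the per-OSD memory target for all OSDs to 4 GiB
ceph config set osd osd_memory_target 4294967296

# The bluestore_cache_size_* knobs from the bluestore-config-ref page
# are only consulted when automatic cache sizing is disabled
ceph config set osd bluestore_cache_autotune false
ceph config set osd bluestore_cache_size_hdd 1073741824
ceph config set osd bluestore_cache_size_ssd 3221225472
```

These are config fragments against a live cluster, so run them against a
test cluster first.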
>> Nonetheless, this behavior is clearly broken.
> You’re welcome to submit a PR. Make no mistake, you’re actively
> disregarding recommendations while holding unrealistic expectations.
To submit a PR I first need to figure out what's going on, hence my email.
"This is not recommended and performance may suffer" is one thing, "the
cache will thrash to death" is another.
> There are countless unsupported configurations. Nobody can predict
> at that level of precision how an unspecified but decidedly
> under-resourced deployment will behave, especially if presented with
> snapshots.
In my experience, these kinds of corner case bugs that reproduce on
"unsupported" configurations end up hitting supported configurations
too, just less often. As I said, "it's slower" and "it thrashes to death
after the cache fills up" are two very different things.
In fact, I have a production deployment on x86_64 servers with enough
RAM currently suffering from heavy perf degradation on snaptrim too.
It's on an older Ceph version, which I will upgrade before anything
else, but wouldn't it be funny if it turns out it's the same bug?
Never mind that sometimes "unsupported" configurations are a fact of
life for whatever reason (e.g. during DR). Just because a configuration
is "unsupported" doesn't mean problems it uncovers aren't worth fixing
or looking at.
>> I don't expect this setup to have ideal performance, but I do
>> expect it not to have completely broken cache behavior, which is
>> what is happening.
> You’re running a deployment whose parameters are mostly undisclosed
> but clearly not even close to supportable recommendations.
> Graceful degradation cannot be expected. It’s sorta like filling a
> diesel truck’s tank with cough syrup and expecting it to only be “a
> little sluggish”.
Gee, I didn't know tuning a configuration parameter to 60% of its
default setting is like running a diesel truck on cough syrup.
Heck, the docs say below 2GB is not recommended, and between 2 and 4GB
may result in "degraded performance" (I was at 2.4GB):
https://docs.ceph.com/en/latest/start/hardware-recommendations/
If anything strictly below 4GB is completely unsupported and expected to
go into a thrashing tailspin, perhaps that doc should be updated to
state that.
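To spell out the arithmetic (4 GiB is the stock osd_memory_target
default; 2.4 GiB is what I had configured):

```python
# osd_memory_target values in bytes
default_target = 4 * 1024**3      # 4 GiB, the stock default
my_target = int(2.4 * 1024**3)    # 2.4 GiB, my configured value

# Fraction of the default I was running at
ratio = my_target / default_target
print(f"{ratio:.0%}")  # → 60%
```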
>> Given how averse you seem to be to even consider improving this use case
> I’m not averse to anything, but bear in mind that this is free
> software that thousands of installations use quite successfully.
Many of which I'm sure are also "unsupported", especially with people
running homelabs like this. Again, just because it's outside production
parameters doesn't mean stuff should be outright broken.
> Be the change you want to see in the world. That's kind of the point
> here. Not everyone here is a developer, and nobody owes anyone
> anything.
Nobody owes anyone fixing anything, but as a developer I generally do
not dismiss performance reports that point to something being
pathologically wrong just because the specs don't match recommended
values. I'm not asking you to fix it, I'm asking not to be brushed away
and told to buy more RAM while I'm trying to figure out what the
underlying bug is.
> A virtualized sandbox — which this must be because native macOS is
> not a supported platform — is not a use-case, it’s a sandbox, and no
> expectations whatsoever should be made with respect to performance.
No macOS involved here, no virtualization, no sandboxing, no containers.
This is a bare metal ARM64 machine running Fedora 41. It just happens to
be made by Apple.
> I would not be surprised if your systems are swapping, which is
> going to exhibit poor performance for any software.
Two out of three of the systems involved don't even have physical swap
configured (they do have the default 8GB zram Fedora provisions, but
that's obviously not the problem here because there is plenty of free
RAM on those). And as I said I checked systemwide I/O load and saw
nothing significant, so if it were swap thrashing it would have been
evident.
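For anyone who wants to run the same check on their own nodes, here's a
rough sketch of what I mean, parsing /proc/meminfo-style counters (the
sample text below is made up for illustration; on a real node you'd feed
it the contents of /proc/meminfo):

```python
def swap_usage_from_meminfo(text: str) -> tuple[int, int]:
    """Return (swap_used_kib, swap_total_kib) from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key] = int(parts[0])  # values are reported in KiB
    total = fields.get("SwapTotal", 0)
    used = total - fields.get("SwapFree", 0)
    return used, total

# Made-up sample: plenty of free RAM, swap barely touched
sample = """\
MemTotal:       16384000 kB
MemFree:         9000000 kB
SwapTotal:       8388608 kB
SwapFree:        8388600 kB
"""
used, total = swap_usage_from_meminfo(sample)
print(used, total)  # → 8 8388608
```

If swap thrashing were the culprit, the used figure would be large and
climbing, and the I/O load from it would be obvious.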
> I pointed you at tunables to try. On my own time on Sunday
> morning. You’re welcome.
And that is welcome, and it would have been great if it didn't also come
with all the dismissiveness.
>> and dismissive of my report, I'm not particularly inclined to submit
>> a PR at this point. If one of the tunables fixes this for me, I'll
>> just keep the fix to myself and the next person to run into it will
>> have to figure it out for themselves again. *shrug*
> If you find a way to transcend math, more power to you.
> Angrily writing that a complex, mature, FREE system is “broken”
> because it doesn’t perform miracles when abused is folly, like
> expecting coffee to not be hot.
Is there any reason to believe it's not broken besides "I set
osd_memory_target to 2.4G"?
If this repros with the 4GB setting on a node with enough RAM (one of
the three does have that much), then is it broken?
If I do track this down to a demonstrable code bug that might affect any
configuration, then is it broken?
- Hector
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io