I would try simple things first:
1) Disable scrub/deep-scrub and re-run the test. If this was the cause,
you can re-enable scrubbing afterwards but with a lower scrub load.
2) Disable PG autoscaling, set the pg_num yourself on a new pool,
and re-run the test.
Rough example commands for both are below.
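For (1), something along these lines (the tuning values are only examples, adjust as needed):

   # pause all scrubbing cluster-wide, then repeat the run
   ceph osd set noscrub
   ceph osd set nodeep-scrub
   # afterwards, re-enable scrubbing but make it less aggressive
   ceph osd unset noscrub
   ceph osd unset nodeep-scrub
   ceph config set osd osd_scrub_sleep 0.5
   ceph config set osd osd_scrub_load_threshold 0.3

For (2), assuming the data pool is named cephfs_data (substitute your actual pool name):

   # keep the autoscaler from changing pg_num on this pool
   ceph osd pool set cephfs_data pg_autoscale_mode off
   # pin pg_num/pgp_num manually, e.g. 64 for 3 small OSDs
   ceph osd pool set cephfs_data pg_num 64
   ceph osd pool set cephfs_data pgp_num 64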
/maged
On 25/02/2026 10:47, Lina SADI via ceph-users wrote:
Hello Ceph users,
I am currently running performance investigations on a small ARM-based
Ceph cluster (Raspberry Pi + NVMe), and I would really appreciate
feedback from experienced users or developers. We are trying to
understand a non-trivial behavior that appears systematic rather than
random.
Below is a technical summary of our setup and methodology. I am
attaching the figures directly so that the detailed behavior can be
interpreted visually from the data.
_CLUSTER ARCHITECTURE_
Hardware:
* 3× Raspberry Pi storage nodes
* Each node equipped with an identical 512 GB NVMe device
* Nodes interconnected through a 1 Gb/s Ethernet switch
* 1 additional Raspberry Pi acting as a client on the same network
Software topology (Ceph Pacific 16.2.15 – default configuration):
* Node 1 (umbriel): MON + MGR + OSD
* Node 2 (elara): MON + MDS + OSD
* Node 3 (kerberos): MON + OSD
* Client node: continuous camera ingestion pipeline writing frames into a default CephFS pool.
_EXPERIMENT OBJECTIVE_
Our goal is to establish a stable performance baseline before
injecting failure scenarios (e.g., verifying whether images are lost
during Ceph faults). However, even with identical workloads on an
empty cluster, the system consistently evolves through several
distinct performance phases.
_WORKLOAD DESCRIPTION_
The workload consists of a simple 2-hour run of the camera program
(~42 FPS) with a frame size of 512 KB.
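For scale: 42 FPS × 512 KB is roughly 21 MB/s (~170 Mb/s) of raw ingest from the client; with the default 3× replication that is about 63 MB/s of aggregate writes landing on the three OSD nodes over their shared 1 Gb/s links.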
Collected information:
* Ceph metrics: cluster health, PG state, and OSD performance (commit/apply latency)
* System metrics on all nodes: CPU, memory, disk and network usage, as well as temperature and Raspberry Pi throttling flags
* Client metrics: latency and throughput
Metrics were collected in parallel during the runs and parsed offline
afterwards, to minimize perturbation on the nodes.
_OBSERVATIONS (see attached figures)_
Across multiple executions, performance does not drift gradually but
instead evolves through three clear plateaus affecting client latency,
client throughput, OSD commit/apply latency, and network traffic,
while the cluster remains HEALTH_OK. CPU usage stays stable, and
memory shows periodic cache-related fluctuations. *The most notable
behavior appears at the disk level*: one NVMe starts with much higher
utilization (~95%) while the others remain around ~40%, then activity
progressively converges until all OSDs stabilize at similar levels.
DVFS and thermal throttling do not explain the phenomenon, as running
experiments with throttling already active still produces the same
three phases.
Our main questions are therefore:
* *What internal Ceph mechanism could explain these three performance phases?*
* *Why does one OSD initially receive more load before balancing occurs?*
Any pointers on relevant metrics, configuration aspects, or known
behaviors on small ARM-based clusters would be extremely valuable.
Thank you very much for your time and insights.
[Attachments: 9 screenshots (2026-02-23/24) with the figures referenced above]
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]