I would try simple things first:
1) Disable scrub/deep-scrub and re-run the test. If this was the cause,
you can re-enable scrubbing afterwards but with a lower scrub load.
2) Disable PG autoscaling, set the pg_num yourself on a new pool,
and re-run the test.
Rough example commands for both are below.
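For (1), something along these lines (the tuning values are only examples, adjust as needed):

   # pause all scrubbing cluster-wide, then repeat the run
   ceph osd set noscrub
   ceph osd set nodeep-scrub
   # afterwards, re-enable scrubbing but make it less aggressive
   ceph osd unset noscrub
   ceph osd unset nodeep-scrub
   ceph config set osd osd_scrub_sleep 0.5
   ceph config set osd osd_scrub_load_threshold 0.3

For (2), assuming the data pool is named cephfs_data (substitute your actual pool name):

   # keep the autoscaler from changing pg_num on this pool
   ceph osd pool set cephfs_data pg_autoscale_mode off
   # pin pg_num/pgp_num manually, e.g. 64 for 3 small OSDs
   ceph osd pool set cephfs_data pg_num 64
   ceph osd pool set cephfs_data pgp_num 64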
/maged
On 25/02/2026 10:47, Lina SADI via ceph-users wrote:
Hello Ceph users,
I am currently running performance investigations on a small ARM-based
Ceph cluster (Raspberry Pi + NVMe), and I would really appreciate
feedback from experienced users or developers. We are trying to
understand a non-trivial behavior that appears systematic rather than
random.
Below is a technical summary of our setup and methodology. I am
attaching the figures directly so that the detailed behavior can be
interpreted visually from the data.
_CLUSTER ARCHITECTURE_
Hardware:
* 3× Raspberry Pi storage nodes
* Each node equipped with an identical 512 GB NVMe device
* Nodes interconnected through a 1 Gb/s Ethernet switch
* 1 additional Raspberry Pi acting as a client on the same network
Software topology (Ceph Pacific 16.2.15 – default configuration):
* Node 1 (umbriel): MON + MGR + OSD
* Node 2 (elara): MON + MDS + OSD
* Node 3 (kerberos): MON + OSD
* Client node: continuous camera ingestion pipeline writing frames into a default CephFS pool.
_EXPERIMENT OBJECTIVE_
Our goal is to establish a stable performance baseline before
injecting failure scenarios (e.g., verifying whether images are lost
during Ceph faults). However, even with identical workloads on an
empty cluster, the system consistently evolves through several
distinct performance phases.
_WORKLOAD DESCRIPTION_
The workload consists of a simple 2-hour run of the camera program
(~42 FPS) with a frame size of 512 KB.
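For scale: 42 FPS × 512 KB is roughly 21 MB/s (~170 Mb/s) of raw ingest from the client; with the default 3× replication that is about 63 MB/s of aggregate writes landing on the three OSD nodes over their shared 1 Gb/s links.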
Collected information:
* Ceph metrics: cluster health, PG state, and OSD performance (commit/apply latency)
* System metrics on all nodes: CPU, memory, disk and network usage, as well as temperature and Raspberry Pi throttling flags
* Client metrics: latency and throughput
Metrics were collected in parallel during the runs and parsed offline
afterwards, to minimize perturbation on the nodes.
_OBSERVATIONS (see attached figures)_
Across multiple executions, performance does not drift gradually but
instead evolves through three clear plateaus affecting client latency,
client throughput, OSD commit/apply latency, and network traffic,
while the cluster remains HEALTH_OK. CPU usage stays stable, and
memory shows periodic cache-related fluctuations. *The most notable
behavior appears at the disk level*: one NVMe starts with much higher
utilization (~95%) while the others remain around ~40%, then activity
progressively converges until all OSDs stabilize at similar levels.
DVFS and thermal throttling do not explain the phenomenon, as running
experiments with throttling already active still produces the same
three phases.
Our main questions are therefore:
* *What internal Ceph mechanism could explain these three performance phases?*
* *Why does one OSD initially receive more load before balancing occurs?*
Any pointers on relevant metrics, configuration aspects, or known
behaviors on small ARM-based clusters would be extremely valuable.
Thank you very much for your time and insights.
[Attachments: 9 screenshots (2026-02-23/24) with the figures referenced above]
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]