Hi Frédéric,

thanks a lot for looking into that, I appreciate it. Until a year ago or so we used custom location hooks for a few OSDs, but not for clients (yet).

I haven't tried rbd_read_from_replica_policy yet either, so I wasn't aware of the crush_location setting on the client side, but it makes sense. However, I'm having difficulties getting it to work. I have a tiny single-node cluster and added an empty rack (rack1) for this test:

ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-1         0.03899  root default
-5               0      rack rack1
-3         0.03899      host storage
 0    hdd  0.00999          osd.0             up   1.00000  1.00000
 1    hdd  0.00999          osd.1             up   1.00000  1.00000
 2    hdd  0.00999          osd.2             up   1.00000  1.00000
 3    hdd  0.00999          osd.3             up   1.00000  1.00000

Then added this to ceph.conf of the client:

crush_location = rack:rack1

which gives me a parsing error:

2025-06-16T08:21:37.405+0000 7f2ff71fa640 -1 warning: crush_location 'rack:rack1' does not parse, keeping original crush_location {{host=controller01,root=default}}

So I tried the syntax documented at [2], which at least gets past the parsing error:

crush_location = rack=rack1

And this also parses without an error:

crush_location = root=default|rack=rack1

But it doesn't really seem to work as expected: looking at the mon's debug logs, it doesn't seem to recognize the location change:

2025-06-16T09:10:55.785+0000 7f2a8eff8640 10 mon.storage@0(leader).config refresh_config crush_location for remote_host controller01 is {}

I tried a couple of other variations in the client's ceph.conf, but to no avail yet, for example:

crush_location = root=default rack=rack1

None of these attempts seem to be picked up.
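For reference, there seem to be two different crush_location syntaxes in play: the ceph.conf option takes space-separated key=value pairs (per [2]), while the rbd map option quoted further below takes key:value pairs joined by '|'. A rough Python sketch of the difference (my own illustration, not Ceph's actual parser; it keeps duplicate keys so "multipath" locations survive):

```python
# Illustration only -- not Ceph's parser. Shows the two crush_location
# syntaxes from this thread side by side.

def parse_conf_style(value):
    """Parse ceph.conf style: space-separated key=value pairs,
    e.g. 'root=default rack=rack1'."""
    return [tuple(pair.split("=", 1)) for pair in value.split()]

def parse_krbd_style(value):
    """Parse rbd map option style: key:value pairs joined by '|',
    e.g. 'rack:myrack|datacenter:mydc'."""
    return [tuple(pair.split(":", 1)) for pair in value.split("|")]

print(parse_conf_style("root=default rack=rack1"))
# [('root', 'default'), ('rack', 'rack1')]
print(parse_krbd_style("rack:rack1|datacenter:mydc"))
# [('rack', 'rack1'), ('datacenter', 'mydc')]
```

That would at least explain why 'rack:rack1' fails to parse in ceph.conf while 'rack=rack1' doesn't.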

[2] https://docs.ceph.com/en/reef/rados/operations/crush-map/#crush-location

Quoting Frédéric Nass <frederic.n...@univ-lorraine.fr>:

Hi Eugen,

After reviewing the code, I don't think it's limited to the official 'stretch' mode. Hopefully the devs can confirm that.

Now, I'm wondering how rados_replica_read_policy compares to rbd_read_from_replica_policy. Do they work exactly the same way, with rados_replica_read_policy being limited to librados clients (e.g., RGW) while rbd_read_from_replica_policy is limited to RBD clients (krbd, librbd)?

In any case, it seems that rados_replica_read_policy = localize might require the same crush_location to be set on the client side, just like rbd_read_from_replica_policy. See 'man 8 rbd' or [1]:

crush_location=x - Specify the location of the client in terms of CRUSH hierarchy (since 5.8). This is a set of key-value pairs separated from each other by '|', with keys separated from values by ':'. Note that '|' may need to be quoted or escaped to avoid it being interpreted as a pipe by the shell. The key is the bucket type name (e.g. rack, datacenter or region with default bucket types) and the value is the bucket name. For example, to indicate that the client is local to rack "myrack", data center "mydc" and region "myregion":

            crush_location=rack:myrack|datacenter:mydc|region:myregion

Each key-value pair stands on its own: "myrack" doesn't need to reside in "mydc", which in turn doesn't need to reside in "myregion". The location is not a path to the root of the hierarchy but rather a set of nodes that are matched independently, owing to the fact that bucket names are unique within a CRUSH map. "Multipath" locations are supported, so it is possible to indicate locality for multiple parallel hierarchies:

            crush_location=rack:myrack1|rack:myrack2|datacenter:mydc

If you happen to test rados_replica_read_policy = localize, let us know how it works. ;-)

Cheers,
Frédéric.

[1] https://github.com/ceph/ceph/blob/main/doc/man/8/rbd.rst

----- On 13 Jun 25, at 10:56, Eugen Block ebl...@nde.ag wrote:

And a follow-up question:
The description only states:

If set to ``localize``, read operations will be sent to the closest
OSD as determined by the CRUSH map.

But how does the client determine where the nearest OSD is? Will there
be some sort of score similar to the MON connection score? I'd
appreciate any insights.

Quoting Eugen Block <ebl...@nde.ag>:

Hi *,

I have a question regarding the upcoming feature to optimize read
performance [0] by reading from the nearest OSD, especially in a
stretch cluster across two (or more) sites. Anthony pointed me to
[1]; it looks like a new config option will be introduced in Tentacle:

rados_replica_read_policy

Will this config option be limited to the "official" stretch mode,
or will it be possible to use it independently of the cluster
layout?

Thanks!
Eugen

[0] https://ceph.io/en/news/blog/2025/stretch-cluuuuuuuuusters-part2/
[1] https://github.com/ceph/ceph/blob/d28e5fe890016235e302122f955fc910c96f2d43/src/common/options/global.yaml.in#L6504


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

