Hi Eugen,

rados_replica_read_policy is the right option for the RGW use case, and the client is the radosgw:
ceph config set client.rgw.<name> rados_replica_read_policy <parameter>

Joachim

joachim.kraftma...@clyso.com
www.clyso.com
Hohenzollernstr. 27, 80801 Munich
Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306

Eugen Block <ebl...@nde.ag> wrote on Mon, 16 June 2025, 16:09:

> I just noticed that the options crush_location and read_from_replica
> from the rbd man page apparently only apply to rbd mapping options.
> That doesn't really help in this case either: the clients are not
> mapping any RBDs, so this doesn't seem to do the trick. Maybe the
> introduction of rados_replica_read_policy will make those localized
> reads available in general.
>
> Quoting Eugen Block <ebl...@nde.ag>:
>
> > Hi Frédéric,
> >
> > thanks a lot for looking into that, I appreciate it. Until a year
> > ago or so we used custom location hooks for a few OSDs, but not for
> > clients (yet).
> >
> > I haven't tried rbd_read_from_replica_policy yet either, so I wasn't
> > aware of the crush_location setting on the client side, but it makes
> > sense. But I have difficulties getting it to work.
> > I have a tiny single-node cluster and added an empty rack (rack1)
> > for this test:
> >
> > ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
> > -1         0.03899  root default
> > -5               0      rack rack1
> > -3         0.03899      host storage
> >  0    hdd  0.00999          osd.0         up   1.00000  1.00000
> >  1    hdd  0.00999          osd.1         up   1.00000  1.00000
> >  2    hdd  0.00999          osd.2         up   1.00000  1.00000
> >  3    hdd  0.00999          osd.3         up   1.00000  1.00000
> >
> > Then I added this to the client's ceph.conf:
> >
> > crush_location = rack:rack1
> >
> > which gives me a parsing error:
> >
> > 2025-06-16T08:21:37.405+0000 7f2ff71fa640 -1 warning: crush_location
> > 'rack:rack1' does not parse, keeping original crush_location
> > {{host=controller01,root=default}}
> >
> > So I tried this, which at least seems to get past the parsing error
> > (and is also the documented syntax at [2]):
> >
> > crush_location = rack=rack1
> >
> > And this also parses without an error:
> >
> > crush_location = root=default|rack=rack1
> >
> > But it doesn't really seem to work as expected; looking into the
> > mon's debug logs, it doesn't seem to recognize the location change:
> >
> > 2025-06-16T09:10:55.785+0000 7f2a8eff8640 10
> > mon.storage@0(leader).config refresh_config crush_location for
> > remote_host controller01 is {}
> >
> > I tried a couple of different variations in the client's ceph.conf,
> > but to no avail yet:
> >
> > crush_location = root=default rack=rack1
> >
> > and some more attempts don't seem to be picked up.
> >
> > [2] https://docs.ceph.com/en/reef/rados/operations/crush-map/#crush-location
> >
> > Quoting Frédéric Nass <frederic.n...@univ-lorraine.fr>:
> >
> >> Hi Eugen,
> >>
> >> After reviewing the code, it doesn't seem to be limited to the
> >> official 'stretch' mode. Hopefully the devs can confirm that.
> >>
> >> Now, I'm wondering how rados_replica_read_policy compares to
> >> rbd_read_from_replica_policy.
> >> Do they work the exact same way, with
> >> rados_replica_read_policy being limited to librados clients (e.g.,
> >> RGW) while rbd_read_from_replica_policy is limited to RBD clients
> >> (krbd, librbd)?
> >>
> >> In any case, it seems that rados_replica_read_policy = localize
> >> might require the same crush_location to be set on the client's
> >> side, just like rbd_read_from_replica_policy. See 'man 8 rbd' or [1]:
> >>
> >> crush_location=x - Specify the location of the client in terms of
> >> CRUSH hierarchy (since 5.8). This is a set of key-value pairs
> >> separated from each other by '|', with keys separated from values
> >> by ':'. Note that '|' may need to be quoted or escaped to avoid it
> >> being interpreted as a pipe by the shell. The key is the bucket
> >> type name (e.g. rack, datacenter or region with default bucket
> >> types) and the value is the bucket name. For example, to indicate
> >> that the client is local to rack "myrack", data center "mydc" and
> >> region "myregion":
> >>
> >> crush_location=rack:myrack|datacenter:mydc|region:myregion
> >>
> >> Each key-value pair stands on its own: "myrack" doesn't need to
> >> reside in "mydc", which in turn doesn't need to reside in
> >> "myregion". The location is not a path to the root of the hierarchy
> >> but rather a set of nodes that are matched independently, owing to
> >> the fact that bucket names are unique within a CRUSH map.
> >> "Multipath" locations are supported, so it is possible to indicate
> >> locality for multiple parallel hierarchies:
> >>
> >> crush_location=rack:myrack1|rack:myrack2|datacenter:mydc
> >>
> >> If you happen to test rados_replica_read_policy = localize, let us
> >> know how it works. ;-)
> >>
> >> Cheers,
> >> Frédéric.
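[Editor's aside: to make the man-page format quoted above concrete, here is a tiny illustrative shell snippet — not Ceph's actual parser — that splits an rbd-map-style crush_location string into its bucket-type/bucket-name pairs.]

```shell
# Illustrative only: split an rbd-map-style crush_location string
# ('|'-separated pairs, with ':' between bucket type and bucket name).
loc='rack:myrack1|rack:myrack2|datacenter:mydc'

IFS='|' read -ra pairs <<< "$loc"
for pair in "${pairs[@]}"; do
    btype="${pair%%:*}"   # bucket type, e.g. rack
    bname="${pair#*:}"    # bucket name, e.g. myrack1
    printf 'type=%s name=%s\n' "$btype" "$bname"
done
```

Note how the same bucket type (rack) may appear twice — that is the "multipath" case the man page describes.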
> >>
> >> [1] https://github.com/ceph/ceph/blob/main/doc/man/8/rbd.rst
> >>
> >> ----- On 13 June 25, at 10:56, Eugen Block ebl...@nde.ag wrote:
> >>
> >>> And a follow-up question:
> >>> The description only states:
> >>>
> >>>> If set to ``localize``, read operations will be sent to the closest
> >>>> OSD as determined by the CRUSH map.
> >>>
> >>> But how does the client determine where the nearest OSD is? Will there
> >>> be some sort of score similar to the MON connection score? I'd
> >>> appreciate any insights.
> >>>
> >>> Quoting Eugen Block <ebl...@nde.ag>:
> >>>
> >>>> Hi *,
> >>>>
> >>>> I have a question regarding the upcoming feature to optimize read
> >>>> performance [0] by reading from the nearest OSD, especially in a
> >>>> stretch cluster across two sites (or more). Anthony pointed me to
> >>>> [1]; it looks like a new config option will be introduced in Tentacle:
> >>>>
> >>>> rados_replica_read_policy
> >>>>
> >>>> Will this config option be limited to the "official" stretch mode?
> >>>> Or will it be possible to utilize it independently of the cluster
> >>>> layout?
> >>>>
> >>>> Thanks!
> >>>> Eugen
> >>>>
> >>>> [0] https://ceph.io/en/news/blog/2025/stretch-cluuuuuuuuusters-part2/
> >>>> [1] https://github.com/ceph/ceph/blob/d28e5fe890016235e302122f955fc910c96f2d43/src/common/options/global.yaml.in#L6504
> >>>
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
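[Editor's aside: taken together, the recipe discussed in this thread would look roughly like the fragment below. This is a sketch, not a verified configuration: rados_replica_read_policy is only expected in Tentacle, and which client-side crush_location syntax the parser actually accepts is precisely the open question above.]

```shell
# Sketch only -- rados_replica_read_policy is an upcoming (Tentacle) option,
# and the client-side crush_location syntax is still under discussion above.

# 1) Prefer nearby replicas for an RGW client, per Joachim's suggestion:
ceph config set client.rgw.<name> rados_replica_read_policy localize

# 2) Declare the client's position in the CRUSH hierarchy in its ceph.conf
#    (space-separated key=value pairs, per the documented syntax at [2]):
#      [client]
#      crush_location = root=default rack=rack1
```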