Hi Eugen,
rados_replica_read_policy is the option for the RGW use case.
And the client is the radosgw:

ceph config set client.rgw.<name> rados_replica_read_policy <parameter>
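
A hypothetical end-to-end example (the instance name "zone1.host1" and the value "localize" are placeholders here; check the accepted values for this option in your release):

```
# Placeholder instance name; substitute your actual RGW entity name
ceph config set client.rgw.zone1.host1 rados_replica_read_policy localize

# Verify the setting was applied
ceph config get client.rgw.zone1.host1 rados_replica_read_policy
```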

Joachim

joachim.kraftma...@clyso.com

www.clyso.com

Hohenzollernstr. 27, 80801 Munich

Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306

Eugen Block <ebl...@nde.ag> wrote on Mon, 16 June 2025 at 16:09:

> I just noticed that the options crush_location and read_from_replica
> from the rbd man page apparently only apply to rbd mapping options.
> That doesn't really help in this case either: the clients are not
> mapping any RBDs, so this doesn't seem to do the trick. Maybe the
> introduction of rados_replica_read_policy will make those localized
> reads available in general.
>
> Zitat von Eugen Block <ebl...@nde.ag>:
>
> > Hi Frédéric,
> >
> > thanks a lot for looking into that, I appreciate it. Until a year
> > ago or so we used custom location hooks for a few OSDs, but not for
> > clients (yet).
> >
> > I haven't tried rbd_read_from_replica_policy yet either, so I wasn't
> > aware of the crush_location setting on the client side, but it makes
> > sense. But I have difficulties getting it to work. I have a tiny
> > single node cluster and added an empty rack (rack1) for this test:
> >
> > ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
> > -1         0.03899  root default
> > -5               0      rack rack1
> > -3         0.03899      host storage
> >  0    hdd  0.00999          osd.0             up   1.00000  1.00000
> >  1    hdd  0.00999          osd.1             up   1.00000  1.00000
> >  2    hdd  0.00999          osd.2             up   1.00000  1.00000
> >  3    hdd  0.00999          osd.3             up   1.00000  1.00000
> >
> > Then added this to ceph.conf of the client:
> >
> > crush_location = rack:rack1
> >
> > which gives me a parsing error:
> >
> > 2025-06-16T08:21:37.405+0000 7f2ff71fa640 -1 warning: crush_location
> > 'rack:rack1' does not parse, keeping original crush_location
> > {{host=controller01,root=default}}
> >
> > So I tried the following, which is also the documented syntax at
> > [2], and at least gets past the parsing error:
> >
> > crush_location = rack=rack1
> >
> > And this also parses without an error:
> >
> > crush_location = root=default|rack=rack1
> >
> > But it doesn't really seem to work as expected, looking into debug
> > logs of the mon, it doesn't seem to recognize the location change:
> >
> > 2025-06-16T09:10:55.785+0000 7f2a8eff8640 10
> > mon.storage@0(leader).config refresh_config crush_location for
> > remote_host controller01 is {}
> >
> > I tried a couple of different variations in the client's ceph.conf,
> > but to no avail yet.
> >
> > crush_location = root=default rack=rack1
> >
> > and some more attempts don't seem to be picked up.
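> >
> > Next I might try setting it centrally via the config database
> > instead of ceph.conf, in case the mon only refreshes crush_location
> > from there (untested; "client.admin" is a placeholder for whatever
> > entity the client authenticates as):
> >
> > ```
> > # Untested; placeholder entity name "client.admin"
> > ceph config set client.admin crush_location "root=default rack=rack1"
> > ```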
> >
> > [2]
> https://docs.ceph.com/en/reef/rados/operations/crush-map/#crush-location
> >
> > Zitat von Frédéric Nass <frederic.n...@univ-lorraine.fr>:
> >
> >> Hi Eugen,
> >>
> >> After reviewing the code, it doesn't seem to be limited to the
> >> official 'stretch' mode. Hopefully devs can confirm that.
> >>
> >> Now, I'm wondering how rados_replica_read_policy compares to
> >> rbd_read_from_replica_policy. Do they work the exact same way, with
> >> rados_replica_read_policy being limited to librados clients (e.g.,
> >> RGW) while rbd_read_from_replica_policy is limited to RBD clients
> >> (krbd, librbd)?
> >>
> >> In any case, it seems that rados_replica_read_policy = localize
> >> might require the same crush_location to be set on the client's
> >> side, just like rbd_read_from_replica_policy. See 'man rbd 8' or [1]:
> >>
> >> crush_location=x - Specify the location of the client in terms of
> >> CRUSH hierarchy (since 5.8). This is a set of key-value pairs
> >> separated from each other by '|', with keys separated from values
> >> by ':'. Note that '|' may need to be quoted or escaped to avoid it
> >> being interpreted as a pipe by the shell. The key is the bucket
> >> type name (e.g. rack, datacenter or region with default bucket
> >> types) and the value is the bucket name. For example, to indicate
> >> that the client is local to rack "myrack", data center "mydc" and
> >> region "myregion":
> >>
> >>            crush_location=rack:myrack|datacenter:mydc|region:myregion
> >>
> >> Each key-value pair stands on its own: "myrack" doesn't need to
> >> reside in "mydc", which in turn doesn't need to reside in
> >> "myregion". The location is not a path to the root of the hierarchy
> >> but rather a set of nodes that are matched independently, owing to
> >> the fact that bucket names are unique within a CRUSH map.
> >> "Multipath" locations are supported, so it is possible to indicate
> >> locality for multiple parallel hierarchies:
> >>
> >>            crush_location=rack:myrack1|rack:myrack2|datacenter:mydc
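> >>
> >> Putting the pieces together, a map command might look like this
> >> (untested sketch; the pool/image names are placeholders, and
> >> read_from_replica=localize is the companion option from the same
> >> man page):
> >>
> >> ```
> >> # Placeholders: "mypool/myimage"; quote the '|' for the shell
> >> rbd map mypool/myimage \
> >>     -o read_from_replica=localize,crush_location='rack:myrack|datacenter:mydc'
> >> ```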
> >>
> >> If you happen to test rados_replica_read_policy = localize, let us
> >> know how it works. ;-)
> >>
> >> Cheers,
> >> Frédéric.
> >>
> >> [1] https://github.com/ceph/ceph/blob/main/doc/man/8/rbd.rst
> >>
> >> ----- On 13 Jun 25, at 10:56, Eugen Block ebl...@nde.ag wrote:
> >>
> >>> And a follow-up question:
> >>> The description only states:
> >>>
> >>>> If set to ``localize``, read operations will be sent to the closest
> >>>> OSD as determined by the CRUSH map.
> >>>
> >>> But how does the client determine where the nearest OSD is? Will there
> >>> be some sort of score similar to the MON connection score? I'd
> >>> appreciate any insights.
> >>>
> >>> Zitat von Eugen Block <ebl...@nde.ag>:
> >>>
> >>>> Hi *,
> >>>>
> >>>> I have a question regarding the upcoming feature to optimize read
> >>>> performance [0] by reading from the nearest OSD, especially in a
> >>>> stretch cluster across two sites (or more). Anthony pointed me to
> >>>> [1], looks like a new config option will be introduced in Tentacle:
> >>>>
> >>>> rados_replica_read_policy
> >>>>
> >>>> Will this config option be limited to the "official" stretch mode?
> >>>> Or will it be possible to utilize it independent of the cluster
> >>>> layout?
> >>>>
> >>>> Thanks!
> >>>> Eugen
> >>>>
> >>>> [0] https://ceph.io/en/news/blog/2025/stretch-cluuuuuuuuusters-part2/
> >>>> [1]
> >>>>
> https://github.com/ceph/ceph/blob/d28e5fe890016235e302122f955fc910c96f2d43/src/common/options/global.yaml.in#L6504
> >>>
> >>>
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
