Hi Joachim,
thanks for chiming in.
So the conclusion seems to be (please correct me if I'm wrong): you
can utilize localized reads either for RGWs (from Tentacle on) or for
RBD, but in the latter case only if you map the RBD devices.
Could anybody confirm or deny this conclusion? It would be of great
help to clarify this.
Thanks!
Eugen
Quoting Joachim Kraftmayer <joachim.kraftma...@clyso.com>:
Hi Eugen,
rados_replica_read_policy is the option for the RGW use case.
And the client is the radosgw:
ceph config set client.rgw.<name> rados_replica_read_policy <parameter>
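(Inline note: presumably the client's CRUSH location must also be known for localized reads to have any effect; an untested sketch, where <name> and the rack name are placeholders from this thread:)

```shell
# Untested sketch: set both the read policy and a CRUSH location for the
# radosgw client. <name> and rack1 are placeholders from this thread.
ceph config set client.rgw.<name> rados_replica_read_policy localize
ceph config set client.rgw.<name> crush_location "rack=rack1"
```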
Joachim
joachim.kraftma...@clyso.com
www.clyso.com
Hohenzollernstr. 27, 80801 Munich
Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306
Eugen Block <ebl...@nde.ag> wrote on Mon., June 16, 2025, at 16:09:
I just noticed that the crush_location and read_from_replica_policy
options from the rbd man page apparently only apply as rbd map options.
That doesn't really help in this case either: the clients are not
mapping any RBDs, so this doesn't seem to do the trick. Maybe the
introduction of rados_replica_read_policy will make those localized
reads available in general.
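For completeness, on clients that do map RBDs, localized reads would be requested via map options, roughly like this (untested sketch; pool and image names are placeholders, see 'man 8 rbd'):

```shell
# Untested sketch (krbd map options from 'man 8 rbd'); mypool/myimage
# are placeholders. The '|' in crush_location must be quoted so the
# shell doesn't treat it as a pipe.
rbd map mypool/myimage \
  -o 'read_from_replica_policy=localize,crush_location=rack:rack1|root:default'
```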
Quoting Eugen Block <ebl...@nde.ag>:
> Hi Frédéric,
>
> thanks a lot for looking into that, I appreciate it. Until a year
> ago or so we used custom location hooks for a few OSDs, but not for
> clients (yet).
>
> I haven't tried rbd_read_from_replica_policy yet either, so I wasn't
> aware of the crush_location setting on the client side, but it makes
> sense. But I have difficulties getting it to work. I have a tiny
> single node cluster and added an empty rack (rack1) for this test:
>
> ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
> -1         0.03899  root default
> -5               0      rack rack1
> -3         0.03899      host storage
>  0    hdd  0.00999          osd.0          up   1.00000  1.00000
>  1    hdd  0.00999          osd.1          up   1.00000  1.00000
>  2    hdd  0.00999          osd.2          up   1.00000  1.00000
>  3    hdd  0.00999          osd.3          up   1.00000  1.00000
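(For reference, such an empty rack can be added with the standard CRUSH commands; a sketch matching the tree above:)

```shell
# Sketch: create an empty rack bucket and move it under the default root
# (bucket name matches the test setup above).
ceph osd crush add-bucket rack1 rack
ceph osd crush move rack1 root=default
```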
>
> Then added this to ceph.conf of the client:
>
> crush_location = rack:rack1
>
> which gives me a parsing error:
>
> 2025-06-16T08:21:37.405+0000 7f2ff71fa640 -1 warning: crush_location
> 'rack:rack1' does not parse, keeping original crush_location
> {{host=controller01,root=default}}
>
> So I tried this, which at least gets past the parsing error (and is
> also the documented syntax at [2]):
>
> crush_location = rack=rack1
>
> And this also parses without an error:
>
> crush_location = root=default|rack=rack1
>
> But it doesn't really seem to work as expected; looking into the
> mon's debug logs, it doesn't seem to recognize the location change:
>
> 2025-06-16T09:10:55.785+0000 7f2a8eff8640 10
> mon.storage@0(leader).config refresh_config crush_location for
> remote_host controller01 is {}
>
> I tried a couple of different variations in the client's ceph.conf,
> but to no avail yet.
>
> crush_location = root=default rack=rack1
>
> and some more attempts don't seem to be picked up.
>
> [2] https://docs.ceph.com/en/reef/rados/operations/crush-map/#crush-location
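To sum up the separator confusion: the ':' and '|' syntax from the rbd man page applies only to the kernel map options, while ceph.conf takes plain key=value pairs (space-separated for multiple buckets, per [2]). A minimal sketch of the ceph.conf fragment, written to a local example file just for illustration:

```shell
# Sketch: the ceph.conf syntax for crush_location is key=value pairs,
# space-separated for multiple buckets (the documented form, regardless
# of whether the mon picked it up in my test above). Writing to a local
# example file only to illustrate the fragment.
conf=ceph.conf.example
cat > "$conf" <<'EOF'
[client]
crush_location = root=default rack=rack1
EOF
grep 'crush_location' "$conf"
```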
>
> Quoting Frédéric Nass <frederic.n...@univ-lorraine.fr>:
>
>> Hi Eugen,
>>
>> After reviewing the code, it doesn't seem to be limited to the
>> official 'stretch' mode. Hopefully devs can confirm that.
>>
>> Now, I'm wondering how rados_replica_read_policy compares to
>> rbd_read_from_replica_policy. Do they work the exact same way, with
>> rados_replica_read_policy being limited to librados clients (e.g.,
>> RGW) while rbd_read_from_replica_policy is limited to RBD clients
>> (krbd, librbd)?
>>
>> In any case, it seems that rados_replica_read_policy = localize
>> might require the same crush_location to be set on the client's
>> side, just like rbd_read_from_replica_policy. See 'man rbd 8' or [1]:
>>
>> crush_location=x - Specify the location of the client in terms of
>> CRUSH hierarchy (since 5.8). This is a set of key-value pairs
>> separated from each other by '|', with keys separated from values
>> by ':'. Note that '|' may need to be quoted or escaped to avoid it
>> being interpreted as a pipe by the shell. The key is the bucket
>> type name (e.g. rack, datacenter or region with default bucket
>> types) and the value is the bucket name. For example, to indicate
>> that the client is local to rack "myrack", data center "mydc" and
>> region "myregion":
>>
>> crush_location=rack:myrack|datacenter:mydc|region:myregion
>>
>> Each key-value pair stands on its own: "myrack" doesn't need to
>> reside in "mydc", which in turn doesn't need to reside in
>> "myregion". The location is not a path to the root of the hierarchy
>> but rather a set of nodes that are matched independently, owing to
>> the fact that bucket names are unique within a CRUSH map.
>> "Multipath" locations are supported, so it is possible to indicate
>> locality for multiple parallel hierarchies:
>>
>> crush_location=rack:myrack1|rack:myrack2|datacenter:mydc
>>
>> If you happen to test rados_replica_read_policy = localize, let us
>> know how it works. ;-)
>>
>> Cheers,
>> Frédéric.
>>
>> [1] https://github.com/ceph/ceph/blob/main/doc/man/8/rbd.rst
>>
>> ----- On June 13, 2025, at 10:56, Eugen Block <ebl...@nde.ag> wrote:
>>
>>> And a follow-up question:
>>> The description only states:
>>>
>>>> If set to ``localize``, read operations will be sent to the closest
>>>> OSD as determined by the CRUSH map.
>>>
>>> But how does the client determine where the nearest OSD is? Will there
>>> be some sort of score similar to the MON connection score? I'd
>>> appreciate any insights.
>>>
>>> Quoting Eugen Block <ebl...@nde.ag>:
>>>
>>>> Hi *,
>>>>
>>>> I have a question regarding the upcoming feature to optimize read
>>>> performance [0] by reading from the nearest OSD, especially in a
>>>> stretch cluster across two sites (or more). Anthony pointed me to
>>>> [1], looks like a new config option will be introduced in Tentacle:
>>>>
>>>> rados_replica_read_policy
>>>>
>>>> Will this config option be limited to the "official" stretch mode?
>>>> Or will it be possible to utilize it independent of the cluster
>>>> layout?
>>>>
>>>> Thanks!
>>>> Eugen
>>>>
>>>> [0] https://ceph.io/en/news/blog/2025/stretch-cluuuuuuuuusters-part2/
>>>> [1] https://github.com/ceph/ceph/blob/d28e5fe890016235e302122f955fc910c96f2d43/src/common/options/global.yaml.in#L6504
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io