Re: [ceph-users] Question regarding CRUSH algorithm

girish kenkere Thu, 16 Feb 2017 12:45:35 -0800

Thanks David,

It's not quite what I was looking for. Let me explain my question in more
detail:

This is an excerpt from the CRUSH paper. It explains how the CRUSH algorithm,
running on each client/OSD, maps a PG to an OSD (during a write operation,
let's assume).

*"Tree buckets are structured as a weighted binary search tree with items
at the leaves. Each interior node knows the total weight of its left and
right subtrees and is labeled according to a fixed strategy (described
below). In order to select an item within a bucket, CRUSH starts at the
root of the tree and calculates the hash of the input key x, replica number
r, the bucket identifier, and the label at the current tree node (initially
the root). The result is compared to the weight ratio of the left and right
subtrees to decide which child node to visit next. This process is repeated
until a leaf node is reached, at which point the associated item in the
bucket is chosen. Only log n hashes and node comparisons are needed to
locate an item."*
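The descent described in this excerpt can be sketched in a few lines of Python. This is a simplified illustration, not Ceph's code: SHA-256 stands in for CRUSH's rjenkins hash, the node labels are plain in-order numbers rather than the paper's exact labeling scheme, and the OSD names and weights are made up. What it shows is that each step depends only on (x, r, bucket id, node label) and the subtree weights, so identical inputs always reach the same leaf.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def crush_hash(*parts: int) -> int:
    """Deterministic 32-bit hash of the inputs (stand-in for CRUSH's rjenkins hash)."""
    data = ",".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


@dataclass
class Node:
    label: int                       # fixed label, one of the hash inputs
    weight: float                    # total weight of the subtree rooted here
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    item: Optional[str] = None       # set only on leaves


def leaf(label: int, item: str, weight: float) -> Node:
    return Node(label, weight, item=item)


def interior(label: int, left: Node, right: Node) -> Node:
    # An interior node knows the combined weight of its left and right subtrees.
    return Node(label, left.weight + right.weight, left, right)


def tree_choose(root: Node, x: int, r: int, bucket_id: int) -> str:
    """Walk from the root to a leaf, one hash and one comparison per level."""
    node = root
    while node.item is None:
        h = crush_hash(x, r, bucket_id, node.label)
        # Scale the hash to [0, subtree weight) and compare it with the left share.
        point = h / 2**32 * node.weight
        node = node.left if point < node.left.weight else node.right
    return node.item


# Hypothetical bucket with four OSDs; osd.2 has twice the weight of the others.
root = interior(4,
                interior(2, leaf(1, "osd.0", 1.0), leaf(3, "osd.1", 1.0)),
                interior(6, leaf(5, "osd.2", 2.0), leaf(7, "osd.3", 1.0)))

# Same (x, r, bucket id, tree) -> same leaf, on every client, every time.
# Further replicas would use r = 1, 2, ...; collision handling is omitted here.
print(tree_choose(root, x=1234, r=0, bucket_id=-1))
```

Over many inputs, osd.2 should receive roughly 40% of the selections (weight 2 of 5), the others about 20% each.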

My question is this: along the way, the tree structure changes, the weights
of the nodes change, and some nodes even go away. In that case, how do
future reads map a PG to the same OSD? The mapping is not cached anywhere;
the same algorithm runs for every future read. What I am missing is how it
picks the same OSD (where the data resides) every time. With a modified
CRUSH map, won't we end up at a different leaf node if we apply the same
algorithm?
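One way to frame the answer, stated as a hedged sketch rather than a description of Ceph internals: a changed map can indeed change the result, and that is expected. CRUSH is designed so that a map change moves only a small share of PGs, and Ceph then migrates the data of those PGs to their newly computed OSDs (recovery/backfill), so the recomputed location becomes the correct one; while that is in progress, the OSDs now responsible for a PG locate its previous holders during peering. The code below illustrates only the first half, using weighted rendezvous hashing, which is close in spirit to Ceph's straw2 bucket type but is not Ceph's implementation. SHA-256, the OSD names, and the weights are assumptions for illustration.

```python
import hashlib
import math


def h01(*parts) -> float:
    """Deterministic pseudo-random value in (0, 1] derived from the inputs."""
    data = ",".join(str(p) for p in parts).encode()
    n = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    return (n + 1) / 2**64


def choose_osd(pg: int, osds: dict[str, float]) -> str:
    """Weighted rendezvous hashing: each OSD draws a 'straw' ln(u)/weight
    from hash(pg, osd); the longest (least negative) straw wins."""
    return max(osds, key=lambda osd: math.log(h01(pg, osd)) / osds[osd])


osds_before = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
osds_after = {k: w for k, w in osds_before.items() if k != "osd.3"}  # osd.3 removed

before = {pg: choose_osd(pg, osds_before) for pg in range(1000)}
after = {pg: choose_osd(pg, osds_after) for pg in range(1000)}
moved = [pg for pg in before if before[pg] != after[pg]]

# Every other OSD's straw for a given PG is unchanged, so only the PGs whose
# winner was osd.3 get a new answer; all other PGs recompute to the same OSD.
print(len(moved), "of", len(before), "PGs moved")
```

The design choice is what answers the question: placement is recomputed from scratch each time, but because each candidate's draw depends only on (pg, osd, weight), removing one OSD cannot change the winner for PGs it was not winning.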

Thanks
Girish

On Thu, Feb 16, 2017 at 12:05 PM, David Turner <david.tur...@storagecraft.com> wrote:

> As a piece of the puzzle, the client always has an up-to-date OSD map
> (which includes the CRUSH map). If its map is out of date, it has to get a
> new one before it can read from or write to the cluster. That way the
> client never works from stale information: if you add or remove storage,
> the client will always have the most recent map and know where the
> current copies of the objects are.
>
> This can slow down your cluster if your OSD map is updated frequently,
> which can happen, for example, when you delete a lot of snapshots.
>
> ------------------------------
>
> David Turner | Cloud Operations Engineer |
> StorageCraft Technology Corporation <https://storagecraft.com>
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2760 | Mobile: 385.224.2943
>
> ------------------------------
> *From:* ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of
> girish kenkere [kngen...@gmail.com]
> *Sent:* Thursday, February 16, 2017 12:43 PM
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] Question regarding CRUSH algorithm
>
> Hi, I have a question regarding the CRUSH algorithm; please let me know how
> this works. The CRUSH paper describes how, given an object, we select an
> OSD via two mappings: first object to PG, then PG to OSD.
>
> This PG-to-OSD mapping is something I don't understand. It uses the PG
> number, the cluster map, and the placement rules. How is it guaranteed to
> return the correct OSD for future reads after the cluster map or placement
> rules have changed due to nodes coming in and going out?
>
> Thanks
> Girish
>
>
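The two-step lookup from the original question, combined with the map-epoch check David describes, might look roughly like this from the client's side. Everything here is hypothetical scaffolding: `OSDMap`, `FakeMonitor`, and `Client` are illustrative names, SHA-256 stands in for Ceph's object-name hash, `pg_to_osd` is a placeholder for CRUSH, and the client simply reads the newest map from a shared object instead of modeling how Ceph actually distributes maps. `stable_mod` is modeled on the folding Ceph applies (`ceph_stable_mod`) when `pg_num` is not a power of two.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class OSDMap:
    epoch: int                # bumped every time the cluster map changes
    pg_num: int               # number of PGs in the (single) pool
    osds: tuple[str, ...]     # OSDs that are currently up and in


def name_hash(name: str) -> int:
    # Stand-in for Ceph's object-name hash; any stable 32-bit hash works here.
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")


def stable_mod(x: int, b: int) -> int:
    # Modeled on ceph_stable_mod(): fold x into [0, b) so that raising pg_num
    # splits existing PGs instead of reshuffling every object.
    bmask = (1 << (b - 1).bit_length()) - 1
    return x & bmask if (x & bmask) < b else x & (bmask >> 1)


def pg_to_osd(pg: int, m: OSDMap) -> str:
    # Placeholder for CRUSH: a deterministic function of (pg, map) only.
    return max(m.osds, key=lambda osd: hashlib.sha256(f"{pg},{osd}".encode()).digest())


class FakeMonitor:
    """Hypothetical stand-in for the monitors that publish new OSD maps."""

    def __init__(self, osdmap: OSDMap):
        self.current = osdmap


class Client:
    def __init__(self, mon: FakeMonitor):
        self.mon = mon
        self.osdmap = mon.current

    def locate(self, obj: str) -> tuple[int, int, str]:
        # Never compute placement from a stale map: fetch the newer one first.
        if self.osdmap.epoch < self.mon.current.epoch:
            self.osdmap = self.mon.current
        pg = stable_mod(name_hash(obj), self.osdmap.pg_num)       # step 1: object -> PG
        return self.osdmap.epoch, pg, pg_to_osd(pg, self.osdmap)  # step 2: PG -> OSD


mon = FakeMonitor(OSDMap(epoch=10, pg_num=64, osds=("osd.0", "osd.1", "osd.2")))
client = Client(mon)
print(client.locate("my-object"))
mon.current = OSDMap(epoch=11, pg_num=64, osds=("osd.0", "osd.1"))  # osd.2 marked out
print(client.locate("my-object"))  # refreshed to epoch 11 before mapping
```

Because the client moves to epoch 11 before mapping, it computes placement with the same map the cluster is using. The object's PG stays the same here (pg_num did not change); only the PG-to-OSD step can give a different answer.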

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
