Thanks David, it's not quite what I was looking for. Let me explain my question in more detail.
This is an excerpt from the CRUSH paper. It explains how the CRUSH algorithm, running on each client/OSD, maps a PG to an OSD (let's assume during a write operation):

*"Tree buckets are structured as a weighted binary search tree with items at the leaves. Each interior node knows the total weight of its left and right subtrees and is labeled according to a fixed strategy (described below). In order to select an item within a bucket, CRUSH starts at the root of the tree and calculates the hash of the input key x, replica number r, the bucket identifier, and the label at the current tree node (initially the root). The result is compared to the weight ratio of the left and right subtrees to decide which child node to visit next. This process is repeated until a leaf node is reached, at which point the associated item in the bucket is chosen. Only log n hashes and node comparisons are needed to locate an item."*

My question: along the way the tree structure changes, the weights of the nodes change, and some nodes even go away. In that case, how do future reads arrive at the same PG-to-OSD mapping? It's not cached anywhere; the same algorithm runs for every future read. What I am missing is how it picks the same OSD (where the data resides) every time. With a modified CRUSH map, won't we end up at a different leaf node if we apply the same algorithm?

Thanks
Girish

On Thu, Feb 16, 2017 at 12:05 PM, David Turner <david.tur...@storagecraft.com> wrote:

> As a piece to the puzzle, the client always has an up-to-date osd map
> (which includes the crush map). If it's out of date, then it has to get a
> new one before it can request to read or write to the cluster. That way
> the client will never have old information, and if you add or remove
> storage, the client will always have the most up-to-date map to know where
> the current copies of the files are.
>
> This can cause slowdowns in your cluster performance if you are updating
> your osdmap frequently, which can be caused by deleting a lot of snapshots,
> as an example.
>
> David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
> <https://storagecraft.com>
>
> ------------------------------
> *From:* ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of
> girish kenkere [kngen...@gmail.com]
> *Sent:* Thursday, February 16, 2017 12:43 PM
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] Question regarding CRUSH algorithm
>
> Hi, I have a question regarding the CRUSH algorithm - please let me know how
> this works. The CRUSH paper talks about how, given an object, we select an OSD
> via two mappings - the first is object to PG and the second is PG to OSD.
>
> This PG to OSD mapping is something I don't understand. It uses the pg#,
> cluster map, and placement rules. How is it guaranteed to return the correct
> OSD for future reads after the cluster map/placement rules have changed due
> to nodes coming in and out?
>
> Thanks
> Girish
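
P.S. To make the question concrete, here is a rough Python sketch of the tree-bucket descent as I read it from the excerpt above. Everything specific in it is my own assumption for illustration and not Ceph's actual code: the SHA-256 stand-in hash, the node labels, the four-OSD bucket, the bucket id of -1, and the names crush_hash, Node, build_bucket and select.

```python
import hashlib


def crush_hash(x, r, bucket_id, label):
    """Stand-in hash (an assumption, not CRUSH's real hash): float in [0, 1)."""
    key = f"{x}:{r}:{bucket_id}:{label}".encode()
    # Keep the top 53 bits so the division below is exact and always < 1.0.
    top53 = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") >> 11
    return top53 / 2**53


class Node:
    def __init__(self, label, left=None, right=None, item=None, weight=0.0):
        self.label, self.left, self.right, self.item = label, left, right, item
        # A leaf carries its own weight; an interior node sums its two subtrees.
        self.weight = weight if item is not None else left.weight + right.weight


def build_bucket(weights):
    """Build a fixed four-leaf tree; labels 1..7 are illustrative only."""
    names = list(weights)
    leaves = [Node(2 * i + 1, item=n, weight=weights[n]) for i, n in enumerate(names)]
    left = Node(2, leaves[0], leaves[1])
    right = Node(6, leaves[2], leaves[3])
    return Node(4, left, right)


def select(root, bucket_id, x, r):
    """Descend from the root, hashing (x, r, bucket id, node label) at each step."""
    node = root
    while node.item is None:
        h = crush_hash(x, r, bucket_id, node.label)
        # Go left with probability left.weight / node.weight, otherwise right.
        node = node.left if h * node.weight < node.left.weight else node.right
    return node.item


# Same bucket before and after osd.3's weight drops to zero.
old_map = build_bucket({"osd.0": 1.0, "osd.1": 1.0, "osd.2": 1.0, "osd.3": 1.0})
new_map = build_bucket({"osd.0": 1.0, "osd.1": 1.0, "osd.2": 1.0, "osd.3": 0.0})

before = {pg: select(old_map, -1, pg, 0) for pg in range(1000)}
after = {pg: select(new_map, -1, pg, 0) for pg in range(1000)}
moved = [pg for pg in before if before[pg] != after[pg]]
print(f"{len(moved)} of {len(before)} PGs map to a different OSD after the change")
```

With an unchanged map, select returns the same OSD for a given (pg, r) every time, since nothing in it depends on anything but its inputs and the tree. My question is about the PGs that end up in moved: the same algorithm now points at a different leaf than the one the data was originally written to.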
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com