On 5/2/14 05:15, Fabrizio G. Ventola wrote:
Hello everybody,
I'm running some tests with Ceph and its editable cluster map, and I'm
trying to define a "rack" layer in its hierarchy, like this:

ceph osd tree:

# id  weight  type name                up/down  reweight
-1    0.84    root default
-7    0.28        rack rack1
-2    0.14            host cephosd1-dev
0     0.14                osd.0        up       1
-3    0.14            host cephosd2-dev
1     0.14                osd.1        up       1
-8    0.28        rack rack2
-4    0.14            host cephosd3-dev
2     0.14                osd.2        up       1
-5    0.14            host cephosd4-dev
3     0.14                osd.3        up       1
-9    0.28        rack rack3
-6    0.28            host cephosd5-dev
4     0.28                osd.4        up       1

These are my pools:
pool 0 'data' rep size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 2545 owner 0
crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 2 crush_ruleset 1 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 2548 owner 0
pool 2 'rbd' rep size 3 min_size 2 crush_ruleset 2 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 2529 owner 0
pool 4 'pool_01' rep size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 333 pgp_num 333 last_change 2542 owner 0

I configured replica 3 and min_size 2 for all pools, so when I write
new data to CephFS (through FUSE) or create a new RBD image, I expect
to see the same amount of data on every rack (3 racks, 3 replicas ->
1 replica per rack). But as you can see, the third rack has just one
OSD (the first two have two each), so that single OSD should hold as
much data as all of rack1 or rack2. Instead it holds less data than
the other racks (though more than any single OSD in the first two racks).
Where am I wrong?

Thank you in advance,
Fabrizio

You also need to edit the CRUSH rules to tell Ceph to choose a leaf from each rack instead of from each host (the default). If you run

ceph osd crush dump

you'll see that rules 0, 1, and 2 use the operation chooseleaf_firstn with type host. Those rule numbers are what the crush_ruleset fields in the pool dump above refer to.
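The relevant part of the dump looks roughly like this (abridged, and the exact field names are from memory, so check against your own output):

  "rules": [
        { "rule_id": 0,
          "rule_name": "data",
          "ruleset": 0,
          "type": 1,
          "min_size": 1,
          "max_size": 10,
          "steps": [
                { "op": "take",
                  "item": -1},
                { "op": "chooseleaf_firstn",
                  "num": 0,
                  "type": "host"},
                { "op": "emit"}]},
        ...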


This should get you started on editing the crush map:
https://ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map
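The round trip is short; something like this (filenames are arbitrary):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

Then open crushmap.txt in an editor.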

In the rules section of the decompiled map, change your
step chooseleaf firstn 0 type host
to
step chooseleaf firstn 0 type rack
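For context, the whole rule in the decompiled map looks something like this (the ruleset numbers and names will come from your own map; this is just the shape, shown after the edit):

rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack    # was: type host
        step emit
}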


Then compile and set the new crushmap.
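That is (again, arbitrary filenames):

crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin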

A lot of data is going to start moving. This will give you a chance to use your cluster during a heavy recovery operation.
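You can watch the rebalance with ceph -w, and if client I/O suffers you can throttle recovery, e.g.:

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

Once it settles, ceph osd map pool_01 <some-object-name> should show an acting set with one OSD from each rack.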


--

Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com

