Hi Robert, 

Relocating the older hardware to the new racks is also an interesting option. 
Thanks for the suggestion! 

Rogier Dikkes
System Programmer Hadoop & HPC Cloud
SURFsara | Science Park 140 | 1098 XG Amsterdam

> On Apr 23, 2015, at 5:50 PM, Robert LeBlanc <rob...@leblancnet.us> wrote:
> 
> If you force CRUSH to put copies in each rack, then you will be limited by 
> the smallest rack. You can run into some severe limitations if you try to keep 
> your copies to two racks (see the thread titled "CRUSH rule for 3 replicas 
> across 2 hosts" for some of my explanation of this).
> 
> If I were you, I would install almost all the new hardware and hold back a few 
> pieces. Get the new hardware up and running, then take down some of the 
> original hardware and relocate it in the other cabinets so that you even out 
> the older, lower-capacity nodes and the new, higher-capacity nodes in each 
> cabinet. That would give you the best of redundancy and performance (not all 
> PGs would have to have a replica on the potentially slower hardware). This 
> would allow you to have replication level three and still be able to lose a 
> rack.
> 
> Another option, if you have the racks, is to spread the new hardware over 3 
> racks instead of 2 so that your cluster spans 4 racks. CRUSH will give 
> preference to the newer hardware (assuming the CRUSH weights reflect the size 
> of the disks) and you would no longer be limited by the older, smaller rack.
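> 
> As a rough sketch of that layout (ids, names, and weights below are purely 
> illustrative; each rack weight is just the sum of the raw TB under it, e.g. 
> 3 hosts x 4T or 2 hosts x 6T per rack), the root would look something like:
> 
>       root default {
>               id -20
>               alg straw
>               hash 0
>               item rack1 weight 12.00
>               item rack2 weight 12.00
>               item rack3 weight 12.00
>               item rack4 weight 12.00
>       }
> 
> With a one-replica-per-rack rule against a root like this, each rack only has 
> to hold one of the three replicas and receives data in proportion to its 
> weight, so the 4T rack no longer caps the pool.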
> 
> On Thu, Apr 23, 2015 at 3:20 AM, Rogier Dikkes <rogier.dik...@surfsara.nl> wrote:
> Hello all, 
> 
> At this moment we have a scenario on which I would like your opinion.
> 
> Scenario: 
> Currently we have a Ceph environment with 1 rack of hardware; this rack 
> contains a couple of OSD nodes with 4T disks. In a few months' time we will 
> deploy 2 more racks with OSD nodes; these nodes have 6T disks, and there is 
> one node more per rack.
> 
> Short overview: 
> rack1: 4T OSD
> rack2: 6T OSD
> rack3: 6T OSD
> 
> At this moment we are playing with the idea of using the CRUSH map to make 
> Ceph 'rack aware' and ensure data is replicated between racks. However, from 
> the documentation I gathered that when you enforce data replication between 
> buckets, your maximum storage size becomes that of the smallest bucket. My 
> understanding: if we enforce the objects (size=3) to be replicated across 3 
> racks, then the moment the rack with 4T OSDs is full we cannot store data 
> anymore.
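> 
> As a rough back-of-the-envelope check (assuming 3 hosts per rack with one OSD 
> each, as in the map further down):
> 
>       rack1: 3 x 4T = 12T raw
>       rack2: 3 x 6T = 18T raw
>       rack3: 3 x 6T = 18T raw
> 
> If every rack has to hold one of the three replicas, each rack needs a full 
> copy of the data, so usable capacity would be capped at min(12T, 18T, 18T) = 
> 12T even though the raw total is 48T.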
> 
> Is this assumption correct?
> 
> The current idea we play with: 
> 
> - Create 2 rack buckets
> - Create a ruleset that places 2 object replicas in the 2x 6T rack buckets
> - Create a ruleset that places 1 object replica over all the hosts.
> 
> This would result in 3 replicas of the object, where we are sure that at 
> least 2 replicas are in different racks. In the unlikely event of a rack 
> failure we would still have at least 1 or 2 replicas left.
> 
> Our idea is to have a CRUSH rule with a config that looks like: 
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
> device 9 osd.9
> 
> 
>       host r01-cn01 {
>               id -1
>               alg straw
>               hash 0
>               item osd.0 weight 4.00
>       }
> 
>       host r01-cn02 {
>               id -2
>               alg straw
>               hash 0
>               item osd.1 weight 4.00
>       }
> 
>       host r01-cn03 {
>               id -3
>               alg straw
>               hash 0
>               item osd.3 weight 4.00
>       }
> 
>       host r02-cn04 {
>               id -4
>               alg straw
>               hash 0
>               item osd.4 weight 6.00
>       }
> 
>       host r02-cn05 {
>               id -5
>               alg straw
>               hash 0
>               item osd.5 weight 6.00
>       }
> 
>       host r02-cn06 {
>               id -6
>               alg straw
>               hash 0
>               item osd.6 weight 6.00
>       }
> 
>       host r03-cn07 {
>               id -7
>               alg straw
>               hash 0
>               item osd.7 weight 6.00
>       }
> 
>       host r03-cn08 {
>               id -8
>               alg straw
>               hash 0
>               item osd.8 weight 6.00
>       }
> 
>       host r03-cn09 {
>               id -9
>               alg straw
>               hash 0
>               item osd.9 weight 6.00
>       }
> 
>       rack r02 {
>               id -10
>               alg straw
>               hash 0
>               item r02-cn04 weight 6.00
>               item r02-cn05 weight 6.00
>               item r02-cn06 weight 6.00
>       }      
> 
>       rack r03 {
>               id -11
>               alg straw
>               hash 0
>               item r03-cn07 weight 6.00
>               item r03-cn08 weight 6.00
>               item r03-cn09 weight 6.00
>       }
> 
>       root 6t {
>               id -12
>               alg straw
>               hash 0
>               item r02 weight 18.00
>               item r03 weight 18.00
>       }
> 
>       rule one {
>               ruleset 1
>               type replicated
>               min_size 1
>               max_size 10
>               step take 6t
>               step chooseleaf firstn 2 type rack
>               step chooseleaf firstn 1 type host
>               step emit
>       }
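> 
> If it helps, a rule like this can be checked offline by compiling the map and 
> replaying it with crushtool (the file names below are just placeholders):
> 
>       crushtool -c crushmap.txt -o crushmap.bin
>       crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings
>       crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-utilization
> 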
> Is this the right approach, and would it cause limitations in terms of 
> performance or usability? Do you have suggestions? 
> 
> Another interesting situation we have: we are going to move the hardware to 
> new locations next year; the rack layout will change and thus the CRUSH map 
> will be altered. When changing the CRUSH map in a way that would turn the 2x 
> 6T racks into 4 racks, would we need to take any special considerations into 
> account?
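> 
> (For the move itself, we would presumably add the new rack buckets and 
> re-parent the hosts one at a time, letting the cluster rebalance in between; 
> the bucket and host names below are only illustrative, based on the map above:
> 
>       ceph osd crush add-bucket r04 rack
>       ceph osd crush move r04 root=6t
>       ceph osd crush move r02-cn06 rack=r04
> )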
> 
> Thank you for your answers, they are much appreciated! 
> 
> Rogier Dikkes
> System Programmer Hadoop & HPC Cloud
> SURFsara | Science Park 140 | 1098 XG Amsterdam
> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
