Thanks Robert for your response. I'm considering giving 600G 15K SAS drives a try before moving to SSD; they should give ~175 IOPS per disk.
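For reference, here is the back-of-envelope behind that per-disk figure (just a rough sketch; the seek and rotational latency values are typical datasheet numbers, not measurements):

# Rough random IOPS per spinning disk from seek time + rotational latency.
# The assumed seek times are typical vendor figures, not measured values.

def disk_iops(rpm, avg_seek_ms):
    rotational_latency_ms = (60_000 / rpm) / 2   # half a revolution on average
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms

print(round(disk_iops(rpm=15_000, avg_seek_ms=3.5)))  # ~180 for a 600G 15K SAS disk
print(round(disk_iops(rpm=7_200, avg_seek_ms=8.5)))   # ~80 for a 7.2K SATA disk, usually quoted as 75-100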
Do you think the performance will be better if I go with the following setup?

4x OSD nodes
2x SSD - RAID 1 ( OS and Journal )
10x 600G SAS 15K - No RAID
Replication factor 2

Regarding the IOPS calculation you did for the 4TB setup: please clarify whether the 1100 IOPS is for one node, and whether cluster IOPS = $number_of_nodes x $IOPS_per_node?
If that formula is correct, then my current 4TB cluster should give 2200 IOPS in total and the new SAS setup should give 3500 IOPS? Please correct me if I understand this wrong (I've put the arithmetic I'm using below the quoted mail).

Thanks in advance,

On Tue, Jan 27, 2015 at 3:30 PM, Robert van Leeuwen <robert.vanleeu...@spilgames.com> wrote:
> > I have two ceph nodes with the following specifications
> > 2x CEPH - OSD - 2 Replication factor
> > Model : SuperMicro X8DT3
> > CPU : Dual Intel E5620
> > RAM : 32G
> > HDD : 2x 480GB SSD RAID-1 ( OS and Journal )
> > 22x 4TB SATA RAID-10 ( OSD )
> >
> > 3x Controllers - CEPH Monitor
> > Model : ProLiant DL180 G6
> > CPU : Dual Intel E5620
> > RAM : 24G
> >
> > If it's a hardware issue please help me find an answer to the following 5 questions.
>
> 4TB spinners do not give a lot of IOPS, about 100 random IOPS per disk.
> In total it would just be 1100 IOPS: 44 disks times 100 IOPS, divided by 2 for RAID and divided by 2 for the replication factor.
> There might be a bit of caching on the RAID controller and SSD journal, but worst case you will get just 1100 IOPS.
>
> > I need around 20TB storage; a SuperMicro SC846TQ can take 24 hard disks.
> > I may attach 24x 960G SSD - No RAID - with 3x SuperMicro servers - replication factor 3.
> > Or is it better to scale out and put smaller disks on many servers, such as the HP DL380p G8 / 2x Intel Xeon E5-2650, which can hold 12 hard disks,
> > and attach 12x 960G SSD - No RAID - 6x OSD nodes - replication factor 3?
>
> An OSD for an SSD can easily eat a whole CPU core, so 24 SSDs would be too much.
> More smaller nodes also have the upside of smaller impact when a node breaks.
> You could also look at the Supermicro 2U twin chassis with 2 servers with 12 disks in 2U.
> Note that you will not get near the theoretical native performance of those combined SSDs (100000+ IOPS), but performance will be good nonetheless.
> There have been a few threads about that here before, so look back in the mail threads to find out more.
>
> > 2. I'm using Mirantis/Fuel 5 for provisioning and deployment of nodes.
> > When I attach the new Ceph OSD nodes to the environment, will the data be replicated automatically from my current old SuperMicro OSD nodes to the new servers after the deployment completes?
>
> Don't know the specifics of Fuel and how it manages the crush map.
> Some of the data will end up there, but not a copy of all data unless you specify the new servers as a new failure domain in the crush map.
>
> > 3. I will use 2x 960G SSD in RAID 1 for the OS.
> > Is it recommended to put the SSD journal as a separate partition on the same disk as the OS?
>
> If you run with SSDs only I would put the journals together with the data SSDs.
> It makes a lot of sense to have them on separate SSDs when your data disks are spinners
> (because of the speed difference and bad random IOPS performance of spinners).
>
> > 4. Is it safe to remove the OLD Ceph nodes, while I'm currently using replication factor 2, after adding the new hardware nodes?
> It is probably not safe to just turn them off (as mentioned above, it depends on the crush map failure domain layout).
> The safe way would be to follow the documentation on how to remove an OSD:
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
> This will make sure the data is relocated before the OSD is removed.
>
> > 5. Do I need RAID 1 for the journal disk? And if not, what will happen if one of the journal disks fails?
>
> No, it is not required. Both options have trade-offs.
> Disks that are "behind the journal" will become unavailable when it happens.
> RAID 1 will be a bit easier to replace in case of a single SSD failure, but is useless if the 2 SSDs fail at the same time (e.g. due to wear).
> JBOD will reduce the write load and wear, plus it has less impact when it does fail.
>
> > 6. Should I use a RAID level for the drives on the OSD nodes, or is it better to go without RAID?
>
> Without RAID usually makes for better performance. Benchmark your specific workload to be sure.
> In general I would go for 3 replicas and no RAID.
>
> Cheers,
> Robert van Leeuwen
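P.S. For clarity, here is the arithmetic behind my 2200 and 3500 figures, as a rough sketch of how I'm reading your calculation. The per-disk numbers (~100 IOPS per 4TB SATA disk, ~175 per 15K SAS disk) and the assumption that the 1100 was a per-node figure are exactly what I'd like you to confirm or correct:

# Rough cluster IOPS estimates for the two setups being compared.
# Assumptions (please correct): ~100 random IOPS per 4TB SATA disk,
# ~175 per 600G 15K SAS disk, and worst case (no controller/journal caching).

def cluster_iops(nodes, disks_per_node, iops_per_disk, replication, raid10=False):
    total = nodes * disks_per_node * iops_per_disk
    if raid10:
        total /= 2              # RAID-10 halves the usable write IOPS
    return total / replication  # each client write is stored 'replication' times

# Current setup: 2 nodes, 22x 4TB SATA in RAID-10 each, replication factor 2.
current = cluster_iops(nodes=2, disks_per_node=22, iops_per_disk=100,
                       replication=2, raid10=True)
print(current)   # 1100.0 over all 44 disks -- I read your 1100 as per node,
                 # hence my 2200 total; which reading is correct?

# Proposed setup: 4 nodes, 10x 600G 15K SAS each, no RAID, replication factor 2.
proposed = cluster_iops(nodes=4, disks_per_node=10, iops_per_disk=175,
                        replication=2, raid10=False)
print(proposed)  # 3500.0

If the 1100 already covered both nodes, then my current cluster is around 1100 IOPS total and the SAS setup would roughly triple that, rather than not quite double it.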