Thanks Robert for your response. I'm considering giving 600 GB 15K SAS disks
a try before moving to SSDs; each disk should give ~175 IOPS.

Do you think the performance will be better if I go with the following
setup?
4x OSD nodes
2x SSD - RAID 1 for OS and journal
10x 600G SAS 15K - no RAID
Replication factor 2.

Regarding the IOPS calculation you did for the 4TB setup: please clarify
whether the 1100 IOPS figure is for one node, so that cluster IOPS =
$number_of_nodes x $IOPS_per_node?

If that formula is correct, then my current 4TB setup should give about 2200
IOPS in total, and the new SAS setup should give about 3500 IOPS?

Please correct me if I understand this wrong.
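
In numbers, this is how I arrive at the 3500 figure for the proposed SAS
setup (just a rough sketch; the ~175 random IOPS per 15K disk, and dividing
the raw total only by the replication factor since there is no RAID, are my
assumptions):

    # Rough sketch of my arithmetic for the proposed SAS setup (assumptions:
    # ~175 random IOPS per 600G 15K SAS disk, no RAID penalty, raw total
    # divided by the replication factor).
    nodes = 4
    disks_per_node = 10
    iops_per_disk = 175
    replicas = 2

    raw_iops = nodes * disks_per_node * iops_per_disk   # 7000
    cluster_iops = raw_iops / replicas                   # 3500.0
    print(cluster_iops)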

Thanks in advance,

On Tue, Jan 27, 2015 at 3:30 PM, Robert van Leeuwen <
robert.vanleeu...@spilgames.com> wrote:

> > I have two ceph nodes with the following specifications
> > 2x Ceph OSD nodes - replication factor 2
> > Model : SuperMicro X8DT3
> > CPU : Dual intel E5620
> > RAM : 32G
> > HDD : 2x 480GB SSD RAID-1 ( OS and Journal )
> >      22x 4TB SATA RAID-10 ( OSD )
> >
> > 3x Controllers - CEPH Monitor
> > Model : ProLiant DL180 G6
> > CPU : Dual intel E5620
> > RAM : 24G
> >
> >
> > If it's a hardware issue, please help me find an answer to the
> > following 5 questions.
>
> 4 TB spinners do not give a lot of IOPS, about 100 random IOPS per disk.
> In total it would be just 1100 IOPS: 44 disks times 100 IOPS, divided by 2
> for RAID and divided by 2 for the replication factor.
> There might be a bit of caching on the RAID controller and SSD journal but
> worst case you will get just 1100 IOPS.
>
> > I need around 20TB of storage; a SuperMicro SC846TQ can hold 24 hard disks.
> > I may attach 24x 960G SSD - no RAID - with 3x SuperMicro servers -
> > replication factor 3.
> >
> > Or is it better to scale out and put smaller disks in more servers (such
> > as HP DL380p G8 / 2x Intel Xeon E5-2650), which can hold 12 hard disks,
> > and attach 12x 960G SSD - no RAID - 6x OSD nodes - replication factor 3.
>
> An OSD on an SSD can easily eat a whole CPU core, so 24 SSDs would be too
> much.
> More, smaller nodes also have the upside of a smaller impact when a node
> breaks.
> You could also look at the Supermicro 2U twin chassis with 2 servers with
> 12 disks in 2U.
> Note that you will not get near the theoretical native performance of those
> combined SSDs (100,000+ IOPS), but performance will be good nonetheless.
> There have been a few threads about that here before so look back in the
> mail threads to find out more.
>
> > 2. I'm using Mirantis/Fuel 5 for provisioning and deployment of nodes.
> > When I attach the new Ceph OSD nodes to the environment, will the data
> > be replicated automatically from my current old SuperMicro OSD nodes to
> > the new servers after the deployment completes?
> Don't know the specifics of Fuel and how it manages the crush map.
> Some of the data will end up there but not a copy of all data unless you
> specify the new servers as a new failure domain in the crush map.
>
> > 3. I will use 2x 960G SSD in RAID 1 for the OS.
> > Is it recommended to put the SSD journal as a separate partition on
> > the same disks as the OS?
> If you run with SSDs only, I would put the journals together with the data
> SSDs.
> It makes a lot of sense to have them on separate SSDs when your data disks
> are spinners (because of the speed difference and the bad random IOPS
> performance of spinners).
>
> > 4. Is it safe to remove the OLD Ceph nodes, while I'm currently using a
> > replication factor of 2, after adding the new hardware nodes?
> It is probably not safe to just turn them off (as mentioned above, it
> depends on the crush map failure domain layout).
> The safe way would be to follow the documentation on how to remove an OSD:
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
> This will make sure the data is re-located before the OSD is removed.
>
> > 5. Do I need RAID 1 for the journal hard disks? And if not, what will
> > happen if one of the journal HDDs fails?
> No, it is not required. Both options have trade-offs.
> Disks that are "behind the journal" will become unavailable when the
> journal fails.
> RAID 1 will be a bit easier to replace in case of a single SSD failure, but
> it is useless if the 2 SSDs fail at the same time (e.g. due to wear).
> JBOD will reduce the write load and wear, plus it has less impact when it
> does fail.
>
> > 6. Should I use a RAID level for the drives on the OSD nodes, or is it
> > better to go without RAID?
> Going without RAID usually makes for better performance; benchmark your
> specific workload to be sure.
> In general I would go for 3 replicas and no RAID.
>
> Cheers,
> Robert van Leeuwen
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
