I'm wondering especially about the backplane, as 45 is such an odd
number.
Also if you don't mind, specify "a couple" and what your net storage
requirements are.
In fact, read this before continuing:
---
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11011.html
---
> Mainly I was wondering if it was better to set up multiple raid groups
> and then put an OSD on each rather than an OSD for each of the 45
> drives in the chassis?
Steve already toed the conservative Ceph party line here; let me give
you some alternative views and options on top of that and recap what I
wrote in the thread above.
In addition to his links, read this:
---
https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
---
Let's go from cheap and cheerful to "comes with racing stripes".
1) All spinning rust, all the time. Plunk in 45 drives, as JBOD behind
the cheapest (and densest) controllers you can get. Having the journal
on the disks will halve their performance, but you just wanted the
space and are not that pressed for IOPS.
The best you can expect per node with this setup is something around
2300 IOPS with normal (7200RPM) disks.
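For reference, the back-of-the-envelope math behind that number, as a
quick Python sketch (the ~100 IOPS per 7200RPM drive is my assumption):
---
# Rough sketch: on-disk journals mean every write hits the platter twice.
drives = 45
iops_per_drive = 100      # assumed for a typical 7200RPM disk
journal_penalty = 2       # journal + data on the same disk
print(drives * iops_per_drive / journal_penalty)  # ~2250, call it 2300
---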
2) Same as 1), but use controllers with a large HW cache (4GB Areca
comes to mind) in JBOD (or 45 times RAID0) mode.
This will alleviate some of the thrashing problems, particularly if
you expect your high IOPS load to come in short bursts.
3) Ceph Classic, basically what Steve wrote.
32 HDDs, 8 SSDs for journals (you do NOT want an uneven spread of
journals). This will give you a sustainable 3200 IOPS, and the
journals on SSDs not only avoid all that thrashing about on the disks
but also allow for coalescing of writes, so this is going to be the
fastest solution so far. Of course you will need 3 of these at minimum
for acceptable redundancy, unlike 4) which just needs a replication
level of 2.
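Same napkin math for this one, again assuming ~100 IOPS per 7200RPM
disk, just without the journal penalty since that moved to the SSDs:
---
hdds, ssds = 32, 8
print(hdds // ssds)    # 4 journals per SSD, an even spread
print(hdds * 100)      # 3200 IOPS, no halving with journals on SSD
---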
4) The anti-cephalopod. See my reply from a month ago in the link
above. All the arguments apply; it very much depends upon your use
case and budget. In my case the higher density, lower cost and ease of
maintaining the cluster were well worth the lower IOPS.
5) We can improve upon 3) by using HW cached controllers of course. And
hey, you did need to connect those drive bays somehow anyway. ^o^
Maybe even squeeze some more out of it by having the SSD controller
separate from the HDD one(s).
This is as fast (IOPS) as it comes w/o going to full SSD.
Networking:
Any of the setups above will saturate a single 10Gb/s (aka 1GB/s) link,
as Steve noted.
In fact 3) to 5) will be able to write up to 4GB/s in theory based on
the HDDs' sequential performance, but that is unlikely to be seen in
real life. And of course your maximum write speed is limited by the
speed of the SSD journals. So for example with 3) you would want those
8 SSDs to have write speeds of about 250MB/s, giving you 2GB/s max
write. Which in turn means 2 10Gb/s links at least, up to 4 if you want
redundancy and/or a separation of public and cluster network.
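The link count falls out of the journal speed, roughly like this (the
250MB/s per SSD and ~1GB/s usable per 10Gb/s link are the assumptions
from above):
---
ssds = 8
ssd_write_gb = 0.25                 # assumed per-SSD sequential write, GB/s
max_write = ssds * ssd_write_gb     # 2GB/s journal-limited write ceiling
print(max_write / 1.0)              # ~2 x 10Gb/s links, 4 with redundancy
                                    # and/or separate public/cluster networks
---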
RAM:
The more, the merrier.
It's relatively cheap, and avoiding having to actually read from the
disks will make your write IOPS so much happier.
CPU:
You'll want something like Steve recommended for 3); I'd actually go
with 2 8-core CPUs, so you have some oomph to spare for the OS, IRQ
handling, etc. With 4) and its actual 4 OSDs, about half of that will
be fine, given the expected Ceph code improvements.
Mobo:
You're fine for overall PCIe bandwidth, even w/o going to PCIe v3.
But you might have up to 3 HBAs/RAID cards and 2 network cards, so make
sure you can get all of this into appropriate slots.
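If you want to sanity-check that, a rough sketch assuming PCIe 2.0 x8
slots at about 4GB/s each (my assumption, check your actual board):
---
hba_slots = 3                # HBAs/RAID cards
per_slot_gb = 8 * 0.5        # ~500MB/s per PCIe 2.0 lane, 8 lanes
print(hba_slots * per_slot_gb)   # 12GB/s, well above the ~4GB/s the
                                 # HDDs can stream in theory
---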
Regards,
Christian