On 07/25/2014 12:04 PM, Christian Balzer wrote:
On Fri, 25 Jul 2014 07:24:26 -0500 Mark Nelson wrote:

On 07/25/2014 02:54 AM, Christian Balzer wrote:
On Fri, 25 Jul 2014 13:31:34 +1000 Matt Harlum wrote:

Hi,

I’ve purchased a couple of 45Drives enclosures and would like to
figure out the best way to configure these for ceph?

That's the second time within a month that somebody has mentioned these 45-drive
chassis.
Would you mind elaborating on which enclosures these are, precisely?

I'm guessing the supermicro SC847E26:

http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm

Le Ouch!

Supermicro really must be getting desperate for high-density chassis that are
not top-loading.

Well, if I read that link and the actual manual correctly, the most one
can hope to get from this is 48Gb/s (2 mini-SAS ports with 4 lanes each), which is
short of what 45 regular HDDs can dish out (or take in).
And that's ignoring the inherent deficiencies when dealing with port
expanders.

Either way, a head node for this kind of enclosure would need pretty much all
the things mentioned before: a low-density (8 lanes) but high-performance,
large-cache controller, and definitely SSDs for journals.

There must be some actual threshold, but my gut feeling tells me that
something slightly less dense, where you don't have to get another case for
the head, might turn out cheaper.
Especially if a 1U head (RAID/HBA and network cards) with space for
journal SSDs doesn't cut it.

Personally I'm a much bigger fan of the SC847A: no expanders in the backplane, 36 3.5" bays with the motherboard integrated. It's a bit old at this point and the FatTwin nodes can go denser (both in terms of nodes and drives), but I've been pretty happy with it as a performance test platform. It's really nice having the drives directly connected to the controllers. Having 4-5 controllers in one box is a bit tricky though; the FatTwin Hadoop nodes are a bit nicer in that regard.

Mark


Christian


I'm wondering especially about the backplane, as 45 is such an odd
number.

Also if you don't mind, specify "a couple" and what your net storage
requirements are.

In fact, read this before continuing:
---
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11011.html
---

Mainly I was wondering if it would be better to set up multiple RAID groups
and then put an OSD on each, rather than an OSD for each of the 45
drives in the chassis?

Steve already toed the conservative Ceph party line here; let me give
you some alternative views and options on top of that, and recap
what I wrote in the thread above.

In addition to his links, read this:
---
https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
---

Let's go from cheap and cheerful to "comes with racing stripes".

1) All spinning rust, all the time. Plunk in 45 drives as JBOD behind
the cheapest (and densest) controllers you can get. Having the journal
on the same disks will halve their write performance, but you just wanted the
space and are not that pressed for IOPS.
The best you can expect per node with this setup is something around
2300 IOPS with normal (7200RPM) disks.
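
For what it's worth, that ~2300 figure is just back-of-the-envelope math. A
minimal sketch, assuming roughly 100 random IOPS per 7200RPM drive (my
assumption, not a number from this thread):
---
# Back-of-the-envelope write IOPS for option 1 (journal co-located on each HDD).
# The ~100 IOPS per 7200RPM drive figure is an assumption, not from the thread.
IOPS_PER_7200RPM_HDD = 100
NUM_HDDS = 45
JOURNAL_PENALTY = 2   # every write hits the journal and the data area on the same disk

node_write_iops = NUM_HDDS * IOPS_PER_7200RPM_HDD / JOURNAL_PENALTY
print(f"~{node_write_iops:.0f} write IOPS per node")   # ~2250, i.e. the ~2300 above
---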

2) Same as 1), but use controllers with a large HW cache (a 4GB Areca
comes to mind) in JBOD (or 45 times RAID0) mode.
This will alleviate some of the thrashing problems, particularly if
you're expecting the high IOPS to come in short bursts.

3) Ceph Classic, basically what Steve wrote.
32 HDDs, 8 SSDs for journals (you do NOT want an uneven spread of
journals). This will give you a sustainable 3200 IOPS, and of course the
journals on SSDs not only avoid all that thrashing about on the disks
but also allow for coalescing of writes, so this is going to be the
fastest solution so far. Of course you will need 3 of these at minimum
for acceptable redundancy, unlike 4), which just needs a replication
level of 2.
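
Just to show where the 3200 comes from and why an even spread of journals
matters, another quick sketch (the per-drive IOPS figure is again my assumption):
---
# Rough numbers behind option 3: journals live on SSD, so the HDDs no longer
# pay the double-write penalty of option 1.
IOPS_PER_7200RPM_HDD = 100    # assumed, as before
NUM_HDDS = 32
NUM_JOURNAL_SSDS = 8

node_write_iops = NUM_HDDS * IOPS_PER_7200RPM_HDD   # 3200, the figure quoted above
hdds_per_ssd = NUM_HDDS / NUM_JOURNAL_SSDS          # 4 journals per SSD, an even spread;
                                                    # an uneven split makes the busiest SSD the bottleneck
print(node_write_iops, hdds_per_ssd)
---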

4) The anti-cephalopod. See my reply from a month ago in the link
above. All the arguments apply; it very much depends upon your use
case and budget. In my case the higher density, lower cost and ease of
maintaining the cluster were well worth the lower IOPS.

5) We can improve upon 3) by using HW cached controllers of course. And
hey, you did need to connect those drive bays somehow anyway. ^o^
Maybe even squeeze some more out of it by having the SSD controller
separate from the HDD one(s).
This is as fast (IOPS) as it comes w/o going to full SSD.


Networking:
Any of the setups above will saturate a single 10Gb/s link (aka 1GB/s), as
Steve noted.
In fact 3) to 5) will be able to write up to 4GB/s in theory based on
the HDDs' sequential performance, but that is unlikely to be seen in
real life. And of course your maximum write speed is capped by the
speed of the SSDs. So for example with 3) you would want those 8 SSDs
to have write speeds of about 250MB/s, giving you 2GB/s max write.
Which in turn means two 10Gb/s links at least, up to 4 if you want
redundancy and/or a separation of public and cluster networks.
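
The link count falls straight out of those numbers; a quick sketch using the
thread's own figures (8 journal SSDs at ~250MB/s, ~1GB/s usable per 10Gb/s link):
---
import math

SSD_WRITE_MB_S = 250
NUM_JOURNAL_SSDS = 8
USABLE_GB_S_PER_LINK = 1.0    # the "10Gb/s aka 1GB/s" figure above

max_write_gb_s = NUM_JOURNAL_SSDS * SSD_WRITE_MB_S / 1000.0       # 2.0 GB/s
links_needed = math.ceil(max_write_gb_s / USABLE_GB_S_PER_LINK)   # 2, before redundancy
print(max_write_gb_s, links_needed)
---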

RAM:
The more, the merrier.
It's relatively cheap, and avoiding having to actually read from the disks
will make your write IOPS so much happier.

CPU:
You'll want something like Steve recommended for 3); I'd actually go with two
8-core CPUs, so you have some oomph to spare for the OS, IRQ
handling, etc. With 4) and its actual 4 OSDs, about half of that will be
fine, with the expectation of Ceph code improvements.

Mobo:
You're fine for overall PCIe bandwidth, even w/o going to PCIe v3.
But you might have up to 3 HBAs/RAID cards and 2 network cards, so make
sure you can get all of this into appropriate slots.
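
A rough sanity check of that PCIe claim; the per-lane and per-drive throughput
figures below are assumptions of mine, not numbers from this thread:
---
# Can PCIe 2.0 slots keep up with the drives behind them?
PCIE2_MB_S_PER_LANE = 500    # PCIe 2.0, per direction, after 8b/10b encoding
LANES_PER_HBA = 8
NUM_HBAS = 3                 # the "up to 3 HBAs/RAID cards" above

HDD_SEQ_MB_S = 150           # optimistic sequential rate for a 7200RPM drive
NUM_HDDS = 45

slot_bandwidth = NUM_HBAS * LANES_PER_HBA * PCIE2_MB_S_PER_LANE   # 12000 MB/s
drive_bandwidth = NUM_HDDS * HDD_SEQ_MB_S                         # 6750 MB/s
print(slot_bandwidth >= drive_bandwidth)   # True: PCIe v2 is not the bottleneck here
---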

Regards,

Christian


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


