Hey all,
I've been planning to build myself a server cluster as a sort of hobby
project, and I've decided to use Ceph for the storage layer. I have a
few questions, though.
My plan is to build 3 relatively dense servers (20 drive bays each) and
fill each one with consumer-grade equipment (an AMD 8-core FX
processor, 24+ GB of ECC RAM, and a decent SAS card that can provide a
channel to each drive). For drives, I was planning on using 3 TB or 4
TB WD Red drives (fairly cheap, but they should be reliable). I'm only
budgeting ~$7500 for the build, so I'll only populate 5 drives per node
from the get-go and add drives as my storage requirements grow.
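For a rough sense of scale, here's the capacity math I'm working from
(the 3-way replication factor is an assumption on my part, not
something I've settled on):

    # Back-of-the-envelope capacity for the initial build vs. a fully
    # populated cluster (replication factor is assumed, not decided).
    nodes = 3
    drive_tb = 4                  # using the 4 TB WD Red option
    replication = 3

    initial_raw = nodes * 5 * drive_tb     # 5 drives per node to start
    full_raw = nodes * 20 * drive_tb       # all 20 bays populated

    print("initial: %d TB raw, ~%d TB usable"
          % (initial_raw, initial_raw // replication))   # 60 / ~20 TB
    print("full:    %d TB raw, ~%d TB usable"
          % (full_raw, full_raw // replication))         # 240 / ~80 TB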
There's a catch, though: I also want to run some VMs on this cluster
(KVM/libvirt managed by Pacemaker, with RBD images as the block
devices, of course). I don't plan on running anything particularly
heavy (a voice server here, a web server there, maybe a game server or
two), and the overall load on the cluster will be light (3-5 users max,
likely idle most of the time, with bursts up to 1 Gbps of reads if the
cluster can provide it).
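For concreteness, each VM disk would just be an RBD image that
libvirt/KVM attaches as a virtio device - something along these lines
(pool name, image name, and size are placeholders):

    # Minimal sketch of carving out a VM disk with the librbd Python
    # bindings; libvirt would then attach the image to the guest.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')          # default 'rbd' pool
    try:
        rbd.RBD().create(ioctx, 'vm-voice-01', 40 * 1024 ** 3)  # 40 GiB
    finally:
        ioctx.close()
        cluster.shutdown()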
I have 4 questions:
* The docs mention aiming for 1 GB of RAM per 1 TB of storage.
However, consumer equipment seems to max out around 32 GB - I couldn't
find any reputable consumer motherboards that support more. If the
nodes are fairly populated at ~50 TB each, and the VMs are using ~4 GB
of RAM on each node, that leaves me with just over 500 MB of RAM per
1 TB of storage (rough numbers sketched after this list). Will this
suffice for smaller loads? Are the nodes going to be choked when a
disk fails and Ceph migrates data? Even if I migrate all the VMs to
separate nodes by the time I max out the Ceph nodes, that's still only
32 GB of RAM for 60-80 TB of storage.
* I'm planning on having either 3x or 5x 1 Gbps ethernet ports on each
node, with a decent managed switch. I should be able to aggregate these
links however I wish - say, use a single bonded 5 Gbps connection to
the switch, or split it into a 2 Gbps front-end network and a 3 Gbps
back-end network (my reasoning for that split is sketched after this
list). I would value any input on which configuration would likely be
best. Both fiber and 10 Gbps copper are outside my price range.
* How stable is CephFS? When I started planning this (months ago),
CephFS sounded pretty unstable, but I still wanted to be able to
provide a filesystem to clients. I planned on doing this by allocating
a very large RBD image to a VM, having that VM format it as ext4 or
xfs, and then running Samba on the VM to "export" the filesystem. It
seems like CephFS has matured since then, though, to the point where
running an MDS on each node (with only a single active MDS) *should*
run smoothly, and be significantly faster than the "wrap ext4 and Samba
around RBD" solution. Again, this is a home cluster, so I won't lose my
job if the system dies - it's definitely not mission-critical, but I
still don't want to restore from backups every month. [As a small side
note: can a single MDS daemon manage multiple, independent filesystems?
I couldn't find anything in the docs about it.]
* I'm planning on buying a single SSD for each node for the OS and
journals. As I populate the nodes, I'll buy a second SSD and split each
SSD into two partitions, giving me a RAID 1 partition for the OS and a
larger RAID 0 partition for the journals. Is this unwise? Will two SSDs
be able to provide enough throughput and IOPS for 20 journals (rough
throughput math after this list), or do I need to plan for more?
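To put rough numbers behind the first question, here's the RAM-per-TB
math I'm working from (the 4 GB VM figure is my own estimate):

    # RAM per TB under the ~1 GB RAM : 1 TB storage guideline, with
    # 32 GB as the practical ceiling on consumer boards.
    total_ram_gb = 32

    def mb_per_tb(ram_gb, storage_tb):
        return ram_gb * 1024.0 / storage_tb

    # Partially populated, VMs still on the node (~4 GB for VMs):
    print("~50 TB, VMs local: ~%d MB/TB" % mb_per_tb(total_ram_gb - 4, 50))
    # Fully populated, VMs migrated off the Ceph nodes:
    print("~60 TB, no VMs:    ~%d MB/TB" % mb_per_tb(total_ram_gb, 60))
    print("~80 TB, no VMs:    ~%d MB/TB" % mb_per_tb(total_ram_gb, 80))
    # -> ~573, ~546, and ~409 MB of RAM per TB respectively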
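For the second question, the reasoning behind the 2 Gbps / 3 Gbps
split: with 3-way replication (again, an assumption), the primary OSD
forwards two copies of every client write over the back-end network,
so the back-end wants roughly twice the front-end's write bandwidth:

    # Rough traffic model, assuming 3-way replication: back-end load
    # is roughly 2x the client write rate.
    replication = 3
    back_gbps = 3.0                      # proposed back-end bandwidth

    max_client_write_gbps = back_gbps / (replication - 1)
    print("back-end keeps up with ~%.1f Gbps of client writes"
          % max_client_write_gbps)       # ~1.5 Gbps
    # Reads are served by the primary OSD over the front-end only, so
    # the 2 Gbps front-end covers the 1 Gbps read bursts by itself.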
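And for the last question, the throughput side of my journal worry:
every write hits a journal before its data disk, so the two SSDs have
to absorb each node's entire write rate. The device numbers below are
guesses for consumer SATA SSDs and WD Reds, not measurements:

    # Per-node sequential write ceilings (all figures are guesses).
    osds = 20
    ssds = 2
    ssd_write_mb_s = 450          # guess: decent consumer SATA SSD
    hdd_write_mb_s = 140          # guess: 3-4 TB WD Red
    nic_gbps = 5                  # all five 1 Gbps links combined

    journal_cap = ssds * ssd_write_mb_s
    disk_cap = osds * hdd_write_mb_s
    net_cap = nic_gbps * 1000 / 8               # Gbps -> MB/s, roughly

    print("journal SSDs: ~%d MB/s" % journal_cap)   # ~900 MB/s
    print("data disks:   ~%d MB/s" % disk_cap)      # ~2800 MB/s
    print("network:      ~%d MB/s" % net_cap)       # ~625 MB/s

If that's roughly right, the network saturates before either the
journals or the spinners do for sequential writes, but I'd still
appreciate a sanity check on the IOPS side.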
I'd also be grateful for any other comments or suggestions you can
offer. I probably won't order the parts for another 1-2 weeks, so
there's plenty of time for me to switch things around a bit based on
advice from this ML.
Thanks for your time,
- Ethan