----- Quote from Bron Gondwana (br...@fastmail.fm), on 10.10.2011 at 01:12 -----
>
> Here's what our current IMAP servers look like:
>
> 2 x 92GB SSD
> 12 x 2TB SATA
>
> Two of the SATA drives are hotspares - though I'm wondering if that's actually necessary now; we haven't lost any yet, and we have 24 hr support in our datacentres. Hot swap is probably fine.
>
> So - 5 x RAID1 for a total of 10TB of storage.
>
> Each 2TB volume is then further split into 4 x 500GB partitions. The SSD is just a single partition with all the metadata, which is a change from our previous pattern of separate metadata partitions as well, but has been performing OK thanks to the performance of SSD.
>
> The SSDs are in RAID1 as well.
>
> This gives us 20 separate mailbox databases, which not only keeps the size down, but gives us concurrency for free - so there's no single point of contention for the entire machine. It gives us small enough filesystems that you can actually fsck them in a day, and fill up a new replica in a day as well.
>
> And it means when we need to shut down a single machine, the masters transfer to quite a few other machines rather than one replica host taking all the load, so it spreads things around nicely.
>
> This is letting us throw a couple of hundred thousand users on a single one of these machines and barely break a sweat. It took a year or so of work to rewrite the internals of Cyrus IMAP to cut down the IO hits on the SATA drives, but it was worth it.
>
> Total cost for one of these boxes, with 48GB RAM and a pair of CPUs, is under US $13k - and they scale very linearly - throw a handful of them into the datacentre and toss some replicas on there. Easy.
>
> And there's no single point of failure - each machine is totally standalone - with its own CPU, its own storage, its own metadata. Nice.
>
> So yeah, I'm quite happy with the sweet spot that I've found at the moment - and it means that a single machine has 21 separate filesystems on it. So long as there's no massive lock that all the filesystems have to go through, we get scalability horizontally rather than vertically.
>
> Bron.
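For anyone following the arithmetic in the quoted layout, here is a minimal sketch; the figures are taken from the quote above, and the variable names are only illustrative:

    # Filesystem count for the layout described in the quoted message.
    SATA_DRIVES = 12
    HOT_SPARES = 2
    DRIVE_TB = 2
    PARTITIONS_PER_VOLUME = 4                        # 4 x 500GB per 2TB RAID1 volume

    raid1_volumes = (SATA_DRIVES - HOT_SPARES) // 2  # 5 mirrored pairs
    spool_tb = raid1_volumes * DRIVE_TB              # 10TB of usable spool
    mailbox_filesystems = raid1_volumes * PARTITIONS_PER_VOLUME  # 20 mailbox databases
    total_filesystems = mailbox_filesystems + 1      # plus 1 SSD metadata filesystem

    print(raid1_volumes, spool_tb, mailbox_filesystems, total_filesystems)
    # -> 5 10 20 21
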
Nice setup. And thanks for your work on Cyrus.

We are also looking to move the metadata onto SSDs, but we have not yet found cost-effective devices - we need at least a pair of 250GB disks for a 20-30TB spool on a server.

Setting a higher number of allocation groups per XFS filesystem helps a lot with concurrency. My rule of thumb (learnt from databases) is: number of spindles + 2 * number of CPUs; a sketch of how this could map onto an mkfs.xfs invocation follows below. You have done the same thing with multiple filesystems.

About the fsck times: we experienced a couple of power failures and XFS comes up in 30-45 minutes (30TB in a RAID5 of 12 SATA disks). If the server is shut down correctly, it comes up in a second.

We know that RAID5 is not the best option for write scalability, but the controller write cache helps a lot.

Best regards
--
Luben Karavelov
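The sketch mentioned above: a small Python script that applies the "spindles + 2 * CPUs" rule of thumb and prints a suggested mkfs.xfs command. The spindle count and device name are assumptions for illustration, not taken from either message, and agcount can only be set when the filesystem is created.

    #!/usr/bin/env python3
    # Allocation-group rule of thumb: agcount = spindles + 2 * CPUs.
    import os

    def suggested_agcount(spindles: int, cpus: int) -> int:
        """Allocation groups per XFS filesystem: spindles + 2 * CPUs."""
        return spindles + 2 * cpus

    if __name__ == "__main__":
        spindles = 12                   # e.g. 12 SATA disks behind the volume (assumed)
        cpus = os.cpu_count() or 2      # logical CPUs on the host
        agcount = suggested_agcount(spindles, cpus)
        # /dev/sdb1 is a hypothetical device; the command is printed, not executed.
        print(f"mkfs.xfs -d agcount={agcount} /dev/sdb1")

On an 8-core box with 12 spindles this would suggest agcount=28 instead of the mkfs.xfs default, which is the kind of concurrency tuning described above.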