----- Quote from Bron Gondwana (br...@fastmail.fm), on 10.10.2011 at 01:12 -----
> 
> Here's what our current IMAP servers look like:
> 
> 2 x 92GB SSD
> 12 x 2TB SATA
> 
> two of the SATA drives are hotspares - though I'm
> wondering if that's actually necessary now, we
> haven't lost any yet, and we have 24 hr support in
> our datacentres.  Hot swap is probably fine.
> 
> so - 5 x RAID1 for a total of 10TB storage.
> 
> Each 2TB volume is then further split into 4 x 500GB
> partitions.  The SSD is just a single partition with
> all the metadata, which is a change from our previous
> pattern of separate metadata partitions as well, but
> has been performing OK thanks to the performance of
> SSD.
> 
> The SSDs are in RAID1 as well.
> 
> This gives us 20 separate mailbox databases, which
> not only keeps the size down, but gives us concurrency
> for free - so there are no single points of contention
> for the entire machine.  It gives us small enough
> filesystems that you can actually fsck them in a day,
> and fill up a new replica in a day as well.
> 
> And it means when we need to shut down a single machine,
> the masters transfer to quite a few other machines
> rather than one replica host taking all the load, so
> it spreads things around nicely.
> 
> This is letting us throw a couple of hundred thousand
> users on a single one of these machines and barely
> break a sweat.  It took a year or so of work to rewrite
> the internals of Cyrus IMAP to cut down the IO hits on
> the SATA drives, but it was worth it.
> 
> Total cost for one of these boxes, with 48GB RAM and a
> pair of CPUs is under US $13k - and they scale very
> linearly - throw a handful of them into the datacentre
> and toss some replicas on there.  Easy.
> 
> And there's no single point of failure - each machine
> is totally standalone - with its own CPU, its own
> storage, its own metadata.  Nice.
> 
> So yeah, I'm quite happy with the sweet spot that I've
> found at the moment - and it means that a single machine
> has 21 separate filesystems on it.  So long as there's
> no massive lock that all the filesystems have to go
> through, we get the scalability horizontally rather
> than vertically.
> 
> Bron.
> 

Nice setup. And thanks for your work on Cyrus. We are
also looking to move the metadata onto SSDs, but we have
not yet found cost-effective devices - we need at least a
pair of 250GB disks for a 20-30TB spool on a server.
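
As a back-of-the-envelope check on the ratio those numbers
imply (assuming the pair of 250GB SSDs is a RAID1 mirror,
so 250GB usable), a quick Python sketch:

    # Metadata-to-spool ratio implied by the sizing above.
    # Assumes the two 250GB SSDs are mirrored, i.e. 250GB usable.
    metadata_gb = 250
    for spool_tb in (20, 30):
        ratio = metadata_gb / (spool_tb * 1000.0)
        print("%d TB spool -> metadata is ~%.1f%% of the spool"
              % (spool_tb, ratio * 100))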

Setting a higher number of allocation groups per XFS
filesystem helps a lot with concurrency. My rule of
thumb (learnt from databases) is:
number of spindles + 2 * number of CPUs.
You have achieved the same effect with multiple filesystems.
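
For illustration, a small Python sketch of that rule of
thumb (the spindle count and the device path are
placeholders, and agcount can only be chosen when the
filesystem is created):

    import os

    # Rule of thumb above: allocation groups = spindles + 2 * CPUs.
    spindles = 12                 # placeholder: drives behind the controller
    cpus = os.cpu_count() or 1

    agcount = spindles + 2 * cpus

    # agcount is fixed at mkfs time, so just print the suggested
    # command; /dev/mapper/spool is a hypothetical device name.
    print("mkfs.xfs -d agcount=%d /dev/mapper/spool" % agcount)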

About the fsck times: we experienced a couple of power
failures, and XFS came back up in 30-45 minutes (30TB in
RAID5 over 12 SATA disks). If the server is shut down
cleanly, it comes up in a second.

We know that RAID5 is not the best option for write 
scalability, but the controller write cache helps a lot.
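
For a rough sense of why the cache matters, a
back-of-the-envelope sketch (the per-disk IOPS figure is an
assumption, not a measurement):

    # A small random write on RAID5 costs 4 disk I/Os: read old
    # data, read old parity, write new data, write new parity.
    disks = 12
    iops_per_disk = 80            # rough figure for a 7200 rpm SATA drive
    write_penalty = 4

    read_iops = disks * iops_per_disk
    write_iops = disks * iops_per_disk // write_penalty

    print("~%d random read IOPS, ~%d random write IOPS"
          % (read_iops, write_iops))
    # A battery-backed write cache acknowledges and coalesces writes
    # before they hit the disks, hiding much of that penalty.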

Best regards
--
Luben Karavelov
