On Mon, Oct 10, 2011 at 01:33:31AM +0300, karave...@mail.bg wrote:
> Nice setup. And thanks for your work on Cyrus. We are
> also looking to move the metadata to SSDs, but we have not
> yet found cost-effective devices - we need at least a pair
> of 250G disks for a 20-30T spool per server.

You can move cyrus.cache to the data partition now - that's
the whole point, since it no longer needs to be mmapped in
so much.
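
For reference, a minimal imapd.conf sketch of that split (the
paths are illustrative; the point is that metapartition_files
omits "cache", so cyrus.cache stays on the data spool rather
than the SSD metadata partition):

    # bulk data spool on the big array
    partition-default: /var/spool/cyrus/mail
    # small, hot metadata files on SSD
    metapartition-default: /ssd/cyrus/meta
    # "cache" is deliberately left out of this list, so
    # cyrus.cache lives with the data rather than the SSD
    metapartition_files: header index expunge squat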

> Setting a higher number of allocation groups per XFS
> filesystem helps a lot with concurrency. My rule of
> thumb (learnt from databases) is:
> number of spindles + 2 * number of CPUs.
> You have done the same with multiple filesystems.
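
As a rough sketch of that rule (the figures are illustrative:
12 spindles and 8 CPUs would give 12 + 2*8 = 28 allocation
groups, and the device name is a placeholder):

    mkfs.xfs -d agcount=28 /dev/sdX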
>
> About the fsck times: we experienced a couple of power
> failures and XFS came up in 30-45 minutes (30T in a
> RAID5 of 12 SATA disks). If the server is shut down
> correctly it comes up in a second.

Interesting - is that 30-45 minutes actually a proper
fsck, or just a log replay?
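
For context: a journal replay happens automatically at mount
time and normally takes only seconds, whereas a full structural
check is a separate offline run - something like the following
(device name is a placeholder):

    umount /dev/sdX
    xfs_repair -n /dev/sdX   # no-modify mode, report only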

More interestingly, what's your disaster recovery plan
for when you lose multiple disks?  Our design is
heavily influenced by having lost 3 disks in a RAID6
within 12 hours.  It took a week to get everyone back
from backups, just because of the IO rate limits of
the backup server.

> We know that RAID5 is not the best option for write 
> scalability, but the controller write cache helps a lot.

Yeah, we did RAID5 for a while - but it turned out we
were still write-limited rather than disk-space-limited,
so the last RAID5s are being phased out in favour of
more RAID1.

Bron.
