On Friday 20 April 2007 08:32, Tony Abernethy wrote:
> Jason Beaudoin wrote:
> > <snip>
> >
> > > Use all the tricks you can for YOUR solution, including:
> > >   * lots of "small" partitions
> >
> > What are the reasonings behind this?
> >
> > Thanks for the awesome post!
>
> I think it runs something like this
> If there is a problem somewhere on the disk,
> if it's all one big partition, you must fix the big partition
> if it's lots of small partitions, you fix the one with the problem.
>
> Even worse, in some situations,
> the difference is between being dead and being somewhat crippled.
>
> Methinks there's lots of hard-won experience behind Nick's answers ;)

Your last assumption is the most correct, and Nick has put some of that 
experience into FAQ-14 for our reading pleasure.

In general, you always want to assume a failure *WILL* occur, rather 
than think in terms of "if" something will fail. Having lots of small 
partitions, and using Read Only partitions wherever possible (also 
mentioned by Nick) gives you a number of important advantages. 

Assume that someone, possibly you, has managed to trip over the power 
cord. How long will it take you to get the server back up?

If your partitions are Read/Write, then you will be doing a fsck on each 
of them. That means time.

If your partitions are huge, then you will need a lot of RAM and time to 
perform the fsck. If you have a massive partition and insufficient RAM, 
then your fsck will fail (see FAQ-14.7 "fsck(8) time and memory 
requirements") and you'll be stuck like a turtle on its back at a soup 
competition.
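
To put rough numbers on that, here is a back-of-the-envelope sketch in 
Python. The layout, the sizes and the FSCK_MIN_PER_GB rate are all made 
up for illustration (measure your own hardware), and it assumes that 
only Read/Write partitions need an fsck after an unclean shutdown.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    mount: str
    size_gb: float
    read_only: bool


# Hypothetical rate, purely for illustration; real fsck time depends on
# the disk, the number of files and the available RAM.
FSCK_MIN_PER_GB = 0.5


def fsck_minutes(partitions):
    """Estimated fsck time after a crash; read-only partitions are skipped."""
    return sum(p.size_gb * FSCK_MIN_PER_GB for p in partitions if not p.read_only)


# The same 500 GB disk, laid out two ways (both layouts are assumptions).
one_big = [Partition("/", 500, False)]
split = [
    Partition("/", 1, False),
    Partition("/usr", 10, True),
    Partition("/usr/local", 20, True),
    Partition("/var", 20, False),
    Partition("/home", 49, False),
    Partition("/archive", 400, True),
]

print("one big partition:", fsck_minutes(one_big), "minutes")
print("small + RO layout:", fsck_minutes(split), "minutes")
```

The exact figures mean nothing, but the ratio is the point: the less 
Read/Write space you have, the less there is to check.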

The above is just your start up time after a crash or power loss.

Assume that someone, possibly you, has written some bad code that will 
scribble all over the data in one of your partitions. How long will it 
take you to recover?

If the partition was marked RO, then you don't have a problem. If it was 
a small RW partition, you can repair it reasonably quickly from backup. 
If your backup media fails, your losses are minimal. By comparison, if 
it's a huge RW partition, then you're stuffed.
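
The same reasoning can be sketched for the "bad code" case. Again, the 
layout below is hypothetical, and the sketch assumes a read-only mount 
simply refuses ordinary writes, so the damage is limited to the one 
Read/Write partition that got hit.

```python
# (mount point, size in GB, mounted read-only?) -- a hypothetical layout
LAYOUT = [
    ("/", 1, False),
    ("/usr", 10, True),
    ("/var", 20, False),
    ("/home", 49, False),
    ("/archive", 400, True),
]


def gb_to_restore(layout, target):
    """GB to pull from backup if a runaway program scribbles over `target`."""
    for mount, size_gb, read_only in layout:
        if mount == target:
            # A read-only mount rejects the writes, so nothing needs restoring.
            return 0 if read_only else size_gb
    raise KeyError(target)


print("scribble on /var:", gb_to_restore(LAYOUT, "/var"), "GB to restore")
print("scribble on /usr:", gb_to_restore(LAYOUT, "/usr"), "GB to restore")
print("one big RW disk: ", gb_to_restore([("/", 500, False)], "/"), "GB to restore")
```

The amount you have to restore is also the most you can lose if the 
backup media turns out to be bad.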

The list of reasons goes on and on but when you really think about it, 
you'll understand that you're just doing proper "risk management" by 
trying to mitigate as many of the bad effects of failures as possible.

Never drink the marketing kool-aid that will try to sell you on the idea 
that failures are somehow avoidable. Sure, it might sound like a nice 
idea but the idea always falls short of reality. Being prepared for the 
reality of failures is a much better approach than sticking your head 
in the sand.

/jcr
