Re: Webservers with Terabytes of Data in - recommended setups

Nick Holland Thu, 19 Apr 2007 18:01:01 -0700

Siju George wrote:
> Hi,
> 
> How do you handle it when you have to serve terabytes of data through
> http/https/ftp etc.?
> Put it on different machines and use some kind of
> load balancer/intelligent program that directs to the right machine?
> 
> Use some kind of clustering software?
> 
> What hardware do you use to make your system scalable from a few
> terabytes of data to a few hundred of them?
> 
> Does OpenBSD have any clustering software available?
> 
> Is anyone running such setups?
> Please let me know :-)
> 
> Thank you so much
> 
> Kind Regards
> 
> Siju


Too open-ended a question...
Are you talking about many TB on one site?  Lots of sites?
Is there some reason it has to be on one server or one site?
Is this "huge storage, huge demand"?  Huge storage, low demand?
Is this storage all needed on day 1, or will it grow with time?
  (hint: if it grows with time, build for NOW, with the ability to
add later; don't buy storage in advance!)
etc.

Let the answers to those questions guide your engineering work,
don't rely on knee-jerk reactions.  And don't be afraid to
change the question to meet available answers. :)  A common
error is to take the proposed solution (posed as a problem,
but often someone has digested the REAL problem into what they
think is the only possible model, and sent you down a bad alley)
as gospel, and never question the basic assumptions.

I've got a web server with over 3.5TB of storage on it that cost
about $6000US a year or so ago.  It's a huge-storage, low-demand
app, probably gets on average a query a day, if that.  If the
box breaks, time can be spent repairing it, but we don't want to
lose the data (it's carefully backed up, but the backup media
is so compressed, it takes longer to uncompress the files than
it does to scp them back into the box!).  So, the thing has
redundancy where it counts (disk) and simplicity where it
doesn't matter, and it can be upgraded, enhanced and changed
as needed.  And, we have a small enough amount invested in the
thing that we can completely change our mind about the approach
to the problem any time in the future and throw it all away with
a very clear conscience.  (My current boss-of-the-week thinks he
wants to replace this with an unknown proprietary app feeding a
$30,000 per-processor database server attached to a $60,000 disk
array, so you can see how insignificant the price tag on this
system is.  You can also see something about my boss.  And why
I'm looking for a better job).

Let's say you have one website that you are trying to serve
massive amounts of static files from.  I presume you aren't just
dropping people at the root of a massive directory tree and
letting them dig for their desired file...you probably have
some kind of app directing them to the file they need.  Well,
you should have no problem directing them to the SERVER they
need as well.  Do a little magic on the front-end machine and
you could also implement massive amounts of redundancy for very
low cost.  For example, if you have two machines, A and B, skip
RAID and just put both data sets on both machines.  If you lose
A, serve A's files from B; it's a little slower, but still
working.  Repair A, resync (if needed), and you are back up and
running at 100%.  Now you can use the absolutely cheapest and
least redundant machines around to accomplish your task.  (In
this case, your front-end machines would have to be a little
more sophisticated...but they should still have multiple-machine
redundancy.)
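The front-end redirection with mirrored data sets described above could look roughly like this minimal Python sketch. It assumes two hypothetical back-end hosts (`a.example.com`, `b.example.com`) that each hold both data sets. It assigns each file a "home" server by hashing its path; that hashing rule is a stand-in for whatever your own app uses to decide which files are A's and which are B's. How the front end learns which hosts are up (the `up` set) is left out.

```python
import hashlib

# Hypothetical back-end hosts. Each one holds BOTH data sets (A's and B's),
# so either can serve every file if the other one is down.
SERVERS = ["a.example.com", "b.example.com"]


def primary_index(path: str) -> int:
    """Pick a file's 'home' server by hashing its path (an assumed rule)."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return digest[0] % len(SERVERS)


def pick_server(path: str, up: set[str]) -> str:
    """Return the home server if it is up, otherwise the next live mirror."""
    start = primary_index(path)
    for offset in range(len(SERVERS)):
        host = SERVERS[(start + offset) % len(SERVERS)]
        if host in up:
            return host
    raise RuntimeError("no back-end server available")


def redirect_url(path: str, up: set[str]) -> str:
    """Build the URL the front end would send the client to (e.g. in a 302)."""
    return f"http://{pick_server(path, up)}/{path.lstrip('/')}"
```

With both hosts up, each file goes to its home server, so the load is split between them. With one host down, everything goes to the survivor, which is the "a little slower, but still working" case.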


SANs are the cool way to do this, of course.  Also a very
expensive way...and something I'd try to avoid unless it was
really needed.  Design it simple, design it to be fixable
WHEN it breaks, and you will save your hair...

Use all the tricks you can for YOUR solution, including:
  * lots of "small" partitions
  * Mount read-only (RO) any partitions you can (no need to fsck after an oops)
  * Assume you will need more storage later, and figure out how
to add it without removing data from your existing storage
  * Assume your existing 500G disk is going to look pathetic in
a few years when 10TB microdrives are in your palmtop computer,
and make sure you have a plan to migrate the data off those first
disks you installed.
  * Guess how much processor you need, and figure out how to
deal with it when you are wrong.
  * Keep in mind if you don't expect lots of demand this year,
next year's systems will be a lot faster, bigger and cheaper.
  * Last year's computers loaded with modern disks are still
pretty darned fast for many applications.
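To make a few of these tips concrete (many small partitions, freezing full ones read-only, and adding storage without moving existing data), here is a minimal Python sketch of a hypothetical file-placement policy. The `Partition`/`Store` names and mount points like `/data/p00` are illustrative assumptions, not part of any existing tool. The code only tracks a flag; actually remounting a partition read-only is left to the admin.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    mount_point: str          # hypothetical naming, e.g. "/data/p00"
    capacity: int             # bytes
    used: int = 0
    read_only: bool = False   # frozen partitions would be remounted RO


@dataclass
class Store:
    partitions: list[Partition] = field(default_factory=list)

    def add_partition(self, mount_point: str, capacity: int) -> None:
        """Grow storage by adding a partition; existing data is never moved."""
        self.partitions.append(Partition(mount_point, capacity))

    def place(self, size: int) -> str:
        """Put a new file on the first writable partition with room for it.

        A partition is frozen (marked read-only) the first time a file does
        not fit on it. Raises when every partition is frozen or full, which
        is the signal to add another disk.
        """
        for p in self.partitions:
            if p.read_only:
                continue
            if p.used + size <= p.capacity:
                p.used += size
                return p.mount_point
            p.read_only = True
        raise RuntimeError("out of space: add another partition")
```

The fill-then-freeze rule is deliberately crude: a partition is frozen even if smaller files could still squeeze in. That wastes a little space in exchange for a simple rule, and the frozen partitions are exactly the ones you can later copy off when migrating to bigger disks.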


Nick.
