On Wed, Apr 18, 2007 at 03:22:07PM +0530, Siju George wrote:
> Hi,
>
> How do you handle it when you have to serve terabytes of data through
> http/https/ftp etc.? Put it on different machines and use some kind of
> load balancer/intelligent program that directs to the right machine?
>
> Use some kind of clustering software?
>
> What hardware do you use to make your system scalable from a few
> terabytes of data to a few hundred of them?
>
> Does OpenBSD have any clustering software available?
>
> Is anyone running such setups?
> Please let me know :-)
I don't really know, but how about an HTTP proxy (hoststated comes to mind; pound or squid also work) with a lot of hosts behind it, each serving a subset of the total? Yes, that's pretty much what you said yourself. (There's a rough sketch of the "each host serves a subset" idea at the end of this mail.)

I don't think NFS/AFS is that good an idea; you'd need very beefy fileservers and a fast network. Maybe rsync'ing from a central fileserver would work?

That said, there are a lot of specialized solutions available: various SANs come to mind, and Google has published several papers on filesystems and algorithms like MapReduce, although the latter isn't going to help you with serving HTTP.

All in all, though, I think the most important factors are the rate of change and the reliability requirements. A big web host may hold an impressive amount of data, but it doesn't change all that often, and a site occasionally going offline is usually tolerated (just restore a recent backup). In such cases, something like the above seems to work.

Joachim

--
TFMotD: moduli (5) - system moduli file
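P.S. Purely to illustrate the "each host serves a subset" idea above: the sketch below hashes the requested path and picks one of several backends, so every file only has to live on (and be rsync'ed to) one machine. The hostnames and the hashing scheme are made up for the example; in practice you would express this in the proxy's own configuration (hoststated, pound, squid, ...) rather than in code.

    # Sketch only: map a requested path to one backend host by hashing,
    # so each backend holds just a subset of the data.  Hostnames are
    # invented for the example.
    import hashlib

    BACKENDS = ["data1.example.org", "data2.example.org",
                "data3.example.org", "data4.example.org"]

    def backend_for(path):
        # The same path always hashes to the same host, so a central
        # fileserver only has to rsync each file to one machine.
        h = int(hashlib.md5(path.encode()).hexdigest(), 16)
        return BACKENDS[h % len(BACKENDS)]

    if __name__ == "__main__":
        for p in ["/pub/OpenBSD/4.1/i386/install41.iso", "/video/talk.mpg"]:
            print(p, "->", backend_for(p))

Adding or removing a backend reshuffles most paths with this naive modulo scheme; something like consistent hashing would limit that, but it shows the principle.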