On Fri, Apr 16, 2010 at 9:17 AM, Mike Gallamore <mike.e.gallam...@googlemail.com> wrote:
> On 04/16/2010 01:38 AM, dir dir wrote:
>
> > I hear Facebook.com and Twitter.com are using the Cassandra database.
> > In my opinion Facebook and Twitter have hundreds of TB of data,
> > because their users number in the hundreds of millions.
>
> I think you might be forgetting just how tiny tweets are. The last
> numbers I heard, Twitter gets 55,000,000 messages a day. They've been
> around for roughly 4 years. Even assuming they always had that number
> of messages (which isn't the case), that would still only be roughly
> 11TB of data if everyone sent the maximum tweet length. Sure, add a bit
> to each message for a timestamp and the user that posted it, but I'd
> still be surprised if all the tweets, including metadata, came to much
> more than 20TB.
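For anyone who wants to check the quoted estimate, here is a rough sketch of the arithmetic. The message rate and the 4-year span come from the quoted mail. The rest is assumed for illustration: one byte per character at the 140-character limit, 365-day years, decimal terabytes, and a hypothetical 16 bytes of per-message metadata (an 8-byte timestamp plus an 8-byte user id).

```python
# Back-of-the-envelope check of the quoted storage estimate.
# From the quoted mail: 55,000,000 messages/day, roughly 4 years.
MESSAGES_PER_DAY = 55_000_000
YEARS = 4

# Assumptions (not from the mail):
DAYS_PER_YEAR = 365          # ignores leap days
MAX_TWEET_BYTES = 140        # 140-character limit, assuming 1 byte/char
METADATA_BYTES = 16          # hypothetical: 8-byte timestamp + 8-byte user id
BYTES_PER_TB = 10 ** 12      # decimal terabytes

total_messages = MESSAGES_PER_DAY * DAYS_PER_YEAR * YEARS


def storage_tb(bytes_per_message: int) -> float:
    """Total storage in TB if every message takes bytes_per_message bytes."""
    return total_messages * bytes_per_message / BYTES_PER_TB


raw_tb = storage_tb(MAX_TWEET_BYTES)
with_metadata_tb = storage_tb(MAX_TWEET_BYTES + METADATA_BYTES)

print(f"messages:       {total_messages:,}")
print(f"text only:      {raw_tb:.1f} TB")
print(f"with metadata:  {with_metadata_tb:.1f} TB")
```

Under those assumptions the text alone comes to a bit over 11 TB, in line with the quoted figure, and stays well under 20 TB with the assumed metadata. This counts raw payload only; replication and indexes are not included.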
While those are valid points, I think there are separate issues wrt
searching and indexing, where the number of entries matters more than
the total size. I mean, storing and accessing big BLOBs is not trivial,
but in many ways it is less problematic than craploads of smaller
entries. This is probably even more true for FB and LinkedIn, which
have more graph-oriented challenges. So it's not just (or even mainly)
the raw storage but all the slicing and dicing that makes things
challenging.

-+ Tatu +-