On 04/16/2010 01:38 AM, dir dir wrote:
I hear Facebook.com and tweeter.com <http://tweeter.com> are using the
Cassandra database. In my opinion Facebook and
Twitter have hundreds of TB of data, because their users number in the
hundreds of millions.
I think you might be forgetting just how tiny tweets are. The last
numbers I heard, Twitter gets 55,000,000 messages a day. They've been
around for roughly 4 years. Even assuming they always had that number of
messages (which isn't the case), that would still only be roughly 11TB of
data if everyone sent the maximum tweet length (140 characters, at about
a byte each). Sure, add a bit to each message for a timestamp and the
user that posted it, but I'd still be surprised if all the tweets,
including metadata, came to much more than 20TB.
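
The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 140-byte tweet size assumes one byte per character, and the 100 bytes of per-tweet metadata is a made-up figure for illustration, not a number from Twitter.

```python
# Back-of-the-envelope estimate of total tweet storage.
TWEETS_PER_DAY = 55_000_000  # figure quoted in the message above
YEARS = 4                    # rough age of the service
MAX_TWEET_BYTES = 140        # assumption: 140 characters, one byte each
METADATA_BYTES = 100         # assumption: timestamp, user id, etc. (hypothetical)


def estimate_tb(bytes_per_tweet):
    """Total storage in TB (10**12 bytes) at a constant daily volume."""
    total_tweets = TWEETS_PER_DAY * 365 * YEARS
    return total_tweets * bytes_per_tweet / 1e12


text_only_tb = estimate_tb(MAX_TWEET_BYTES)
with_metadata_tb = estimate_tb(MAX_TWEET_BYTES + METADATA_BYTES)

print(f"text only:     {text_only_tb:.1f} TB")
print(f"with metadata: {with_metadata_tb:.1f} TB")
```

Even with the metadata guess nearly doubling each record, the total stays under 20TB.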
Similarly with Facebook. I think the friend list search is what they
really use it for. Regardless, how much text is on your Facebook page?
Maybe 1MB if you are a very, very active user. I wouldn't think they
would load the images directly into Cassandra, but I could be wrong. I
would suspect that they pull an old database trick and have the
filesystem store the images while the "database" just stores the path to each one.
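
For anyone who hasn't seen the "store the path, not the image" trick, here is a minimal sketch of the idea. It uses Python's built-in sqlite3 purely as a stand-in for "the database"; it is not Cassandra's API and not how Facebook actually does it, and the names `save_image`, `load_image` and the `images` table are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

# The image bytes live on the filesystem; the database row holds only the path.
image_dir = Path(tempfile.mkdtemp())
db = sqlite3.connect(":memory:")  # stand-in for the real database
db.execute("CREATE TABLE images (image_id TEXT PRIMARY KEY, path TEXT NOT NULL)")


def save_image(image_id, data):
    """Write the bytes to disk and record where they went."""
    path = image_dir / f"{image_id}.jpg"
    path.write_bytes(data)
    db.execute("INSERT INTO images VALUES (?, ?)", (image_id, str(path)))
    return path


def load_image(image_id):
    """Look up the path in the database, then read the file it points to."""
    row = db.execute(
        "SELECT path FROM images WHERE image_id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    return Path(row[0]).read_bytes()


save_image("profile-42", b"\xff\xd8fake jpeg bytes")
```

The database then only has to hold a short string per image, however large the files themselves are.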
There could be a lot of other data floating around, some of which might
be in Cassandra, but I don't know. Just the core data that the sites have
mentioned they use Cassandra for is, I think, probably in the very
low tens of TB.
Lastly, sites like Facebook and Twitter count hundreds of millions of
users, but a lot of them are people who sign in, send a few tweets or
connect to a few friends, and then don't use the site again. When a
company needs to make itself look valuable it counts every single
person that ever logged in, even if they only did it once or haven't
used the site for years. They want to sell large numbers because
advertisers and potential acquirers base the price on those large
numbers.
Dir.
On Fri, Apr 16, 2010 at 1:28 PM, Linton N
<gabrialmarialin...@gmail.com>
wrote:
Hi,
I have been working with Hadoop for the past year, but I am quite
new to Cassandra. I would like to clarify a few things
regarding the scalability of Cassandra. Can it scale up to TBs of
data?
Please provide me with some links regarding this.
--
With Love
Lin N