Does that include HD copies of CNN et al reading tweets to people on
T.V.? You know your medium is doomed when you're reduced to reading
comments from random_dude64 and omg69 because they get the news out
faster than you can.
They must be tracking a lot more than just the tweets themselves (which
is expected if they want to monetize the service) as even they say 50M a
day: http://blog.twitter.com/2010/02/measuring-tweets.html.
Not saying you're wrong, just that they must be doing a lot else with it.
Perhaps their delivery logs hold the tweets themselves: tweets are
small enough that generating a hash of the message and saving the hash
into a log may make less sense than just going ahead and saving a copy
for each person who is subscribed. Either way, crazy data. Lots of
people have more data, but I doubt as much of it was typed by hand :-)
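
To put rough numbers on the hash-versus-copy point, here is a minimal
sketch. It assumes an all-ASCII tweet at the 140-character limit and a
raw SHA-1 digest as the hash; both are illustrative choices, and nothing
here is known to match what Twitter actually logs.

```python
import hashlib

# Hypothetical tweet: 140 ASCII characters, the length limit at the time.
tweet = ("x" * 140).encode("utf-8")

# Raw (binary) SHA-1 digest of the tweet; SHA-1 is an assumed hash choice.
digest = hashlib.sha1(tweet).digest()

# Bytes saved per log entry by storing the hash instead of the text.
saving_per_entry = len(tweet) - len(digest)

print(len(tweet))         # 140
print(len(digest))        # 20
print(saving_per_entry)   # 120
```

So the hash saves at most about 120 bytes per entry in this setup, and
every read then needs a second lookup to turn the hash back into text.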
On 04/16/2010 02:22 PM, Stu Hood wrote:
http://twitter.com/jromeh/status/12295736793
-----Original Message-----
From: "Mike Gallamore"<mike.e.gallam...@googlemail.com>
Sent: Friday, April 16, 2010 3:46pm
To: user@cassandra.apache.org
Subject: Re: Regarding Cassandra Scalability
Also people with 1M followers tend to have "public" tweets, which means
really I think it would be the same as subscribing to an RSS feed or
whatever. You aren't getting a local copy because you will "always" have
access to the tweet as will everyone else. Also tweets don't change
AFAIK so no point in having redundant copies.
On 04/16/2010 01:42 PM, Peter Chang wrote:
Yeah. I wasn't sure if Cassandra was optimized for binary data
especially since any site of that size will use a CDN. Interesting
read though.
I think 1K per tweet is off by an order of magnitude considering they
only allow 140 characters. Regardless, the number of users with > 1MM
followers is probably a handful. Also I'm guessing they purge data
after a certain window (like 30 days, for example).
Sent from my iPhone
On Apr 16, 2010, at 12:02 PM, gabriele renzi<rff....@gmail.com> wrote:
On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang<pete...@gmail.com>
wrote:
FB also does pics and movies so 1MB is way off depending on where they
manage such binary data.
Apparently not in Cassandra: http://www.facebook.com/note.php?note_id=76191543919
I do agree that 1MB of text alone is a lot of text
which is more relevant in the case of Twitter. The only large thing you
leave out is denormalization. Every tweet you write is likely
denormalized across your followers to allow for quick read access.
.. but considering many users have _millions_ of followers, this may
be quite a bit more data. Assuming 1k per tweet, a single tweet from
@aplusk (4.7M followers) would take more than 4 gigabytes of data.
Assuming ten tweets a day, in one month he'd produce more than one TB.
I'd say they only store references (increasing number lists can also
be encoded very cleverly), or do it in some other way I'm not smart
enough to think of.
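
The arithmetic above, and the idea of storing references instead of
copies, can be sketched as follows. Everything here is an assumption for
illustration: the 8-byte reference size, the sample tweet ids, and the
delta-plus-varint scheme (one common way to pack an increasing list of
numbers). The function names are hypothetical, and none of this is known
to be what Twitter does.

```python
def fanout_bytes(followers, bytes_per_item):
    """Bytes written when one tweet is stored once per follower."""
    return followers * bytes_per_item


def encode_varint(n):
    """Encode a non-negative int as a base-128 varint, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_increasing(ids):
    """Store each id as the gap from the previous one, varint-encoded."""
    out = bytearray()
    prev = 0
    for i in ids:
        out += encode_varint(i - prev)
        prev = i
    return bytes(out)


def decode_increasing(data):
    """Inverse of encode_increasing."""
    ids, prev, cur, shift = [], 0, 0, 0
    for byte in data:
        cur |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += cur
            ids.append(prev)
            cur, shift = 0, 0
    return ids


FOLLOWERS = 4_700_000        # @aplusk, per the figure in this thread
TWEETS_PER_MONTH = 10 * 30   # ten tweets a day for 30 days

full_copy = fanout_bytes(FOLLOWERS, 1000)  # 1k per tweet, as assumed above
reference = fanout_bytes(FOLLOWERS, 8)     # assumed 8-byte tweet id

print(full_copy)                     # 4700000000 bytes, about 4.7 GB
print(full_copy * TWEETS_PER_MONTH)  # 1410000000000 bytes, about 1.4 TB
print(reference)                     # 37600000 bytes, about 38 MB

# Hypothetical timeline of increasing tweet ids.
timeline = [12295736793, 12295736801, 12295740000]
packed = encode_increasing(timeline)
print(len(packed))                   # 8 bytes instead of 3 * 8 = 24
```

With references only, the per-tweet fan-out drops from gigabytes to tens
of megabytes under these assumptions, and ids that sit close together
pack into a byte or two each.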