Also, people with 1M followers tend to have "public" tweets, which I
think makes following them much the same as subscribing to an RSS feed
or whatever. You aren't getting a local copy because you will "always"
have access to the tweet, as will everyone else. Also, tweets don't
change AFAIK, so there is no point in having redundant copies.
On 04/16/2010 01:42 PM, Peter Chang wrote:
Yeah. I wasn't sure if Cassandra was optimized for binary data
especially since any site of that size will use a CDN. Interesting
read though.
I think 1K per tweet is off by an order of magnitude considering they
only allow 140 characters. Regardless, the number of users with > 1MM
followers is probably a handful. Also, I'm guessing they purge data
after a certain window (like 30 days, for example).
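
For a rough sense of the text payload behind that estimate, here is a minimal sketch. It assumes tweets are stored as UTF-8 and that the 140-character limit counts Unicode code points; both are assumptions, and per-tweet metadata (IDs, timestamps, and so on) is not included.

```python
# Assumption: UTF-8 storage, limit counted in Unicode code points.
MAX_CHARS = 140

ascii_tweet = "a" * MAX_CHARS          # 1 byte per character in UTF-8
wide_tweet = "\U0001F600" * MAX_CHARS  # 4 bytes per character in UTF-8

ascii_bytes = len(ascii_tweet.encode("utf-8"))
wide_bytes = len(wide_tweet.encode("utf-8"))

print(ascii_bytes, wide_bytes)
```

Under these assumptions the text alone stays well under 1K even in the worst case, so the 1K figure only makes sense if it includes metadata.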
On Apr 16, 2010, at 12:02 PM, gabriele renzi <rff....@gmail.com> wrote:
On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang <pete...@gmail.com> wrote:
FB also does pics and movies so 1MB is way off depending on where they
manage such binary data.
Apparently not in Cassandra: http://www.facebook.com/note.php?note_id=76191543919
I do agree that 1MB of text alone is a lot of text, which is more
relevant in the case of Twitter. The only large thing you leave out is
denormalization. Every tweet you write is likely denormalized across
your followers to allow for quick read access.
... but considering many users have _millions_ of followers, this may
be quite a bit more data. Assuming 1k per tweet, this would mean one
tweet from @aplusk (4.7M followers) would take more than 4 gigabytes of
data. Assuming ten tweets a day, in one month he'd produce more than
one TB.
I'd say they only store references (increasing number lists can also
be encoded very cleverly), or in some other way I'm not smart enough
to think of.
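
The estimate above, and the idea of storing compact references instead of full copies, can be sketched roughly as follows. This is only an illustration, not a description of what Twitter actually does: the 8-byte reference size, the decimal units, and the delta-plus-varint encoding are all assumptions, and the function names are made up for the example.

```python
KB = 1000  # decimal units, matching the rough "1k per tweet" figure


def fanout_bytes(followers, bytes_per_entry):
    """Bytes written when one tweet yields one entry per follower."""
    return followers * bytes_per_entry


def encode_varint(n):
    """Encode a non-negative int with 7 payload bits per byte (LEB128 style)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_increasing(ids):
    """Store an increasing list of IDs as varint-encoded gaps."""
    out = bytearray()
    prev = 0
    for i in ids:
        out += encode_varint(i - prev)
        prev = i
    return bytes(out)


def decode_increasing(data):
    """Inverse of encode_increasing."""
    ids, prev, cur, shift = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += cur
            ids.append(prev)
            cur, shift = 0, 0
    return ids


followers = 4_700_000
full_copy = fanout_bytes(followers, 1 * KB)  # one tweet copied to every follower
per_month = full_copy * 10 * 30              # ten tweets a day for 30 days
refs_only = fanout_bytes(followers, 8)       # assumed 8-byte tweet ID per follower

timeline = [1000, 1003, 1010, 5000]          # hypothetical increasing tweet IDs
packed = encode_increasing(timeline)
```

With these numbers, full copies come to 4.7 GB per tweet and about 1.41 TB per month, while 8-byte references come to about 37.6 MB per tweet. The four example IDs pack into 6 bytes instead of 32, because only the gaps between consecutive IDs are stored.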