Yeah. I wasn't sure whether Cassandra was optimized for binary data, especially since any site of that size will use a CDN. Interesting read, though.
I think 1K per tweet is off by an order of magnitude, considering they only allow 140 characters. Regardless, the number of users with > 1MM followers is probably a handful. Also, I'm guessing they purge data after a certain window (30 days, for example).

Sent from my iPhone

On Apr 16, 2010, at 12:02 PM, gabriele renzi <rff....@gmail.com> wrote:

> On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang <pete...@gmail.com> wrote:
>> FB also does pics and movies so 1MB is way off depending on where they
>> manage such binary data.
>
> apparently not in cassandra
> http://www.facebook.com/note.php?note_id=76191543919
>
>> I do agree that 1MB of text alone is a lot of text
>> which is more relevant in the case of Twitter. The only large thing you
>> leave out is denormalization. Every tweet you write is likely denormalized
>> across your followers to allow for quick read access.
>
> .. but considering many users have _millions_ of followers, this may
> be quite a bit more data. Assuming 1k per tweet, this would mean one
> from @aplusk (4.7M followers) would take more than 4 gigabytes of
> data. Assuming ten tweets a day, in one month he'd produce one TB.
>
> I'd say they only store references (increasing number lists can also
> be encoded very cleverly), or in some other way I'm not smart enough
> to think of.
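
For anyone who wants to check the fan-out arithmetic in this thread, here is a rough back-of-envelope sketch. The follower count, the 1k-per-tweet size, and the ten-tweets-a-day rate are the figures quoted above. The 8-byte reference size is my own assumption (one 64-bit tweet id per follower), not anything Twitter has confirmed, and the function and constant names are made up for illustration.

```python
# Back-of-envelope fan-out cost. Figures come from the thread unless noted.
FOLLOWERS = 4_700_000   # @aplusk follower count quoted in the thread
TWEET_BYTES = 1_000     # "1k per tweet" assumption from the thread
TWEETS_PER_DAY = 10     # rate assumed in the thread
DAYS = 30               # one month
REF_BYTES = 8           # ASSUMPTION: one 64-bit tweet id stored per follower


def fanout_bytes(bytes_per_copy, followers=FOLLOWERS):
    """Bytes written when one item is copied into every follower's timeline."""
    return bytes_per_copy * followers


tweets_per_month = TWEETS_PER_DAY * DAYS

# Full denormalization: every follower gets a complete copy of the tweet.
per_tweet_full = fanout_bytes(TWEET_BYTES)
per_month_full = per_tweet_full * tweets_per_month

# Reference-only fan-out: every follower gets just the tweet id.
per_tweet_refs = fanout_bytes(REF_BYTES)
per_month_refs = per_tweet_refs * tweets_per_month

print(f"full copies: {per_tweet_full / 1e9:.2f} GB per tweet, "
      f"{per_month_full / 1e12:.2f} TB per month")
print(f"references:  {per_tweet_refs / 1e6:.2f} MB per tweet, "
      f"{per_month_refs / 1e9:.2f} GB per month")
```

Under those assumptions, full copies cost about 4.7 GB per tweet and roughly 1.4 TB per month, which matches the estimate above. Storing only ids cuts that by the ratio of tweet size to id size, and dropping the tweet size to a few hundred bytes shrinks the full-copy figure proportionally.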