That's just the data our analytics team produces (logs, etc). Production/online data is separate.
-ryan

On Fri, Apr 16, 2010 at 2:22 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> http://twitter.com/jromeh/status/12295736793
>
> -----Original Message-----
> From: "Mike Gallamore" <mike.e.gallam...@googlemail.com>
> Sent: Friday, April 16, 2010 3:46pm
> To: user@cassandra.apache.org
> Subject: Re: Regarding Cassandra Scalability
>
> Also people with 1M followers tend to have "public" tweets, which means
> really I think it would be the same as subscribing to a RSS feed or
> whatever. You aren't getting a local copy because you will "always" have
> access to the tweet as will everyone else. Also tweets don't change
> AFAIK so no point in having redundant copies.
>
> On 04/16/2010 01:42 PM, Peter Chang wrote:
>> Yeah. I wasn't sure if Cassandra was optimized for binary data
>> especially since any site of that size will use a CDN. Interesting
>> read though.
>>
>> I think 1K per tweet is off by an order of magnitude considering they
>> only allow 140 characters. Regardless the number of users with > 1MM
>> is probably a handful. Also im guessing they purge data after a
>> certain window (like 30 days for example).
>>
>> Sent from my iPhone
>>
>> On Apr 16, 2010, at 12:02 PM, gabriele renzi <rff....@gmail.com> wrote:
>>
>>> On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang <pete...@gmail.com>
>>> wrote:
>>>
>>>> FB also does pics and movies so 1MB is way off depending on where
>>>> they manage such binary data.
>>>>
>>> apparently not in cassandra
>>> http://www.facebook.com/note.php?note_id=76191543919
>>>
>>>> I do agree that 1MB of text alone is a lot of text
>>>> which is more relevant in the case of Twitter. The only large thing
>>>> you leave out is denormalization. Every tweet you write is likely
>>>> denormalized across your followers to allow for quick read access.
>>>>
>>> .. but considering many users have _millions_ of followers, this may
>>> be quite a bit more data. Assuming 1k per tweet, this would mean one
>>> from @aplusk (4.7M followers) would take more than 4 gigabytes of
>>> data. Assuming ten tweets a day, in one month he'd produce one TB.
>>>
>>> I'd say they only store references (increasing number lists can also
>>> be encoded very cleverly), or in some other way I'm not smart enough
>>> to think of.
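
For anyone who wants to check the fan-out numbers quoted above, here is a rough back-of-envelope sketch. It uses only the figures from the thread (1k or 140 bytes per tweet, 4.7M followers, ten tweets a day, a 30-day month). The function names and the 8-byte reference size are my own assumptions for illustration, not anything Twitter has confirmed. With decimal units the full-copy case comes out to about 4.7 GB per tweet and roughly 1.4 TB per month, so "one TB" is the right order of magnitude.

```python
# Back-of-envelope fan-out storage estimate using the thread's figures.
# Nothing here is a measured value; helper names are hypothetical.

FOLLOWERS = 4_700_000      # @aplusk follower count quoted in the thread
TWEETS_PER_DAY = 10        # assumption quoted in the thread
DAYS_PER_MONTH = 30        # assumption quoted in the thread


def fanout_bytes(bytes_per_item: int, followers: int) -> int:
    """Bytes written when one item is copied to every follower's timeline."""
    return bytes_per_item * followers


def monthly_bytes(bytes_per_item: int, followers: int,
                  tweets_per_day: int = TWEETS_PER_DAY,
                  days: int = DAYS_PER_MONTH) -> int:
    """Bytes written per month for a user tweeting at a steady rate."""
    return fanout_bytes(bytes_per_item, followers) * tweets_per_day * days


if __name__ == "__main__":
    cases = {
        "full copy, 1k per tweet": 1000,     # gabriele's estimate
        "full copy, 140 bytes": 140,         # Peter's lower bound
        "reference only, 8-byte id": 8,      # ASSUMED id size, not from the thread
    }
    for label, size in cases.items():
        per_tweet = fanout_bytes(size, FOLLOWERS)
        per_month = monthly_bytes(size, FOLLOWERS)
        print(f"{label}: {per_tweet / 1e9:.3f} GB per tweet, "
              f"{per_month / 1e12:.3f} TB per month")
```

Storing only a reference per follower cuts the per-tweet cost by about two orders of magnitude under these assumptions, which is in line with gabriele's guess that only references are stored.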