Re: Regarding Cassandra Scalability

Ryan King Fri, 16 Apr 2010 15:07:25 -0700

That's just the data our analytics team produces (logs, etc).
Production/online data is separate.


-ryan

On Fri, Apr 16, 2010 at 2:22 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> http://twitter.com/jromeh/status/12295736793
>
> -----Original Message-----
> From: "Mike Gallamore" <mike.e.gallam...@googlemail.com>
> Sent: Friday, April 16, 2010 3:46pm
> To: user@cassandra.apache.org
> Subject: Re: Regarding Cassandra Scalability
>
> Also, people with 1M followers tend to have "public" tweets, which means
> I think it would be the same as subscribing to an RSS feed or
> whatever. You aren't getting a local copy because you will "always" have
> access to the tweet, as will everyone else. Also, tweets don't change
> AFAIK, so there is no point in having redundant copies.
> On 04/16/2010 01:42 PM, Peter Chang wrote:
>> Yeah. I wasn't sure if Cassandra was optimized for binary data
>> especially since any site of that size will use a CDN. Interesting
>> read though.
>>
>> I think 1K per tweet is off by an order of magnitude considering they
>> only allow 140 characters. Regardless, the number of users with > 1MM
>> followers is probably a handful. Also, I'm guessing they purge data
>> after a certain window (like 30 days, for example).
>>
>> Sent from my iPhone
>>
>>
>> On Apr 16, 2010, at 12:02 PM, gabriele renzi <rff....@gmail.com> wrote:
>>
>>
>>> On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang <pete...@gmail.com> wrote:
>>>
>>>> FB also does pics and movies so 1MB is way off depending on where they
>>>> manage such binary data.
>>>>
>>> Apparently not in Cassandra:
>>> http://www.facebook.com/note.php?note_id=76191543919
>>>
>>>
>>>> I do agree that 1MB of text alone is a lot of text,
>>>> which is more relevant in the case of Twitter. The only large thing you
>>>> leave out is denormalization. Every tweet you write is likely denormalized
>>>> across your followers to allow for quick read access.
>>>>
>>> .. but considering many users have _millions_ of followers, this may
>>> be quite a bit more data. Assuming 1k per tweet, this would mean one
>>> from @aplusk (4.7M followers) would take more than 4 gigabytes of
>>> data. Assuming ten tweets a day, in one month he'd produce one TB.
>>>
>>> I'd say they only store references (increasing number lists can also
>>> be encoded very cleverly), or in some other way I'm not smart enough
>>> to think of.
>>>
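The estimate in the quoted message above can be checked with a short back-of-envelope sketch. It uses the thread's own figures (1 KB per tweet, 4.7M followers, ten tweets a day). The 30-day month, the 8-byte reference size, and the helper name fanout_bytes are illustrative assumptions, not anything stated in the thread or known about Twitter's actual storage.

```python
# Back-of-envelope fan-out storage estimate (illustrative only).
TWEET_BYTES = 1_000        # "1k per tweet", as assumed in the thread
FOLLOWERS = 4_700_000      # follower count quoted in the thread
TWEETS_PER_DAY = 10        # "ten tweets a day", from the thread
DAYS_PER_MONTH = 30        # assumption: a 30-day month
REF_BYTES = 8              # assumption: one 64-bit id per stored reference


def fanout_bytes(payload_bytes, followers, tweets=1):
    """Bytes written if every tweet is copied once per follower."""
    return payload_bytes * followers * tweets


tweets_per_month = TWEETS_PER_DAY * DAYS_PER_MONTH

# Full copy of each tweet written to every follower's timeline.
per_tweet = fanout_bytes(TWEET_BYTES, FOLLOWERS)
per_month = fanout_bytes(TWEET_BYTES, FOLLOWERS, tweets_per_month)

# Only a reference to the tweet written to every follower's timeline.
refs_per_month = fanout_bytes(REF_BYTES, FOLLOWERS, tweets_per_month)

print(f"one tweet, full copies:   {per_tweet / 1e9:.2f} GB")
print(f"one month, full copies:   {per_month / 1e12:.2f} TB")
print(f"one month, references:    {refs_per_month / 1e9:.2f} GB")
```

Under these assumptions, storing references instead of full copies shrinks the monthly figure by the ratio TWEET_BYTES / REF_BYTES, which is the point of the "only store references" suggestion.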
>
>
>
>
