The redundancy/denormalization takes advantage of cheap writes to make
reads really quick. Imagine a query that returns one row with your
whole tweet stream vs. having to do 50 separate lookups, one per tweet.
Space is cheap and the upside is performance... especially if you're
getting a lot of fail whales. :)
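
To make the tradeoff concrete, here is a toy sketch in Python. Plain
dicts stand in for column families, and the names (fanout_write,
timelines, and so on) are made up for illustration, not anything
Twitter actually runs:

from collections import defaultdict

# Toy in-memory stand-ins for two column families: one row per tweet,
# plus one denormalized timeline row per user.
tweets = {}                    # tweet_id -> tweet body
timelines = defaultdict(list)  # user_id -> list of tweet bodies (copies)
followers = defaultdict(set)   # user_id -> set of follower ids

def fanout_write(author, tweet_id, body):
    """Cheap writes: store the tweet once, then copy it into every
    follower's timeline row so reads never have to join."""
    tweets[tweet_id] = body
    for follower in followers[author]:
        timelines[follower].append(body)

def read_timeline(user):
    """Fast read: one row lookup returns the whole stream, instead of
    N separate lookups (one per tweet)."""
    return timelines[user]

followers["alice"] = {"bob", "carol"}
fanout_write("alice", "t1", "hello world")
print(read_timeline("bob"))  # ['hello world']

The cost is write amplification: one tweet becomes one append per
follower, which is exactly the cheap-writes bet.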

On Apr 16, 2010, at 2:46 PM, Mike Gallamore
<mike.e.gallam...@googlemail.com> wrote:

> Also, people with 1M followers tend to have "public" tweets, which
> I think really means it would be the same as subscribing to an RSS
> feed or whatever. You aren't getting a local copy, because you will
> "always" have access to the tweet, as will everyone else. Also,
> tweets don't change AFAIK, so there's no point in keeping redundant
> copies.
> On 04/16/2010 01:42 PM, Peter Chang wrote:
>> Yeah. I wasn't sure if Cassandra was optimized for binary data,
>> especially since any site of that size will use a CDN. Interesting
>> read, though.
>>
>> I think 1K per tweet is off by an order of magnitude, considering
>> they only allow 140 characters. Regardless, the number of users with
>> more than 1MM followers is probably a handful. Also, I'm guessing
>> they purge data after a certain window (30 days, for example).
>>
>> On Apr 16, 2010, at 12:02 PM, gabriele renzi <rff....@gmail.com>
>> wrote:
>>
>>> On Fri, Apr 16, 2010 at 6:41 PM, Peter Chang <pete...@gmail.com>
>>> wrote:
>>>
>>>> FB also does pics and movies, so 1MB is way off depending on
>>>> where they manage such binary data.
>>>>
>>> apparently not in Cassandra:
>>> http://www.facebook.com/note.php?note_id=76191543919
>>>
>>>> I do agree that 1MB of text alone is a lot of text, which is more
>>>> relevant in the case of Twitter. The only large thing you leave
>>>> out is denormalization: every tweet you write is likely
>>>> denormalized across your followers to allow for quick read access.
>>>
>>> ... but considering many users have _millions_ of followers, this
>>> may be quite a bit more data. Assuming 1 KB per tweet, a single
>>> tweet from @aplusk (4.7M followers) would take more than 4
>>> gigabytes of data. At ten tweets a day, in one month he'd produce
>>> about 1.4 TB.
>>>
>>> I'd say they only store references (increasing number lists can
>>> also be encoded very cleverly), or they do it in some other way I'm
>>> not smart enough to think of.
>>>
>
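
For what it's worth, gabriele's back-of-envelope estimate above checks
out. A quick sanity check in Python, using his stated assumptions (1 KB
per stored copy, 4.7M followers, ten tweets a day, a 30-day month):

# Sanity check on the fan-out storage estimate from the thread.
BYTES_PER_COPY = 1024
FOLLOWERS = 4_700_000
TWEETS_PER_DAY = 10
DAYS = 30

per_tweet = BYTES_PER_COPY * FOLLOWERS         # ~4.8 GB per tweet
per_month = per_tweet * TWEETS_PER_DAY * DAYS  # ~1.4 TB per month

print(f"per tweet: {per_tweet / 1e9:.1f} GB")   # per tweet: 4.8 GB
print(f"per month: {per_month / 1e12:.2f} TB")  # per month: 1.44 TB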
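
And on the "encoded very cleverly" point: a common trick for increasing
number lists is delta encoding plus variable-length integers, so a
sorted list of 64-bit tweet ids costs only a few bytes per entry. A
minimal sketch of that general technique (my guess at what "cleverly"
could mean here, not Twitter's actual scheme):

def encode_varint(n):
    """Variable-length encoding: 7 bits per byte, high bit = continue."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_deltas(sorted_ids):
    """Store only the (small) gaps between consecutive ids."""
    out = bytearray()
    prev = 0
    for i in sorted_ids:
        out += encode_varint(i - prev)
        prev = i
    return bytes(out)

# Three ids that would need 8 bytes each raw compress to a handful of
# bytes, because only the first delta needs full precision.
ids = [10_000_000_000, 10_000_000_007, 10_000_000_040]
blob = encode_deltas(ids)
print(len(blob), "bytes instead of", 8 * len(ids))  # 7 bytes instead of 24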
