This isn't quite true, I think. RandomPartitioner uses MD5, so tokens live in a 
128-bit space. If you had 10^16 rows, you would have a collision probability of 
under 10^-6 (closer to 10^-7 by the usual birthday bound), according to 
http://en.wikipedia.org/wiki/Birthday_attack ... and apparently MD5's output 
isn't quite uniformly balanced, so your actual odds of a collision are somewhat 
worse (though I'm not familiar with the literature).
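As a sanity check on that figure, the birthday bound p ≈ 1 - exp(-n^2 / 2H) can 
be evaluated directly. This is only a rough sketch: it treats MD5 as a perfectly 
uniform 128-bit hash, which (as noted above) it isn't quite, so the real odds 
are a bit worse.

```python
import math

def collision_probability(n, bits):
    """Birthday-bound approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    h = 2 ** bits
    # expm1 keeps precision when the exponent is tiny, as it is here
    return -math.expm1(-n * n / (2 * h))

# 10^16 rows hashed into a 128-bit space (MD5-sized tokens)
p = collision_probability(10 ** 16, 128)
print(p)  # on the order of 1e-7, i.e. comfortably below one in a million
```

Evaluating this gives roughly 1.5 x 10^-7, the same ballpark as the estimate 
above.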

10^16 is very large... but conceivable, I guess.

-- Shaun


On Feb 16, 2011, at 4:05 AM, Sylvain Lebresne wrote:

> Sky is the limit.
> 
> Columns in a row are limited to 2 billion because the size of a row is 
> recorded in a Java int. A row must also fit on one node, so this also limits 
> the size of a row in a way (if you have large values, you could hit this 
> limit well before reaching 2 billion columns).
> 
> The number of rows is never recorded anywhere (no data type limit), and rows 
> are balanced over the cluster. So there is no real limit beyond what your 
> cluster can handle (that is, the number of machines you can afford is 
> probably the limit).
> 
> Now, if a single node holds a huge number of rows, the only factor that comes 
> to mind is that the sparse index kept in memory for the SSTables can start to 
> take too much memory (depending on how much memory you have). In that case 
> you can have a look at index_interval in cassandra.yaml. But as long as you 
> don't start seeing nodes OOM for no reason, this should not be a concern. 
> 
> --
> Sylvain
> 
> On Wed, Feb 16, 2011 at 9:36 AM, Sasha Dolgy <sdo...@gmail.com> wrote:
>  
> Is there a limit or a factor to take into account when the number of rows in 
> a CF exceeds a certain number?  I see the columns for a row can get upwards 
> of 2 billion ... can I have 2 billion rows without much issue?  
> 
> -- 
> Sasha Dolgy
> sasha.do...@gmail.com
> 
