This isn't quite true, I think. RandomPartitioner uses MD5, a 128-bit hash. So if you had 10^16 rows, you would have roughly a 10^-6 chance of a collision, per the birthday bound (http://en.wikipedia.org/wiki/Birthday_attack) ... and apparently MD5 isn't perfectly balanced, so your actual odds of a collision are somewhat worse (though I'm not familiar with the literature).
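A quick back-of-envelope check of that birthday bound (a sketch, nothing Cassandra-specific; the function name is mine):

```python
import math

def birthday_collision_prob(n_items: int, bits: int) -> float:
    """Approximate probability of at least one collision when hashing
    n_items into a space of 2**bits values, using the standard
    birthday approximation p ~= 1 - exp(-n^2 / 2^(bits+1))."""
    space = 2.0 ** bits
    # expm1 keeps precision for the tiny probabilities involved here
    return -math.expm1(-(n_items ** 2) / (2.0 * space))

# MD5 tokens are 128 bits; with 10^16 rows:
print(f"{birthday_collision_prob(10 ** 16, 128):.2e}")  # ~1.47e-07
```

So 10^16 rows lands around 10^-7 to 10^-6, consistent with the figure above (assuming MD5 behaves like an ideal 128-bit hash, which it only approximately does).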
10^16 is very large... but conceivable, I guess.

-- Shaun

On Feb 16, 2011, at 4:05 AM, Sylvain Lebresne wrote:

> Sky is the limit.
>
> Columns in a row are limited to 2 billion because the size of a row is
> recorded in a java int. A row must also fit on one node, so this also limits
> in a way the size of a row (if you have large values, you could be limited by
> this factor much before reaching 2 billion columns).
>
> The number of rows is never recorded anywhere (no data type limit). And rows
> are balanced over the cluster. So there is no real limit outside what your
> cluster can handle (that is, the number of machines you can afford is probably
> the limit).
>
> Now, if a single node holds a huge number of rows, the only factor that comes
> to mind is that the sparse index kept in memory for the SSTable can start to
> take too much memory (depending on how much memory you have). In which case
> you can have a look at index_interval in cassandra.yaml. But as long as you
> don't start seeing nodes OOM for no reason, this should not be a concern.
>
> --
> Sylvain
>
> On Wed, Feb 16, 2011 at 9:36 AM, Sasha Dolgy <sdo...@gmail.com> wrote:
>
> Is there a limit or a factor to take into account when the number of rows in
> a CF exceeds a certain number? I see the columns for a row can get upwards
> of 2 billion... Can I have 2 billion rows without much issue?
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com