Thanks for the detailed explanation, Peter! It definitely cleared up my doubts!

On Mon, Feb 7, 2011 at 1:52 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:
>> Does huge variation in the no. of columns per row, across the column
>> family, have *any* impact on performance?
>>
>> Can I have just 100 columns in some rows and hundreds of thousands
>> of columns in another set of rows, without any downsides?
>
> If I interpret your question the way I think you mean it, then no,
> Cassandra doesn't "do" anything with the data such that the smaller
> rows are somehow directly less efficient because there are other rows
> that are bigger. It doesn't affect the on-disk format or the on-disk
> efficiency of accessing the rows.
>
> However, there are almost always indirect effects when it comes to
> performance, and in particular in storage systems. In the case of
> Cassandra, the *variation* itself should not impose a direct
> performance penalty, but there are potential other effects. For
> example, the row cache is only useful for small rows, so if you are
> looking to use the row cache the huge rows would perhaps prevent that.
> This could be interpreted as a performance impact on the smaller rows
> by the larger rows... Compaction may become more expensive due to
> e.g. additional GC pressure resulting from
> large-but-still-within-in-memory-limits rows being compacted (or not,
> depending on JVM/GC settings). There is also the effect of cache
> locality as the data set grows, and the cache locality for the smaller
> rows will likely be worse than had they been in e.g. a separate CF.
>
> Those are just three random examples; I'm just trying to make the point
> that "without any downsides" is a very strong and blanket requirement
> for making the decision to mix small rows with larger ones.
>
> --
> / Peter Schuller
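
For anyone who finds this thread later: the point about cache locality and the row cache can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Cassandra's actual row cache. It models a generic LRU cache with a fixed byte budget, and the class name, function name, row sizes and access pattern are all made up for illustration. It compares the hit rate of small rows when they are read alone with the hit rate when reads of a few large rows are interleaved through the same cache.

```python
from collections import OrderedDict


class ByteBudgetLRU:
    """Toy LRU cache bounded by total bytes (hypothetical; NOT Cassandra code)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.rows = OrderedDict()  # row key -> row size in bytes

    def read(self, key, size_bytes):
        """Return True on a cache hit; on a miss, cache the row if it fits."""
        if key in self.rows:
            self.rows.move_to_end(key)  # mark as most recently used
            return True
        if size_bytes > self.capacity_bytes:
            return False  # row is larger than the whole cache
        # Evict least recently used rows until the new row fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, evicted_size = self.rows.popitem(last=False)
            self.used_bytes -= evicted_size
        self.rows[key] = size_bytes
        self.used_bytes += size_bytes
        return False


def small_row_hit_rate(workload, capacity_bytes):
    """Hit rate counted only over reads of keys starting with 'small'."""
    cache = ByteBudgetLRU(capacity_bytes)
    hits = total = 0
    for key, size in workload:
        hit = cache.read(key, size)
        if key.startswith("small"):
            total += 1
            hits += hit
    return hits / total


# Assumed workload: 100 small rows of 1 KB each, read round-robin.
small_only = [(f"small-{i % 100}", 1_000) for i in range(1_000)]

# Same small reads, plus a read of one of five 150 KB rows after every 10th.
mixed = []
for i, read in enumerate(small_only):
    mixed.append(read)
    if i % 10 == 9:
        mixed.append((f"big-{(i // 10) % 5}", 150_000))

CAPACITY = 200_000  # bytes; an arbitrary budget for the sketch
small_only_rate = small_row_hit_rate(small_only, CAPACITY)
mixed_rate = small_row_hit_rate(mixed, CAPACITY)
print(f"small rows alone: {small_only_rate:.2f}")
print(f"small rows mixed with big rows: {mixed_rate:.2f}")
```

In this model the small rows fit in the cache comfortably on their own, but each large row pushes most of them out before they are read again. That is the kind of indirect effect Peter describes, and it is why keeping the two kinds of rows in separate CFs can help. Real numbers will depend entirely on the actual cache settings and access pattern.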