> Does huge variation in no. of columns in rows, over the column family
> has *any* impact on the performance ?
>
> Can I have like just 100 columns in some rows and like hundred
> thousands of columns in another set of rows, without any downsides ?

If I interpret your question the way I think you mean it, then no, Cassandra doesn't "do" anything with the data such that the smaller rows are somehow directly less efficient because there are other rows that are bigger. It doesn't affect the on-disk format or the on-disk efficiency of accessing the rows.

However, there are almost always indirect effects when it comes to performance, and in particular with storage systems. In the case of Cassandra, the *variation* itself should not impose a direct performance penalty, but there are potential other effects.

For example, the row cache is only useful for small rows, so if you are looking to use the row cache, the huge rows would perhaps prevent that. This could be interpreted as a performance impact on the smaller rows by the larger rows.

Compaction may become more expensive due to e.g. additional GC pressure resulting from rows that are large, but still within the in-memory limits, being compacted (or not, depending on JVM/GC settings).

There is also the effect of cache locality as the data set grows: the cache locality for the smaller rows will likely be worse than if they had been in e.g. a separate CF.

Those are just three random examples; I'm just trying to make the point that "without any downsides" is a very strong and blanket requirement for making the decision to mix small rows with larger ones.

--
/ Peter Schuller