Thanks for the detailed explanation, Peter! It definitely cleared up my doubts!


On Mon, Feb 7, 2011 at 1:52 PM, Peter Schuller
<peter.schul...@infidyne.com> wrote:
>> Does a huge variation in the number of columns per row, across a
>> column family, have *any* impact on performance?
>>
>> Can I have just 100 columns in some rows and hundreds of thousands
>> of columns in another set of rows, without any downsides?
>
> If I interpret your question the way I think you mean it, then no,
> Cassandra doesn't "do" anything with the data such that the smaller
> rows are somehow directly less efficient because there are other rows
> that are bigger. It doesn't affect the on-disk format or the on-disk
> efficiency of accessing the rows.
>
> However, there are almost always indirect effects when it comes to
> performance, particularly in storage systems. In the case of
> Cassandra, the *variation* itself should not impose a direct
> performance penalty, but there are potential other effects. For
> example the row cache is only useful for small rows, so if you are
> looking to use the row cache the huge rows would perhaps prevent that.
> This could be interpreted as a performance impact on the smaller rows
> by the larger rows. Compaction may become more expensive due to
> e.g. additional GC pressure resulting from
> large-but-still-within-in-memory-limits rows being compacted (or not,
> depending on JVM/GC settings). There is also the effect of cache
> locality as the data set grows, and the cache locality for the smaller
> rows will likely be worse than had they been in e.g. a separate CF.
>
> Those are just three random examples; I'm just trying to make the point
> that "without any downsides" is a very strong and blanket requirement
> for making the decision to mix small rows with larger ones.
>
> --
> / Peter Schuller
>

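To make the row cache point above concrete, here is a rough toy simulation
in Python. It is only a sketch and not Cassandra's cache implementation:
the ToyRowCache class, the byte-budget sizing, the LRU eviction and all
the row sizes are assumptions made up for illustration, and Cassandra's
real cache sizing and eviction may well differ. It just shows how a few
huge rows sharing a cache with many small rows can push the small rows
out.

```python
from collections import OrderedDict


class ToyRowCache:
    """Hypothetical LRU cache bounded by a total byte budget (toy model)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.rows = OrderedDict()  # row key -> row size in bytes
        self.hits = 0
        self.misses = 0

    def read(self, key, size_bytes):
        if key in self.rows:
            self.rows.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        if size_bytes > self.capacity_bytes:
            return  # the row can never fit, so it is not cached
        self.rows[key] = size_bytes
        self.used_bytes += size_bytes
        # Evict least recently used rows until we are within budget again.
        while self.used_bytes > self.capacity_bytes:
            _, evicted_size = self.rows.popitem(last=False)
            self.used_bytes -= evicted_size


def small_row_hit_rate(include_large_rows):
    """Read 100 small rows repeatedly, optionally interleaved with 3 huge rows."""
    cache = ToyRowCache(capacity_bytes=1_000_000)
    small_hits = 0
    small_reads = 0
    for _ in range(50):
        for i in range(100):  # made-up size: 1 KB per small row
            hits_before = cache.hits
            cache.read(("small", i), 1_000)
            small_hits += cache.hits - hits_before
            small_reads += 1
        if include_large_rows:
            for j in range(3):  # made-up size: 400 KB per large row
                cache.read(("large", j), 400_000)
    return small_hits / small_reads


if __name__ == "__main__":
    print("small rows only:      ", small_row_hit_rate(False))
    print("mixed with large rows:", small_row_hit_rate(True))
```

In this toy setup the small rows fit comfortably in the cache on their
own, but once the large rows are read through the same cache they keep
evicting the small ones. Keeping the large rows in a separate CF without
a row cache would correspond to the first case.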