I'm running a full compaction now and noticed this:

Compacting large row … incrementally

… and the values were in the 300-500MB range.

I'm storing NOTHING anywhere near that large.  Max is about 200k...

However, I've designed my schema so that I can do efficient time/range
scans of the data by placing things into buckets.

So my schema looks like:

bucket,
timestamp

… and the partition key is bucket, with timestamp as a clustering column.
Does that mean that EVERYTHING lands in one internal "row" under 'bucket'?

So even though my individual INSERTs are only around 200k, they're all
pooling under the same 'bucket' partition key, so Cassandra is going to
have a hard time compacting the resulting giant rows.

Part of the problem here is the serious abuse of vocabulary.  The
Thrift/CQL impedance mismatch means that things have slightly different
names and not-so-straightforward nomenclature, which makes it confusing as
to what's actually happening under the hood.

….

Then I saw:

http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3cbanlktik0g+epq4ctw28ty+dpexprtis...@mail.gmail.com%3E


look for in_memory_compaction_limit_in_mb in cassandra.yaml


… so this seems like it will be a problem and slow me down moving forward,
unless I figure out a workaround.
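
One workaround I'm toying with (just a sketch, untested) is folding a
coarse time window into the partition key, so no single partition grows
without bound:

CREATE TABLE events (
    bucket    text,
    day       text,        -- e.g. '2014-07-04'; caps partition growth
    timestamp timeuuid,
    payload   blob,
    PRIMARY KEY ((bucket, day), timestamp)  -- compound partition key
);

Range scans would then have to fan out across one partition per day, but
compaction would never have to deal with a monster row.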

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
