yup… that's what I was thinking. But good point on the physical vs. logical row… Cassandra should be more rigorous about this term. The log message just says "large row", not "large physical row".
… Any idea how much this is going to slow me down?

On Mon, Jun 30, 2014 at 10:10 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
> Hello Kevin.
>
> With CQL3 there are some important terms to define:
>
> a. Row: means a logical row in the CQL3 semantics. A logical row is what
> is displayed as a row in the cqlsh client.
> b. Partition: means a physical row on disk in the CQL3 semantics.
>
> Even if you have tiny logical rows, if you store a lot of them under the
> same partition (physical row on disk) then it can add up a lot.
>
> Quick maths: 200k per logical row * 1000 logical rows = roughly 200 MB for
> the partition.
>
>
> On Mon, Jun 30, 2014 at 6:53 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
>> I'm running a full compaction now and noticed this:
>>
>> Compacting large row … incrementally
>>
>> … and the values were in the 300-500MB range.
>>
>> I'm storing NOTHING anywhere near that large. Max is about 200k...
>>
>> However, I'm storing my schema in a way so that I can do efficient
>> time/range scans of the data and place things into buckets.
>>
>> So my schema looks like:
>>
>> bucket,
>> timestamp
>>
>> … and the partition key is bucket. Since this is a clustering row, does
>> that mean that EVERYTHING is in one "row" under 'bucket'?
>>
>> So even though my INSERTs are like 200k, they're all pooling under the
>> same 'bucket', which is the partition key, so Cassandra is going to have a
>> hard time compacting them.
>>
>> Part of the problem here is the serious abuse of vocabulary. The
>> Thrift/CQL impedance mismatch means that things have slightly different
>> names and not-so-straightforward nomenclature. So it makes it confusing as
>> to what's actually happening under the hood.
>>
>> ….
>>
>> Then I saw:
>>
>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3cbanlktik0g+epq4ctw28ty+dpexprtis...@mail.gmail.com%3E
>>
>> look for in_memory_compaction_limit_in_mb in cassandra.yaml
>>
>> … so this seems like it will be a problem and slow me down moving
>> forward. Unless I figure out a workaround.
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>> <http://spinn3r.com>
>>
>
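The quick maths in this thread, and how it relates to in_memory_compaction_limit_in_mb, can be sketched in a few lines of Python. This is a back-of-the-envelope sketch, not Cassandra's own accounting: it ignores per-cell overhead, repeated clustering keys, tombstones and compression. It assumes a limit of 64 (the value commonly shipped in cassandra.yaml of that era; check your own config) and that "MB" means 1024 * 1024 bytes. As we understand it, partitions larger than this limit are compacted through the slower incremental path that the "Compacting large row … incrementally" log line reports. All helper names, including the `time_slice` bucketing idea, are hypothetical and for illustration only.

```python
# Rough partition-size arithmetic for a CQL3 table whose partition key is
# `bucket` and whose clustering column is `timestamp`.
# All helper names are hypothetical; sizes ignore per-cell overhead,
# repeated clustering keys, tombstones and compression.

MIB = 1024 * 1024  # assumption: *_in_mb settings mean 1024 * 1024 bytes


def partition_size_bytes(row_size_bytes: int, rows_per_partition: int) -> int:
    """Approximate size of one partition (physical row): rows * row size."""
    return row_size_bytes * rows_per_partition


def compacted_incrementally(partition_bytes: int, limit_mb: int = 64) -> bool:
    """True if the partition exceeds in_memory_compaction_limit_in_mb.

    64 is an assumed default; read the real value from your cassandra.yaml.
    """
    return partition_bytes > limit_mb * MIB


def max_rows_per_partition(row_size_bytes: int, limit_mb: int = 64) -> int:
    """Largest number of logical rows that keeps one partition under the limit."""
    return (limit_mb * MIB) // row_size_bytes


def time_slice(timestamp_s: int, slice_seconds: int) -> int:
    """Start of the time slice containing timestamp_s.

    Hypothetical workaround: use (bucket, time_slice) as a compound
    partition key so each partition only holds rows from one slice.
    """
    return timestamp_s - timestamp_s % slice_seconds


if __name__ == "__main__":
    row = 200 * 1000  # ~200k per logical row, as in the thread
    size = partition_size_bytes(row, 1000)
    print(size)                           # 200000000 (roughly 200 MB)
    print(compacted_incrementally(size))  # True with a 64 MB limit
    print(max_rows_per_partition(row))    # 335
    print(time_slice(1404150000, 3600))   # 1404147600
```

With rows around 200k, a 64 MB limit is crossed after roughly 335 rows in one partition, so a single `bucket` that keeps accumulating rows will eventually hit the incremental path no matter how small each INSERT is. Adding a time component to the partition key is one common way to bound partition size; the slice width would have to be chosen from your actual write rate per bucket.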