Re: a question on cassandra data file size

Yiming Sun Fri, 30 Mar 2012 09:31:36 -0700

Hi Ed,

Do the "comp actions" stand for compaction or compression? Also, the size we
obtained from the supercolumn schema was also taken many days after the
data ingest, so it should have been after compaction as well, no? In neither
case did we issue any nodetool compact commands.

You are right that we probably wouldn't have achieved a 50% reduction right
off the bat. I overlooked one detail: when we moved to the regular column
schema, we also removed redundant columns that had to be present in the
supercolumn schema. Although the column values are small (ints and longs),
they could add up to quite a bit of storage.
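To get a feel for how small redundant columns add up, here is a rough back-of-the-envelope sketch. It assumes a fixed per-column overhead of about 15 bytes (2-byte name length, 1-byte flags, 8-byte timestamp, 4-byte value length), which is our understanding of the pre-2.0 SSTable column layout, not a measured figure. The column names and row count below are hypothetical.

```python
# Rough size estimate for a set of small columns stored once per row.
# ASSUMPTION: fixed per-column overhead of 15 bytes (2-byte name length,
# 1-byte flags, 8-byte timestamp, 4-byte value length), modeled on our
# understanding of the pre-2.0 SSTable column layout. Illustrative only.
COLUMN_OVERHEAD_BYTES = 2 + 1 + 8 + 4


def column_bytes(name: str, value_size: int) -> int:
    """Approximate serialized size of one regular column: overhead + name + value."""
    return COLUMN_OVERHEAD_BYTES + len(name.encode("utf-8")) + value_size


def redundant_column_bytes(columns: dict[str, int], rows: int) -> int:
    """Bytes spent on `columns` (column name -> value size in bytes) across `rows` rows."""
    per_row = sum(column_bytes(name, size) for name, size in columns.items())
    return rows * per_row


if __name__ == "__main__":
    # Hypothetical redundant columns: a 4-byte int and an 8-byte long.
    redundant = {"page": 4, "offset": 8}
    total = redundant_column_bytes(redundant, rows=10_000_000)
    print(f"{total:,} bytes (~{total / 2**20:.0f} MiB) per replica")
```

With these made-up numbers, two tiny columns over 10 million rows come to about 520 MB per replica, most of it overhead rather than the ints and longs themselves.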

So, aside from that, it sounds like the reduction in data size is attributable
more to our move from supercolumns to regular columns than to the move
from 0.8 to 1.0. Thanks!
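Ed's point that small values (he mentions 3-byte values) make the per-column overhead significant can be made concrete with a quick ratio. This is a sketch under the same assumed ~15 bytes of fixed per-column overhead as in the pre-2.0 layout, with a hypothetical 4-byte column name. It does not model the extra header a supercolumn carries, so on its own it says nothing exact about the super-vs-standard difference; it only shows why overhead dominates when values are tiny.

```python
# How much of a serialized column is actual payload?
# ASSUMPTION: ~15 bytes of fixed per-column overhead (name length, flags,
# timestamp, value length), based on our reading of the pre-2.0 layout.
ASSUMED_OVERHEAD_BYTES = 15


def payload_fraction(name_len: int, value_size: int,
                     overhead: int = ASSUMED_OVERHEAD_BYTES) -> float:
    """Fraction of a column's serialized bytes that is the value itself."""
    return value_size / (overhead + name_len + value_size)


if __name__ == "__main__":
    # Hypothetical 4-byte column name; vary only the value size.
    for value_size in (3, 8, 100, 1000):
        frac = payload_fraction(name_len=4, value_size=value_size)
        print(f"{value_size:>5}-byte value: {frac:.0%} payload")
```

Under these assumptions a 3-byte value is only about 14% of its column's bytes, while a 1000-byte value is about 98%, so trimming columns matters far more for schemas full of small ints and longs.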

-- Y.

On Fri, Mar 30, 2012 at 10:18 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Standard columns save space over super columns. Not 50%, but depending
> on the size of the data (e.g. 3-byte values), the overhead could be
> significant. I have noticed that after an sstable rebuild, 1.0 kicked off
> some comp actions behind the scenes, shrinking some files
> significantly.
>
> On Fri, Mar 30, 2012 at 9:01 AM, Yiming Sun <yiming....@gmail.com> wrote:
> > Hi,
> >
> > I have a question on the size of Cassandra data files. After we upgraded
> > from Cassandra 0.8 to 1.0 and changed our schema to use regular columns
> > instead of supercolumns, the aggregate size of our Cassandra data files
> > dropped by more than half. The source data set is the same, and we didn't
> > set any compression options in the new schema.
> >
> > The reduction in data file size is good, but we would still like to know
> > a little more about the reason behind it. Could someone enlighten me,
> > please? Thanks.
> >
> > -- Y.
>
