Hi Ed, does "comp actions" stand for compaction or compression? Also, the size we measured for the supercolumn schema was taken many days after the data ingest, so it should have been after compaction as well, no? In neither case did we issue any nodetool compact commands.

You are right that we probably wouldn't have achieved a 50% reduction right off the bat. I overlooked one detail: when we moved to the regular column schema, we also removed redundant columns that had to be present in the supercolumn schema. Their values are small (ints and longs), but they could add up to quite a bit of storage. So, aside from that, it sounds like the reduction in data size is attributable more to moving from supercolumns to regular columns than to moving from 0.8 to 1.0.

Thanks!

-- Y.

On Fri, Mar 30, 2012 at 10:18 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Standard columns save size over super columns. Not 50% but depending
> on the size of the data (3 byte values) the overhead could be
> significant. I have noticed that post sstable rebuild, 1.0 kicked off
> some comp actions behind the scenes shrinking some files
> significantly.
>
> On Fri, Mar 30, 2012 at 9:01 AM, Yiming Sun <yiming....@gmail.com> wrote:
> > Hi,
> >
> > I have a question on the size of cassandra data files. After we upgraded
> > from cassandra 0.8 to 1.0, and changed our schema to use regular columns
> > instead of supercolumns, the aggregated size of cassandra data files reduced
> > by more than half. The source data set is the same, and we didn't set any
> > compression options in the new schema.
> >
> > The reduction of data file is good, but we still would like to know a little
> > more about the reason behind this reduction. Could someone enlighten me,
> > please? Thanks.
> >
> > -- Y.