Check your log for messages about rebuilding indices: an index rebuild might grow your dataset somewhat.
One thing is for sure: the data import removed all the crap that had lingered in the 0.8.1 cluster (duplicates, tombstones, etc.). The decrease is fairly dramatic, but not illogical at all.

2012/3/16 Jeremiah Jordan <jeremiah.jor...@morningstar.com>

> I would guess more aggressive compaction settings, did you update rows
> or insert some twice?
> If you run major compaction a couple times on the 0.8.1 cluster does the
> data size get smaller?
>
> You can use the "describe" command to check if compression got turned on.
>
> -Jeremiah
>
> ------------------------------
> *From:* Ravikumar Govindarajan [ravikumar.govindara...@gmail.com]
> *Sent:* Thursday, March 15, 2012 4:41 AM
> *To:* user@cassandra.apache.org
> *Subject:* 0.8.1 Vs 1.0.7
>
> Hi,
>
> I ran some data import tests for cassandra 0.8.1 and 1.0.7. The results
> were a little bit surprising
>
> 0.8.1, SimpleStrategy, Rep_Factor=3,QUORUM Writes, RP, SimpleSnitch
>
> XXX.XXX.XXX.A  datacenter1  rack1  Up  Normal  140.61 GB  12.50%
> XXX.XXX.XXX.B  datacenter1  rack1  Up  Normal  139.92 GB  12.50%
> XXX.XXX.XXX.C  datacenter1  rack1  Up  Normal  138.81 GB  12.50%
> XXX.XXX.XXX.D  datacenter1  rack1  Up  Normal  139.78 GB  12.50%
> XXX.XXX.XXX.E  datacenter1  rack1  Up  Normal  137.44 GB  12.50%
> XXX.XXX.XXX.F  datacenter1  rack1  Up  Normal  138.48 GB  12.50%
> XXX.XXX.XXX.G  datacenter1  rack1  Up  Normal  140.52 GB  12.50%
> XXX.XXX.XXX.H  datacenter1  rack1  Up  Normal  145.24 GB  12.50%
>
> 1.0.7, NTS, Rep_Factor{DC1:3, DC2:2}, LOCAL_QUORUM writes, RP [DC2 m/c
> yet to join ring], PropertyFileSnitch
>
> XXX.XXX.XXX.A  DC1  RAC1  Up  Normal  48.72 GB  12.50%
> XXX.XXX.XXX.B  DC1  RAC1  Up  Normal  51.23 GB  12.50%
> XXX.XXX.XXX.C  DC1  RAC1  Up  Normal  52.4 GB   12.50%
> XXX.XXX.XXX.D  DC1  RAC1  Up  Normal  49.64 GB  12.50%
> XXX.XXX.XXX.E  DC1  RAC1  Up  Normal  48.5 GB   12.50%
> XXX.XXX.XXX.F  DC1  RAC1  Up  Normal  53.38 GB  12.50%
> XXX.XXX.XXX.G  DC1  RAC1  Up  Normal  51.11 GB  12.50%
> XXX.XXX.XXX.H  DC1  RAC1  Up  Normal  53.36 GB  12.50%
>
> There seems to be 3X savings in size for the same dataset running 1.0.7.
> I have not enabled compression for any of the CFs. Will it be enabled by
> default when creating a new CF in 1.0.7? cassandra.yaml is also mostly
> identical.
>
> Thanks and Regards,
> Ravi
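
For what it's worth, the "3X" figure can be checked by summing the per-node loads from the two ring listings quoted above. This is only a rough sketch: the variable names are made up for illustration, and it assumes both listings cover the same eight nodes holding the same imported data (the DC2 machines had not yet joined the ring).

```python
# Per-node load in GB, copied from the two ring listings quoted above.
# The names below are illustrative only; they are not from any Cassandra tool.
load_081_gb = [140.61, 139.92, 138.81, 139.78, 137.44, 138.48, 140.52, 145.24]
load_107_gb = [48.72, 51.23, 52.4, 49.64, 48.5, 53.38, 51.11, 53.36]

# Cluster-wide totals for each version.
total_081 = sum(load_081_gb)
total_107 = sum(load_107_gb)

# How many times larger the 0.8.1 cluster is than the 1.0.7 cluster.
ratio = total_081 / total_107

print(f"0.8.1 total: {total_081:.2f} GB")
print(f"1.0.7 total: {total_107:.2f} GB")
print(f"ratio: {ratio:.2f}x")
```

On these numbers the totals come to about 1120.8 GB versus 408.3 GB, so the saving is closer to 2.7X than a full 3X.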