On Wed, Jan 5, 2011 at 9:52 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> It's normal for Cassandra to use more disk space than MySQL. It's
> part of what we trade for not having to rewrite every row when you add
> a new column.
>
> "SSTables that are obsoleted by a compaction are deleted
> asynchronously when the JVM performs a GC."
> http://wiki.apache.org/cassandra/MemtableSSTable
>
> On Wed, Jan 5, 2011 at 8:35 AM, nicolas lattuada
> <nicolaslattu...@hotmail.fr> wrote:
>> Hi
>>
>> i have some data size issues:
>>
>> i am storing super columns with the following content:
>>
>> {a=>1, b=>2, c=>3.......n=>14}
>>
>> i am storing it 300 000 times and i have a data size on the disk about 283Mo
>>
>> And in other side i have a mysql table which stores a bunch of data the
>> schema follows:
>> 6 varchars +100
>> 5 ints +6
>>
>> I put about 1 300 000 records on it and end up with 150Mo of data and 57Mo
>> of index.
>>
>> Then i think i am certainly doing something wrong...
>>
>> The other thing is when i run flush and then compact the size of my data
>> increases, then i imagine something is copied up on compaction
>> So is there a way to remove the unused data? (cleanup doesn t seem to do the
>> job).
>>
>> Any help to reduce the size of the data would be greatly apreciated!
>> Greetings
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
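Unlike datastores that use delimited records or fixed column sizes, Cassandra does not rely on a fixed row layout. Each row is a sorted map of columns, and each column is a tuple of {column name, column value, timestamp}. Because every column carries its own name and timestamp, the column names are written out again in every row rather than once in a table schema. The data is also not stored as tersely as it is inside MySQL.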
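To make the per-column overhead concrete, here is a rough back-of-the-envelope sketch in Python. It models a row as a sorted map of (name, value, timestamp) columns and compares it with a fixed-width row whose column names live in the schema. The byte layout (2-byte name length, 8-byte timestamp, 4-byte value length) and the helper names `make_row`, `naive_row_size` and `fixed_width_size` are hypothetical simplifications, not Cassandra's actual SSTable format; the model also ignores the row index, bloom filters and super-column overhead.

```python
import struct
from typing import Dict, NamedTuple


class Column(NamedTuple):
    """Simplified model of a column: name, value and timestamp."""
    name: bytes
    value: bytes
    timestamp: int


def make_row(data: Dict[str, int], timestamp: int) -> Dict[bytes, Column]:
    """Build a row as a map of column name -> Column.

    Python dicts keep insertion order, so inserting keys in sorted
    order stands in for Cassandra's sorted map of columns.
    """
    row: Dict[bytes, Column] = {}
    for name in sorted(data):
        key = name.encode()
        # Every column carries its own name and timestamp.
        row[key] = Column(key, str(data[name]).encode(), timestamp)
    return row


def naive_column_size(col: Column) -> int:
    # Hypothetical layout: 2-byte name length + name,
    # 8-byte timestamp, 4-byte value length + value.
    return 2 + len(col.name) + 8 + 4 + len(col.value)


def naive_row_size(row: Dict[bytes, Column]) -> int:
    return sum(naive_column_size(c) for c in row.values())


def fixed_width_size(data: Dict[str, int]) -> int:
    # Fixed-schema comparison: names live in the schema, so only
    # the values are stored, each as a 4-byte big-endian int.
    values = [data[name] for name in sorted(data)]
    return len(struct.pack(f">{len(values)}i", *values))


if __name__ == "__main__":
    # The {a=>1, b=>2, ... n=>14} row from the question.
    data = {chr(ord("a") + i): i + 1 for i in range(14)}
    row = make_row(data, timestamp=1294239120000000)
    print(naive_row_size(row))     # 229
    print(fixed_width_size(data))  # 56
```

Under this simplified model the 14-column row costs 229 bytes versus 56 bytes for fixed 4-byte ints, so 300 000 rows come to roughly 68.7 MB versus 16.8 MB before any index or other on-disk structures are counted. Most of the difference is the repeated column names and timestamps.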