Hi Julie -- Keep in mind that there is additional storage overhead beyond the raw data, including timestamps and column names. Because the schema can vary from row to row, the column names are stored with each row, alongside the data. Disk-space efficiency is not a primary design goal for Cassandra.
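To put a number on the gap, here is a rough back-of-the-envelope check in Python using the cfstats figures quoted below. It is only a sketch: it assumes the Write Count is a fair stand-in for the number of rows held on the node, and the helper names are made up for illustration.

```python
# Figures copied from the cfstats output for server5 quoted below.
WRITE_COUNT = 300_633              # assumed to approximate rows held on this node
MEAN_ROW_BYTES = 100_224           # "Compacted row mean size"
SPACE_USED_LIVE = 113_946_099_884  # "Space used (live)", in bytes


def expected_bytes(rows: int, mean_row_bytes: int) -> int:
    """Naive estimate: rows on the node times the mean row size."""
    return rows * mean_row_bytes


def overhead_ratio(used_bytes: int, expected: int) -> float:
    """How many times larger the reported usage is than the naive estimate."""
    return used_bytes / expected


if __name__ == "__main__":
    expected = expected_bytes(WRITE_COUNT, MEAN_ROW_BYTES)
    ratio = overhead_ratio(SPACE_USED_LIVE, expected)
    print(f"expected: {expected / 2**30:.1f} GiB")
    print(f"reported: {SPACE_USED_LIVE / 2**30:.1f} GiB")
    print(f"ratio:    {ratio:.2f}x")
```

With these inputs the reported usage comes out at roughly 3.8 times the naive estimate. The calculation only sizes the gap; it does not show where the extra space goes.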
Mason

On Wed, Jul 7, 2010 at 10:56 AM, Julie <julie.su...@nextcentury.com> wrote:
> Hi guys,
> I have what may be a dumb question but I am confused by how much disk
> space is being used by my Cassandra nodes. I have 10 nodes in my cluster
> with a replication factor of 3. After I write 1,000,000 rows to the
> database (100kB each), I see that they have been distributed very evenly,
> about 100,000 rows per node but because of the replication factor of 3,
> each node contains about 300,000 rows. This is all good. Since my rows
> are 100kB each, I expect each node to store about 30GB of data, however
> that is not what I am seeing. Instead, I am seeing some nodes that do not
> experience any compaction exceptions but report their space used as MUCH
> more. Here's one using 106 GB of disk. My disks are only 160 GB so this
> is at the bleeding edge and I thought my node would be able to store more
> data.
>
> I only use a single column family so here is the cfstats output from one
> of my nodes (server5):
>
> Column Family: Standard1
> SSTable count: 12
> Space used (live): 113946099884
> Space used (total): 113946099884
> Memtable Columns Count: 0
> Memtable Data Size: 0
> Memtable Switch Count: 451
> Read Count: 31786
> Read Latency: 161.429 ms.
> Write Count: 300633
> Write Latency: 0.124 ms.
> Pending Tasks: 0
> Key cache: disabled
> Row cache capacity: 3000
> Row cache size: 3000
> Row cache hit rate: 0.38331340841880074
> Compacted row minimum size: 100220
> Compacted row maximum size: 100225
> Compacted row mean size: 100224
>
> Note that I wrote these 1M rows of data yesterday and the system has had
> 24 hours to digest it. There are no exceptions in the system.log file.
> Here's the tail end of it:
>
> ...
> INFO [SSTABLE-CLEANUP-TIMER] 2010-07-06 16:13:43,162 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-430-Data.db
> INFO [SSTABLE-CLEANUP-TIMER] 2010-07-06 16:13:43,269 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-445-Data.db
> INFO [COMPACTION-POOL:1] 2010-07-06 16:35:21,718 CompactionManager.java (line 246) Compacting []
> INFO [Timer-1] 2010-07-06 17:01:01,907 Gossiper.java (line 179) InetAddress /10.248.107.19 is now dead.
> INFO [GMFD:1] 2010-07-06 17:01:42,039 Gossiper.java (line 568) InetAddress /10.248.107.19 is now UP
> INFO [COMPACTION-POOL:1] 2010-07-06 17:35:21,306 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-06 18:35:20,802 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-06 19:35:20,389 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-06 20:35:19,934 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-06 21:35:19,582 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-06 22:35:19,233 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-06 23:35:18,593 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-07 00:35:18,076 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-07 01:35:17,673 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-07 02:35:17,172 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-07 03:35:16,784 CompactionManager.java (line 246) Compacting []
> INFO [COMPACTION-POOL:1] 2010-07-07 04:35:16,383 CompactionManager.java (line 246) Compacting []
>
> Thank you for your help!!
> Julie