I have a column family with 2 rows. The 2 rows together hold 100 million columns. Each column has a name of 15 characters (digits) and the same 15 characters as its value (also digits), so each column should take 30 bytes. All the data should therefore come to approximately 3GB. The Cassandra cluster has 3 servers, and the data is stored in quorum (2 servers), so each server should have 3GB*2/3=2GB of data for this column family. The table is almost never changed; data is only removed from it, which possibly creates tombstones, but that should not increase the usage. However, when I check the data I see that each server has more than 4GB of data (more than twice what it should be).
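The estimate above can be reproduced with a few lines of Python. This is only a rough sketch of the arithmetic in the post: the variable names are made up for illustration, and it counts only the name and value bytes, ignoring any per-column storage overhead.

```python
# Back-of-the-envelope estimate using the figures from the post.
# Assumption: only column name and value bytes are counted.
COLUMNS = 100_000_000   # total columns across the 2 rows
NAME_BYTES = 15         # column name: 15 digits
VALUE_BYTES = 15        # column value: the same 15 digits
REPLICAS = 2            # each row is stored on 2 servers
SERVERS = 3             # servers in the cluster

raw_bytes = COLUMNS * (NAME_BYTES + VALUE_BYTES)
per_server_bytes = raw_bytes * REPLICAS // SERVERS

print(f"raw data:   {raw_bytes / 1e9:.1f} GB")
print(f"per server: {per_server_bytes / 1e9:.1f} GB")
```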
server 1:
-rw-r--r-- 1 root root 3506446057 Dec 26 12:02 freeNumbers-g-264-Data.db
-rw-r--r-- 1 root root  814699666 Dec 26 12:24 freeNumbers-g-281-Data.db
-rw-r--r-- 1 root root  198432466 Dec 26 12:27 freeNumbers-g-284-Data.db
-rw-r--r-- 1 root root   35883918 Apr 12 20:07 freeNumbers-g-336-Data.db

server 2:
-rw-r--r-- 1 root root 3448432307 Dec 26 11:57 freeNumbers-g-285-Data.db
-rw-r--r-- 1 root root  762399716 Dec 26 12:22 freeNumbers-g-301-Data.db
-rw-r--r-- 1 root root  220887062 Dec 26 12:23 freeNumbers-g-304-Data.db
-rw-r--r-- 1 root root   54914466 Dec 26 12:26 freeNumbers-g-306-Data.db
-rw-r--r-- 1 root root   53639516 Dec 26 12:26 freeNumbers-g-305-Data.db
-rw-r--r-- 1 root root   53007967 Jan  8 15:45 freeNumbers-g-314-Data.db
-rw-r--r-- 1 root root     413717 Apr 12 18:33 freeNumbers-g-359-Data.db

server 3:
-rw-r--r-- 1 root root 4490657264 Apr 11 18:20 freeNumbers-g-358-Data.db
-rw-r--r-- 1 root root     389171 Apr 12 20:58 freeNumbers-g-360-Data.db
-rw-r--r-- 1 root root       4276 Apr 11 18:20 freeNumbers-g-358-Statistics.db
-rw-r--r-- 1 root root       4276 Apr 11 18:24 freeNumbers-g-359-Statistics.db
-rw-r--r-- 1 root root       4276 Apr 12 20:58 freeNumbers-g-360-Statistics.db
-rw-r--r-- 1 root root        976 Apr 11 18:20 freeNumbers-g-358-Filter.db
-rw-r--r-- 1 root root        208 Apr 11 18:24 freeNumbers-g-359-Data.db
-rw-r--r-- 1 root root         78 Apr 11 18:20 freeNumbers-g-358-Index.db
-rw-r--r-- 1 root root         52 Apr 11 18:24 freeNumbers-g-359-Index.db
-rw-r--r-- 1 root root         52 Apr 12 20:58 freeNumbers-g-360-Index.db
-rw-r--r-- 1 root root         16 Apr 11 18:24 freeNumbers-g-359-Filter.db
-rw-r--r-- 1 root root         16 Apr 12 20:58 freeNumbers-g-360-Filter.db

When I try to compact, I get the following notification from the first server:

INFO [CompactionExecutor:1604] 2014-04-13 18:23:07,260 CompactionController.java (line 146) Compacting large row USER_DATA/freeNumbers:8bdf9678-6d70-11e3-85ab-80e385abf85d (4555076689 bytes) incrementally

This confirms that there is around 4.5GB of data on that server alone.
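To compare the listings with the 2GB estimate, the Data.db sizes can be summed per server. The sketch below simply copies the byte counts from the listings above; the dictionary layout and names are only for illustration.

```python
# Sizes in bytes of the *-Data.db files, copied from the listings in the post.
data_files = {
    "server 1": [3506446057, 814699666, 198432466, 35883918],
    "server 2": [3448432307, 762399716, 220887062, 54914466,
                 53639516, 53007967, 413717],
    "server 3": [4490657264, 389171, 208],
}

EXPECTED_PER_SERVER = 2_000_000_000  # the 2GB estimate from the post

# Total Data.db bytes on each server.
totals = {server: sum(sizes) for server, sizes in data_files.items()}

for server, total in totals.items():
    ratio = total / EXPECTED_PER_SERVER
    print(f"{server}: {total / 1e9:.2f} GB ({ratio:.2f}x the estimate)")
```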
Why does Cassandra take up so much space?

Best regards
Yulian Oifa