I have a column family with 2 rows.
Together, the 2 rows hold 100 million columns.
Each column has a 15-character name (digits) and the same 15 characters as
its value (also digits).
Each column should therefore take 30 bytes.
All the data should therefore come to approximately 3GB.
The Cassandra cluster has 3 servers, and data is stored in quorum (2
servers).
Therefore each server should have 3GB*2/3=2GB of data for this column
family.
The table is almost never changed; data is only removed from it, which
has possibly created tombstones, but that should not increase the usage.
However, when I check the data I see that each server has more than 4GB of
data (more than twice what it should be).
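
A minimal sketch of the estimate above, in Python. It counts only the name and value bytes, exactly as in my reasoning; the function names and the `overhead_bytes` parameter are hypothetical, and the overhead is left at 0 because I do not know what Cassandra adds per column on disk.

```python
# Back-of-envelope size estimate using only the figures from this message.
COLUMNS = 100_000_000   # total columns across both rows
NAME_BYTES = 15         # 15 digits in the column name
VALUE_BYTES = 15        # 15 digits in the column value
SERVERS = 3
REPLICAS = 2            # copies of the data, as assumed in the text


def raw_bytes(columns, name_bytes, value_bytes, overhead_bytes=0):
    """Total payload size; overhead_bytes is a hypothetical per-column extra."""
    return columns * (name_bytes + value_bytes + overhead_bytes)


def per_server_bytes(total, replicas, servers):
    """Average share per server if the replicas were spread evenly."""
    return total * replicas / servers


total = raw_bytes(COLUMNS, NAME_BYTES, VALUE_BYTES)
per_server = per_server_bytes(total, REPLICAS, SERVERS)
print(f"{total / 1e9:.1f} GB total, {per_server / 1e9:.1f} GB per server")
# prints "3.0 GB total, 2.0 GB per server"
```

Raising `overhead_bytes` shows how quickly any per-column metadata would move the total, which is one way to compare the estimate against the file sizes below.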

server 1:
-rw-r--r-- 1 root root 3506446057 Dec 26 12:02 freeNumbers-g-264-Data.db
-rw-r--r-- 1 root root  814699666 Dec 26 12:24 freeNumbers-g-281-Data.db
-rw-r--r-- 1 root root  198432466 Dec 26 12:27 freeNumbers-g-284-Data.db
-rw-r--r-- 1 root root   35883918 Apr 12 20:07 freeNumbers-g-336-Data.db

server 2:
-rw-r--r-- 1 root root 3448432307 Dec 26 11:57 freeNumbers-g-285-Data.db
-rw-r--r-- 1 root root  762399716 Dec 26 12:22 freeNumbers-g-301-Data.db
-rw-r--r-- 1 root root  220887062 Dec 26 12:23 freeNumbers-g-304-Data.db
-rw-r--r-- 1 root root   54914466 Dec 26 12:26 freeNumbers-g-306-Data.db
-rw-r--r-- 1 root root   53639516 Dec 26 12:26 freeNumbers-g-305-Data.db
-rw-r--r-- 1 root root   53007967 Jan  8 15:45 freeNumbers-g-314-Data.db
-rw-r--r-- 1 root root     413717 Apr 12 18:33 freeNumbers-g-359-Data.db


server 3:
-rw-r--r-- 1 root root 4490657264 Apr 11 18:20 freeNumbers-g-358-Data.db
-rw-r--r-- 1 root root     389171 Apr 12 20:58 freeNumbers-g-360-Data.db
-rw-r--r-- 1 root root       4276 Apr 11 18:20 freeNumbers-g-358-Statistics.db
-rw-r--r-- 1 root root       4276 Apr 11 18:24 freeNumbers-g-359-Statistics.db
-rw-r--r-- 1 root root       4276 Apr 12 20:58 freeNumbers-g-360-Statistics.db
-rw-r--r-- 1 root root        976 Apr 11 18:20 freeNumbers-g-358-Filter.db
-rw-r--r-- 1 root root        208 Apr 11 18:24 freeNumbers-g-359-Data.db
-rw-r--r-- 1 root root         78 Apr 11 18:20 freeNumbers-g-358-Index.db
-rw-r--r-- 1 root root         52 Apr 11 18:24 freeNumbers-g-359-Index.db
-rw-r--r-- 1 root root         52 Apr 12 20:58 freeNumbers-g-360-Index.db
-rw-r--r-- 1 root root         16 Apr 11 18:24 freeNumbers-g-359-Filter.db
-rw-r--r-- 1 root root         16 Apr 12 20:58 freeNumbers-g-360-Filter.db
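
To total the `-Data.db` files per server, something like the following Python sketch can be used. The helper name `data_bytes` is hypothetical, and it assumes the nine-field `ls -l` layout shown in the listings above; the sample input is a subset of the server 3 listing.

```python
def data_bytes(listing: str) -> int:
    """Sum the size column of every '-Data.db' line in `ls -l` output."""
    total = 0
    for line in listing.splitlines():
        fields = line.split()
        # Expected fields: perms, links, owner, group, size, month, day, time, name
        if len(fields) >= 9 and fields[-1].endswith("-Data.db"):
            total += int(fields[4])
    return total


server3 = """\
-rw-r--r-- 1 root root 4490657264 Apr 11 18:20 freeNumbers-g-358-Data.db
-rw-r--r-- 1 root root     389171 Apr 12 20:58 freeNumbers-g-360-Data.db
-rw-r--r-- 1 root root        208 Apr 11 18:24 freeNumbers-g-359-Data.db
-rw-r--r-- 1 root root       4276 Apr 11 18:20 freeNumbers-g-358-Statistics.db
"""

print(data_bytes(server3))  # 4491046643
```

Only the `-Data.db` files are counted; the Statistics, Filter, and Index files are skipped because they are tiny by comparison.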

When I try to compact, I get the following notification from the first server:
INFO [CompactionExecutor:1604] 2014-04-13 18:23:07,260 CompactionController.java (line 146) Compacting large row USER_DATA/freeNumbers:8bdf9678-6d70-11e3-85ab-80e385abf85d (4555076689 bytes) incrementally

This confirms that there is around 4.5GB of data on that server alone.
Why does Cassandra take up so much space?

Best regards
Yulian Oifa
