What are you storing these 15 chars as: string, int, double, etc.? 15 chars do not necessarily translate to 15 bytes.
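To make the point about chars versus bytes concrete, here is a minimal sketch in Python. The 15-digit value is a made-up example, and the sizes shown are those of Python's own encodings, not of Cassandra's on-disk format; they only illustrate that the byte count depends on the type you pick.

```python
import struct

# Hypothetical 15-digit value, used only for illustration.
number = "123456789012345"

# The same 15 digits stored three different ways.
as_ascii = number.encode("ascii")         # one byte per digit
as_utf16 = number.encode("utf-16-be")     # two bytes per digit
as_long = struct.pack(">q", int(number))  # fixed-width 64-bit integer

sizes = {
    "ascii": len(as_ascii),
    "utf-16": len(as_utf16),
    "int64": len(as_long),
}
print(sizes)
```

So depending on the type, the same 15 digits can take roughly half or double the assumed 15 bytes, before any storage overhead is counted.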
You may be mixing up replication factor and quorum when you say "Cassandra cluster has 3 servers, and data is stored in quorum ( 2 servers )." Your data is replicated to the number of nodes you specify in your replication factor (RF), and you read and write at quorum, which is (RF/2)+1 using integer division. Could you clarify which replication factor the keyspace uses?

Also, if you are concerned about disk usage, why are you storing the same 15 char value in both the column name and value? You could just store it as the name and halve your data usage :)

On Sun, Apr 13, 2014 at 4:26 PM, Yulian Oifa <oifa.yul...@gmail.com> wrote:

> I have a column family with 2 rows.
> The 2 rows have 100 million columns overall.
> Each column has a name of 15 chars (digits) and the same 15 chars in the
> value (also digits).
> Each column should take 30 bytes.
> Therefore all data should amount to approximately 3GB.
> The Cassandra cluster has 3 servers, and data is stored in quorum (2 servers).
> Therefore each server should have 3GB*2/3=2GB of data for this column family.
> The table is almost never changed; data is only removed from this table,
> which possibly created tombstones, but that should not increase the usage.
> However, when I check the data I see that each server has more than 4GB of
> data (more than twice what it should be).
>
> server 1:
> -rw-r--r-- 1 root root 3506446057 Dec 26 12:02 freeNumbers-g-264-Data.db
> -rw-r--r-- 1 root root 814699666 Dec 26 12:24 freeNumbers-g-281-Data.db
> -rw-r--r-- 1 root root 198432466 Dec 26 12:27 freeNumbers-g-284-Data.db
> -rw-r--r-- 1 root root 35883918 Apr 12 20:07 freeNumbers-g-336-Data.db
>
> server 2:
> -rw-r--r-- 1 root root 3448432307 Dec 26 11:57 freeNumbers-g-285-Data.db
> -rw-r--r-- 1 root root 762399716 Dec 26 12:22 freeNumbers-g-301-Data.db
> -rw-r--r-- 1 root root 220887062 Dec 26 12:23 freeNumbers-g-304-Data.db
> -rw-r--r-- 1 root root 54914466 Dec 26 12:26 freeNumbers-g-306-Data.db
> -rw-r--r-- 1 root root 53639516 Dec 26 12:26 freeNumbers-g-305-Data.db
> -rw-r--r-- 1 root root 53007967 Jan 8 15:45 freeNumbers-g-314-Data.db
> -rw-r--r-- 1 root root 413717 Apr 12 18:33 freeNumbers-g-359-Data.db
>
> server 3:
> -rw-r--r-- 1 root root 4490657264 Apr 11 18:20 freeNumbers-g-358-Data.db
> -rw-r--r-- 1 root root 389171 Apr 12 20:58 freeNumbers-g-360-Data.db
> -rw-r--r-- 1 root root 4276 Apr 11 18:20 freeNumbers-g-358-Statistics.db
> -rw-r--r-- 1 root root 4276 Apr 11 18:24 freeNumbers-g-359-Statistics.db
> -rw-r--r-- 1 root root 4276 Apr 12 20:58 freeNumbers-g-360-Statistics.db
> -rw-r--r-- 1 root root 976 Apr 11 18:20 freeNumbers-g-358-Filter.db
> -rw-r--r-- 1 root root 208 Apr 11 18:24 freeNumbers-g-359-Data.db
> -rw-r--r-- 1 root root 78 Apr 11 18:20 freeNumbers-g-358-Index.db
> -rw-r--r-- 1 root root 52 Apr 11 18:24 freeNumbers-g-359-Index.db
> -rw-r--r-- 1 root root 52 Apr 12 20:58 freeNumbers-g-360-Index.db
> -rw-r--r-- 1 root root 16 Apr 11 18:24 freeNumbers-g-359-Filter.db
> -rw-r--r-- 1 root root 16 Apr 12 20:58 freeNumbers-g-360-Filter.db
>
> When I try to compact, I get the following notification from the first server:
> INFO [CompactionExecutor:1604] 2014-04-13 18:23:07,260
> CompactionController.java (line 146) Compacting large row
> USER_DATA/freeNumbers:8bdf9678-6d70-11e3-85ab-80e385abf85d (4555076689
> bytes) incrementally
>
> Which confirms that there is around 4.5GB of data on that server alone.
> Why does Cassandra take so much space???
>
> Best regards
> Yulian Oifa
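
The estimate in the quoted message (30 bytes per column, 3GB in total, 2GB per server) can be rechecked with a small Python sketch. It uses only figures quoted in the thread. The function names are made up for illustration, the per-node formula assumes data spreads evenly across nodes (which two very wide rows may not do), and the last figure assumes the large row reported by compaction holds close to all 100 million columns.

```python
# Figures quoted in the thread.
COLUMNS = 100_000_000
ESTIMATED_BYTES_PER_COLUMN = 30
NODES = 3
LARGE_ROW_BYTES = 4_555_076_689  # from the "Compacting large row" log line


def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def per_node_bytes(total_bytes: int, replication_factor: int, nodes: int) -> int:
    """Average bytes per node, assuming an even spread (an assumption here)."""
    return total_bytes * replication_factor // nodes


estimated_total = COLUMNS * ESTIMATED_BYTES_PER_COLUMN

for rf in (2, 3):
    print(
        f"RF={rf}: quorum={quorum(rf)}, "
        f"expected per node={per_node_bytes(estimated_total, rf, NODES)} bytes"
    )

# Observed size per column, if the large row held all 100 million columns.
observed_bytes_per_column = LARGE_ROW_BYTES / COLUMNS
print(f"observed: about {observed_bytes_per_column:.1f} bytes per column")
```

With a replication factor of 3, every node would be expected to hold the full 3GB rather than 2GB, even though quorum is still 2. The observed figure of about 45.6 bytes per column is also roughly 15 bytes more than the 30-byte estimate. One plausible explanation, offered as an assumption rather than something the thread confirms, is per-column metadata such as a timestamp and length fields.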