How are you storing these 15 chars: as a string, int, double, etc.? 15 chars
do not necessarily translate to 15 bytes.
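
As a rough illustration of why the type matters, here is a minimal Python
sketch comparing the same 15 digits serialized as text and as a 64-bit
integer. The sample value is made up, and how your client actually
serializes the column depends on the comparator/validator you configured.

```python
import struct

# Hypothetical 15-digit value, only for illustration.
digits = "123456789012345"

# Stored as text: one byte per ASCII digit.
as_text = digits.encode("utf-8")

# Stored as a 64-bit signed integer (big-endian): always 8 bytes.
as_long = struct.pack(">q", int(digits))

print(len(as_text))  # 15
print(len(as_long))  # 8
```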

You may be mixing up replication factor and quorum when you say "Cassandra
cluster has 3 servers, and data is stored in quorum ( 2 servers )." Quorum
is a consistency level for reads and writes: (RF/2)+1 replicas, rounded
down, where RF is the replication factor. How many nodes hold a copy of
your data is set by the replication factor itself, not by quorum. Could you
clarify?
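
To make the distinction concrete, here is a small sketch of the quorum
arithmetic. As far as I know the quorum size is derived from the
replication factor; the function name is my own and is not part of any
Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

# RF=2 needs both replicas, RF=3 needs two of three.
for rf in (1, 2, 3, 5):
    print(f"RF={rf} -> quorum={quorum(rf)}")
```

So "2 servers" is the quorum for RF=3, in which case every one of your 3
nodes would hold a full copy of the data rather than two thirds of it.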

Also, if you are concerned about disk usage, why are you storing the same 15
char value in both the column name and the value? You could just store it as
the name and halve your data usage :)
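
Here is a back-of-the-envelope estimate, as a sketch only. It assumes 15
bytes of per-column overhead (name length, flags, timestamp, value length),
which is the figure I remember for the older on-disk format; please check it
against your version. It ignores indexes, bloom filters, tombstones and
compression.

```python
COLUMNS = 100_000_000   # from the original message
NAME_BYTES = 15         # 15 digits stored as text
VALUE_BYTES = 15        # same 15 digits repeated in the value
OVERHEAD_BYTES = 15     # ASSUMPTION: per-column overhead, not measured

def estimate(columns, name_bytes, value_bytes, overhead=OVERHEAD_BYTES):
    """Rough on-disk size in bytes for one replica of the data."""
    return columns * (name_bytes + value_bytes + overhead)

both = estimate(COLUMNS, NAME_BYTES, VALUE_BYTES)
name_only = estimate(COLUMNS, NAME_BYTES, 0)

print(f"name + value: {both / 1e9:.1f} GB")       # 4.5 GB
print(f"name only:    {name_only / 1e9:.1f} GB")  # 3.0 GB
```

Under that assumption the name + value figure lands close to the 4555076689
bytes your compaction log reports, and dropping the value would save about a
third of the total rather than exactly half.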




On Sun, Apr 13, 2014 at 4:26 PM, Yulian Oifa <oifa.yul...@gmail.com> wrote:

> I have a column family with 2 rows.
> The 2 rows have 100 million columns overall.
> Each column has a name of 15 chars ( digits ) and the same 15 chars in the
> value ( also digits ).
> Each column should have 30 bytes.
> Therefore all data should contain approximately 3GB.
> Cassandra cluster has 3 servers , and data is stored in quorum ( 2 servers
> ).
> Therefore each server should have 3GB*2/3=2GB of data for this column
> family.
> The table is almost never changed; data is only removed from it, which
> possibly created tombstones, but that should not increase the usage.
> However, when I check the data I see that each server has more than 4GB of
> data ( more than twice what it should be ).
>
> server 1:
> -rw-r--r-- 1 root root 3506446057 Dec 26 12:02 freeNumbers-g-264-Data.db
> -rw-r--r-- 1 root root  814699666 Dec 26 12:24 freeNumbers-g-281-Data.db
> -rw-r--r-- 1 root root  198432466 Dec 26 12:27 freeNumbers-g-284-Data.db
> -rw-r--r-- 1 root root   35883918 Apr 12 20:07 freeNumbers-g-336-Data.db
>
> server 2:
> -rw-r--r-- 1 root root 3448432307 Dec 26 11:57 freeNumbers-g-285-Data.db
> -rw-r--r-- 1 root root  762399716 Dec 26 12:22 freeNumbers-g-301-Data.db
> -rw-r--r-- 1 root root  220887062 Dec 26 12:23 freeNumbers-g-304-Data.db
> -rw-r--r-- 1 root root   54914466 Dec 26 12:26 freeNumbers-g-306-Data.db
> -rw-r--r-- 1 root root   53639516 Dec 26 12:26 freeNumbers-g-305-Data.db
> -rw-r--r-- 1 root root   53007967 Jan  8 15:45 freeNumbers-g-314-Data.db
> -rw-r--r-- 1 root root     413717 Apr 12 18:33 freeNumbers-g-359-Data.db
>
>
> server 3:
> -rw-r--r-- 1 root root 4490657264 Apr 11 18:20 freeNumbers-g-358-Data.db
> -rw-r--r-- 1 root root     389171 Apr 12 20:58 freeNumbers-g-360-Data.db
> -rw-r--r-- 1 root root       4276 Apr 11 18:20 freeNumbers-g-358-Statistics.db
> -rw-r--r-- 1 root root       4276 Apr 11 18:24 freeNumbers-g-359-Statistics.db
> -rw-r--r-- 1 root root       4276 Apr 12 20:58 freeNumbers-g-360-Statistics.db
> -rw-r--r-- 1 root root        976 Apr 11 18:20 freeNumbers-g-358-Filter.db
> -rw-r--r-- 1 root root        208 Apr 11 18:24 freeNumbers-g-359-Data.db
> -rw-r--r-- 1 root root         78 Apr 11 18:20 freeNumbers-g-358-Index.db
> -rw-r--r-- 1 root root         52 Apr 11 18:24 freeNumbers-g-359-Index.db
> -rw-r--r-- 1 root root         52 Apr 12 20:58 freeNumbers-g-360-Index.db
> -rw-r--r-- 1 root root         16 Apr 11 18:24 freeNumbers-g-359-Filter.db
> -rw-r--r-- 1 root root         16 Apr 12 20:58 freeNumbers-g-360-Filter.db
>
> When I try to compact, I get the following notification from the first server:
> INFO [CompactionExecutor:1604] 2014-04-13 18:23:07,260
> CompactionController.java (line 146) Compacting large row
> USER_DATA/freeNumbers:8bdf9678-6d70-11e3-85ab-80e385abf85d (4555076689
> bytes) incrementally
>
> This confirms that there is around 4.5GB of data on that server alone.
> Why does Cassandra take so much space???
>
> Best regards
> Yulian Oifa
>
>
