I am chasing down a row size discrepancy and am confused.

I populated a single node Cassandra cluster with 10,000 rows of data, using 
numeric keys 1-10,000, where each row is a little over 100kB in length and has 
a single column in it. 

When I run cfstats on the node immediately after writing the data, it 
reports that Compacted row minimum size = Compacted row maximum size, which 
is a little over 100,000 bytes.  This is what I expect.  

I then run an application that randomly reads rows and adds a timestamp column 
to each row read.  The timestamp column's name and value add only a few bytes 
to the row.

But after running my reading app for a few hours, cfstats reports a very odd 
minimum row size (and compacted mean row size):

[r...@ec2-server1 ~]# /mnt/server/apache-cassandra-0.6.2/bin/nodetool -h ec2-server1 -p 8080 cfstats
Keyspace: Keyspace1
        Read Count: 670434
        Read Latency: 36.22349047035205 ms.
        Write Count: 1519933
        Write Latency: 0.02940705741634664 ms.
        Pending Tasks: 0
                Column Family: Standard1
                SSTable count: 6
                Space used (live): 11130225642
                Space used (total): 11130225642
                Memtable Columns Count: 1435
                Memtable Data Size: 40344907
                Memtable Switch Count: 1329
                Read Count: 670434
                Read Latency: 41.768 ms.
                Write Count: 1519933
                Write Latency: 0.025 ms.
                Pending Tasks: 0
                Key cache capacity: 200000
                Key cache size: 200000
                Key cache hit rate: 0.48049934471509675
                Row cache: disabled
                Compacted row minimum size: 238
                Compacted row maximum size: 100323
                Compacted row mean size: 67548

I thought I had a bug in my code so I wrote another app to read every row 
in the database, keys 1-10,000.  I get the size of each row after reading it 
(by adding up all column names and column values in the row and the size of 
the key string) and this matches what I expect -- every single key in this 
table has a size of just over 100,000 bytes.  (I know there are some 
overhead columns in each row but I assume these will only make the row 
larger, not smaller.)
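
For reference, the size check described above amounts to roughly the 
following Python sketch.  It is only an illustration: the row is modelled as a 
plain dict of column name to column value, and the names row_size and 
example_row are hypothetical, not part of any Cassandra client API.  It also 
ignores whatever per-row and per-column overhead Cassandra stores on disk.

```python
from typing import Dict


def row_size(key: str, columns: Dict[bytes, bytes]) -> int:
    """Client-side row size: key bytes plus every column name and value."""
    size = len(key.encode("utf-8"))
    for name, value in columns.items():
        size += len(name) + len(value)
    return size


# Hypothetical row: one large column plus one small timestamp column.
example_row = {
    b"data": b"x" * 100_000,
    b"ts": b"1279000000",
}
example_size = row_size("42", example_row)
print(example_size)
```

Measured this way, a row with one large column and one small timestamp column 
can only come out slightly above 100,000 bytes, never as small as 238.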

So where is cfstats getting the row sizes it is working with?  

When I add the timestamp column to each row, I am not deleting the other 
(large) column in the row, but I am not rewriting it either.

Thanks for your help!
Julie

