We are having problems with repair too.

It sounds like your problem is the same as ours. See this thread from today:
http://permalink.gmane.org/gmane.comp.db.cassandra.user/16619

On May 25, 2011, at 4:52 PM, Dominic Williams wrote:

> Hi,
> 
> I've got a strange problem, where the database on a node has inflated 10X 
> after running repair. This is not the result of receiving missed data.
> 
> I didn't perform repair within my usual 10 day cycle, so I followed the 
> recommended practice:
> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
> 
> The sequence of events was like this:
> 
> 1) set GCGraceSeconds to some huge value
> 2) perform rolling upgrade from 0.7.4 to 0.7.6-2
> 3) run nodetool repair on the first node in the cluster at ~10pm. It has a 
> ~30G database
> 4) at 2:30am, decide to leave it running all night; wake up at 9am to find 
> it still running
> 5) late-morning investigation shows that db size has increased to 370G. The 
> snapshot folder accounts for only 30G
> 6) node starts to run out of disk space http://pastebin.com/Sm0B7nfR
> 7) decide to bail! Reset GCGraceSeconds to 864000 and restart node to stop 
> repair
> 8) as node restarts it deletes a bunch of tmp files, reducing db size from 
> 370G to 270G
> 9) node now constantly performing minor compactions, with du rising slightly 
> then falling by a greater amount after each minor compaction deletes sstables
> 10) disk usage is gradually coming down. Currently at 254G (3pm)
> 11) performance of node obviously not great!
> 
> Investigation of the database reveals the main problem to have occurred in a 
> single column family, UserFights. This contains millions of fight records 
> from our MMO, but exactly the same number as the MonsterFights cf contains. 
> However, the comparative sizes are:
> 
>               Column Family: MonsterFights
>               SSTable count: 38
>               Space used (live): 13867454647
>               Space used (total): 13867454647 (13G)
>               Memtable Columns Count: 516
>               Memtable Data Size: 598770
>               Memtable Switch Count: 4
>               Read Count: 514
>               Read Latency: 157.649 ms.
>               Write Count: 4059
>               Write Latency: 0.025 ms.
>               Pending Tasks: 0
>               Key cache capacity: 200000
>               Key cache size: 183004
>               Key cache hit rate: 0.0023566218452145135
>               Row cache: disabled
>               Compacted row minimum size: 771
>               Compacted row maximum size: 943127
>               Compacted row mean size: 3208
> 
>               Column Family: UserFights
>               SSTable count: 549
>               Space used (live): 185355019679
>               Space used (total): 219489031691 (219G)
>               Memtable Columns Count: 483
>               Memtable Data Size: 560569
>               Memtable Switch Count: 8
>               Read Count: 2159
>               Read Latency: 2589.150 ms.
>               Write Count: 4080
>               Write Latency: 0.018 ms.
>               Pending Tasks: 0
>               Key cache capacity: 200000
>               Key cache size: 200000
>               Key cache hit rate: 0.03357770764288416
>               Row cache: disabled
>               Compacted row minimum size: 925
>               Compacted row maximum size: 12108970
>               Compacted row mean size: 503069
> 
> These stats were taken at 3pm, and at 1pm UserFights was using 224G total, so 
> overall size is gradually coming down. 
> 
> Another observation is the following appearing in the logs during the minor 
> compactions:
> Compacting large row 536c69636b5061756c (121235810 bytes) incrementally
> 
> The largest number of fights any user has performed on our MMO that I can 
> find is short of 10,000. Each fight record is smaller than 1K... so it looks 
> like these rows have somehow grown by more than 10X.
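
The estimate above can be checked with a little arithmetic. The following is only a rough sketch: it takes "1K" to mean 1024 bytes, assumes the hex string in the log line is a UTF-8 row key, and the helper names `decode_key` and `inflation_factor` are made up for illustration.

```python
# Values copied from the "Compacting large row" log line quoted above.
ROW_KEY_HEX = "536c69636b5061756c"
OBSERVED_ROW_BYTES = 121235810

# Upper bounds stated in the message: fewer than 10,000 fights per user,
# each record smaller than 1K (assumed here to mean 1024 bytes).
MAX_FIGHTS = 10_000
MAX_RECORD_BYTES = 1024


def decode_key(hex_key: str) -> str:
    """Decode a hex-encoded row key, assuming the key bytes are UTF-8 text."""
    return bytes.fromhex(hex_key).decode("utf-8")


def inflation_factor(observed: int, fights: int, record_bytes: int) -> float:
    """Ratio of the observed row size to the largest size we would expect."""
    return observed / (fights * record_bytes)


if __name__ == "__main__":
    print("row key:", decode_key(ROW_KEY_HEX))
    factor = inflation_factor(OBSERVED_ROW_BYTES, MAX_FIGHTS, MAX_RECORD_BYTES)
    print(f"row is about {factor:.1f}x the expected maximum")
```

Under those assumptions the row comes out at roughly 12 times the largest size it should plausibly have, which is consistent with the "more than 10X" observation.
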
> 
> The size of UserFights on another replica node, which actually owns a 
> slightly larger proportion of the ring, is:
> 
>               Column Family: UserFights
>               SSTable count: 14
>               Space used (live): 17844982744
>               Space used (total): 17936528583 (18G)
>               Memtable Columns Count: 767
>               Memtable Data Size: 891153
>               Memtable Switch Count: 6
>               Read Count: 2298
>               Read Latency: 61.020 ms.
>               Write Count: 4261
>               Write Latency: 0.104 ms.
>               Pending Tasks: 0
>               Key cache capacity: 200000
>               Key cache size: 55172
>               Key cache hit rate: 0.8079570484581498
>               Row cache: disabled
>               Compacted row minimum size: 925
>               Compacted row maximum size: 12108970
>               Compacted row mean size: 846477
> ...
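
To put numbers on the difference between the two UserFights listings, here is a small sketch that parses the relevant "name: value" lines pasted from the stats above and compares them. The parser `parse_stats` is a hypothetical helper written for this message, not part of nodetool, and it only keeps lines whose value is a plain integer.

```python
# Lines pasted from the two UserFights listings quoted in this thread.
BLOATED = """
> Column Family: UserFights
> SSTable count: 549
> Space used (live): 185355019679
> Space used (total): 219489031691 (219G)
"""

REPLICA = """
> Column Family: UserFights
> SSTable count: 14
> Space used (live): 17844982744
> Space used (total): 17936528583 (18G)
"""


def parse_stats(text: str) -> dict:
    """Collect integer-valued 'name: value' lines, ignoring quote markers."""
    stats = {}
    for line in text.strip().splitlines():
        line = line.lstrip("> ").strip()
        name, sep, value = line.partition(":")
        if not sep:
            continue
        # Drop a trailing human-readable size such as "(219G)".
        tokens = value.split()
        if tokens and tokens[0].isdigit():
            stats[name.strip()] = int(tokens[0])
    return stats


bloated = parse_stats(BLOATED)
replica = parse_stats(REPLICA)

# How much larger the repaired node is than the healthy replica.
size_ratio = bloated["Space used (total)"] / replica["Space used (total)"]
# Bytes on disk that are no longer live, i.e. sstables awaiting deletion.
not_live = bloated["Space used (total)"] - bloated["Space used (live)"]

if __name__ == "__main__":
    print(f"total size ratio: {size_ratio:.1f}x")
    print(f"non-live bytes on repaired node: {not_live}")
    print(f"sstables: {bloated['SSTable count']} vs {replica['SSTable count']}")
```

By this calculation the repaired node holds roughly 12 times as much UserFights data as the replica, and about 34G of that is already non-live and should disappear as compaction cleans up.
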
> 
> All ideas and suggestions greatly appreciated as always!
> 
> Dominic
> ria101.wordpress.com
