I'm not sure if this is the absolute best advice, but perhaps
running "cleanup" on the node will help remove any data that isn't assigned
to its token range - in case you've moved nodes around the ring before.
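Something like this, i.e. (assuming the default 0.7 JMX port of 8080 -
adjust host/port to your setup):

    # drops any data on this node that falls outside its token range
    nodetool -h localhost -p 8080 cleanup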
Any exceptions in the logs, e.g. EOF? I experienced this and it caused
repairs to trip up every time. It was fixed with a "scrub", which rebuilds
all the sstables.
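Roughly this - the log path and keyspace/cf names here are just
placeholders for whatever your install uses:

    # look for exceptions around repair/compaction time
    grep -i exception /var/log/cassandra/system.log | tail -20
    # rebuild the sstables in place (scrub snapshots the CF first, I believe)
    nodetool -h localhost -p 8080 scrub MyKeyspace UserFights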
I also turned swap off on my nodes - it's unnecessary overhead, since
mmap manages virtual memory pretty well on its own.
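On Linux that's just:

    # disable swap now; comment the swap line out of /etc/fstab
    # so it stays off across reboots
    sudo swapoff -a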
Be careful about running "major compactions". You'll keep merging all the
data into bigger and bigger sstables, which in my experience are harder to
perform maintenance tasks on.
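To be clear, by "major compaction" I mean the manually triggered kind
(keyspace/cf names again placeholders):

    # merges every sstable of the CF into one giant sstable - think twice
    nodetool -h localhost -p 8080 compact MyKeyspace UserFights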
Jon
On , Dominic Williams <thedwilli...@gmail.com> wrote:
Hi,
I've got a strange problem, where the database on a node has inflated 10X
after running repair. This is not the result of receiving missed data.
I didn't perform repair within my usual 10-day cycle, so I followed the
recommended practice:
http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
The sequence of events was like this:
1) set GCGraceSeconds to some huge value (commands sketched after this list)
2) perform rolling upgrade from 0.7.4 to 0.7.6-2
3) run nodetool repair on the first node in the cluster at ~10pm. It has a
~30G database
4) 2.30am decide to leave it running all night; wake up 9am to find it
still running
5) late morning investigation shows that db size has increased to 370G.
The snapshot folder accounts for only 30G
6) node starts to run out of disk space http://pastebin.com/Sm0B7nfR
7) decide to bail! Reset GCGraceSeconds to 864000 and restart node to
stop repair
8) as node restarts it deletes a bunch of tmp files, reducing db size
from 370G to 270G
9) node now constantly performing minor compactions; du rises slightly,
then falls by a greater amount after each minor compaction deletes an
sstable
10) gradually disk usage is coming down. Currently at 254G (3pm)
11) performance of node obviously not great!
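For anyone following along, steps 1, 3 and 7 were roughly the following
(0.7-era syntax from memory - check "help update column family;" in
cassandra-cli for the exact attribute name):

    # 1) raise gc grace per column family, from cassandra-cli:
    #      update column family UserFights with gc_grace = <huge value>;
    # 3) kick off repair on the node:
    nodetool -h node1 -p 8080 repair
    # 7) set gc_grace back to 864000 (10 days) the same way, then
    #    restart the node to abort the repair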
Investigation of the database reveals the main problem to have occurred
in a single column family, UserFights. This contains millions of fight
records from our MMO - in fact exactly the same number as the
MonsterFights cf. However, the comparative sizes are:
Column Family: MonsterFights
SSTable count: 38
Space used (live): 13867454647
Space used (total): 13867454647 (13G)
Memtable Columns Count: 516
Memtable Data Size: 598770
Memtable Switch Count: 4
Read Count: 514
Read Latency: 157.649 ms.
Write Count: 4059
Write Latency: 0.025 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 183004
Key cache hit rate: 0.0023566218452145135
Row cache: disabled
Compacted row minimum size: 771
Compacted row maximum size: 943127
Compacted row mean size: 3208
Column Family: UserFights
SSTable count: 549
Space used (live): 185355019679
Space used (total): 219489031691 (219G)
Memtable Columns Count: 483
Memtable Data Size: 560569
Memtable Switch Count: 8
Read Count: 2159
Read Latency: 2589.150 ms.
Write Count: 4080
Write Latency: 0.018 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 200000
Key cache hit rate: 0.03357770764288416
Row cache: disabled
Compacted row minimum size: 925
Compacted row maximum size: 12108970
Compacted row mean size: 503069
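(Both blocks above are nodetool cfstats output, i.e. from something like:

    nodetool -h localhost -p 8080 cfstats

trimmed down to the two column families in question.)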
These stats were taken at 3pm; at 1pm UserFights was using 224G total, so
the overall size is gradually coming down.
Another observation is the following appearing in the logs during the
minor compactions:
Compacting large row 536c69636b5061756c (121235810 bytes) incrementally
The largest number of fights any user has performed on our MMO that I can
find is just short of 10,000, and each fight record is smaller than 1K...
so it looks like these rows have somehow grown to 10X+ their expected size.
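Working that through against the logged row above:

    10,000 fights x ~1KB/fight    ≈ 10MB expected worst-case row
    logged row: 121,235,810 bytes ≈ 116MB -> roughly 12x expected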
The size of UserFights on another replica node, which actually holds a
slightly higher proportion of the ring, is:
Column Family: UserFights
SSTable count: 14
Space used (live): 17844982744
Space used (total): 17936528583 (18G)
Memtable Columns Count: 767
Memtable Data Size: 891153
Memtable Switch Count: 6
Read Count: 2298
Read Latency: 61.020 ms.
Write Count: 4261
Write Latency: 0.104 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 55172
Key cache hit rate: 0.8079570484581498
Row cache: disabled
Compacted row minimum size: 925
Compacted row maximum size: 12108970
Compacted row mean size: 846477
...
All ideas and suggestions greatly appreciated as always!
Dominic
ria101.wordpress.com