Firstly, any ideas for a quick fix? This is giving me big production
problems. Write/read with QUORUM is reportedly producing unpredictable
results (people have called support regarding monsters in my MMO
appearing and disappearing magically) and many operations are just
failing with SocketTimeoutException, I guess because of the continuing
compactions over huge sstables. I'm going to have to try adjusting
client timeout settings etc., but this feels like using a hanky to
protect oneself from a downpour.
Monsters appear magically? That doesn't sound so bad ...
Sorry ... well, this is actually strange. You can get 'inconsistent'
results with QUORUM because of timed-out writes that land on one
replica only and not on the other two (given RF=3). Some reads will
keep returning the old value until one read eventually picks up the
newer 'failed' write. If you're really out of luck, read repair can
then fail too and you will get the old value again. But this is
eventually going to be ok.
Secondly, does anyone know if this is just a waiting game - will the
node eventually correct itself and shrink back down?
I'm down to 204G now from 270G.
One thing you have to make sure of is that the repair actually stopped.
If you killed the repairing node, the other nodes will continue to
retry sending data - I think 7 or 8 times, with increasing pauses.
Check the log files on the other nodes. If you're sure that you aren't
getting any more data, I would recommend a major (forced) compaction,
but this will take a couple of hours depending on your drives.
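Roughly what I mean, with host names, the log path and the keyspace
name as placeholders:

  # Check that the other replicas have stopped streaming repair data
  # to the node you killed (run against each neighbour, and grep
  # their logs too).
  nodetool -h node2 netstats
  grep -i stream /var/log/cassandra/system.log   # path depends on your install

  # Once nothing new is arriving, force a major compaction on the
  # bloated CF. Expect it to run for hours and to need plenty of free
  # disk while it works.
  nodetool -h node1 compact <YourKeyspace> UserFights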
If you're using the dynamic snitch, the one slow node should not kill
you. If it does, you can take it out of the ring. I think there was
some nodetool way but I can't remember. The easiest way is to configure
it to listen only on localhost and restart.
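If you go the localhost route, it's just the standard cassandra.yaml
keys on the slow node (config path and the restart command depend on
your install):

  # On the slow node only: bind gossip and Thrift to loopback, then
  # restart, so clients and the rest of the ring stop talking to it.
  sed -i 's/^listen_address:.*/listen_address: 127.0.0.1/' /etc/cassandra/conf/cassandra.yaml
  sed -i 's/^rpc_address:.*/rpc_address: 127.0.0.1/' /etc/cassandra/conf/cassandra.yaml
  service cassandra restart   # or however you (re)start Cassandra on your boxes

  # Put the original addresses back when you want it serving traffic again.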
Thirdly, does anyone know if the problem is contagious, i.e. should I
consider decommissioning the whole node and trying to rebuild from replicas?
No. That should not be necessary.
Good luck
Thanks, Dominic
On 25 May 2011 17:16, Daniel Doubleday <daniel.double...@gmx.net> wrote:
We are having problems with repair too.
It sounds like yours are the same. From today:
http://permalink.gmane.org/gmane.comp.db.cassandra.user/16619
On May 25, 2011, at 4:52 PM, Dominic Williams wrote:
Hi,
I've got a strange problem where the database on a node has inflated
10X after running repair. This is not the result of receiving missed
data.
I didn't perform repair within my usual 10-day cycle, so I followed
the recommended practice:
http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
The sequence of events was like this:
1) set GCGraceSeconds to some huge value
2) perform rolling upgrade from 0.7.4 to 0.7.6-2
3) run nodetool repair on the first node in the cluster at ~10pm. It
has a ~30G database
4) at 2.30am decide to leave it running all night; wake up at 9am to
find it still running
5) late-morning investigation shows that the db size has increased to
370G. The snapshot folder accounts for only 30G
6) node starts to run out of disk space http://pastebin.com/Sm0B7nfR
7) decide to bail! Reset GCGraceSeconds to 864000 and restart the
node to stop the repair
8) as the node restarts it deletes a bunch of tmp files, reducing db
size from 370G to 270G
9) node now constantly performs minor compactions; du rises slightly,
then falls by a greater amount after each minor compaction deletes
sstables
10) gradually disk usage is coming down. Currently at 254G (3pm)
11) performance of the node is obviously not great!
Investigation of the database reveals the main problem to have
occurred in a single column family, UserFights. This contains
millions of fight records from our MMO - in fact exactly the same
number as the MonsterFights cf. However, the comparative sizes are:
Column Family: MonsterFights
SSTable count: 38
Space used (live): 13867454647
Space used (total): 13867454647 (13G)
Memtable Columns Count: 516
Memtable Data Size: 598770
Memtable Switch Count: 4
Read Count: 514
Read Latency: 157.649 ms.
Write Count: 4059
Write Latency: 0.025 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 183004
Key cache hit rate: 0.0023566218452145135
Row cache: disabled
Compacted row minimum size: 771
Compacted row maximum size: 943127
Compacted row mean size: 3208
Column Family: UserFights
SSTable count: 549
Space used (live): 185355019679
Space used (total): 219489031691 (219G)
Memtable Columns Count: 483
Memtable Data Size: 560569
Memtable Switch Count: 8
Read Count: 2159
Read Latency: 2589.150 ms.
Write Count: 4080
Write Latency: 0.018 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 200000
Key cache hit rate: 0.03357770764288416
Row cache: disabled
Compacted row minimum size: 925
Compacted row maximum size: 12108970
Compacted row mean size: 503069
These stats were taken at 3pm, and at 1pm UserFights was using
224G total, so overall size is gradually coming down.
Another observation is the following appearing in the logs during
the minor compactions:
Compacting large row 536c69636b5061756c (121235810 bytes)
incrementally
The largest number of fights any user has performed in our MMO that I
can find is just short of 10,000, and each fight record is smaller
than 1K... so it looks like these rows have grown +10X somehow.
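For what it's worth, a quick back-of-envelope check on that log line
(using xxd and bc; the numbers are the ones above):

  echo 536c69636b5061756c | xxd -r -p; echo        # the row key is just hex-encoded ASCII
  echo 'scale=1; 121235810 / (10000 * 1024)' | bc  # ~11.8x a worst-case 10,000 x 1K row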
The size of UserFights on another replica node, which actually holds
a slightly higher proportion of the ring, is:
Column Family: UserFights
SSTable count: 14
Space used (live): 17844982744
Space used (total): 17936528583 (18G)
Memtable Columns Count: 767
Memtable Data Size: 891153
Memtable Switch Count: 6
Read Count: 2298
Read Latency: 61.020 ms.
Write Count: 4261
Write Latency: 0.104 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 55172
Key cache hit rate: 0.8079570484581498
Row cache: disabled
Compacted row minimum size: 925
Compacted row maximum size: 12108970
Compacted row mean size: 846477
...
All ideas and suggestions greatly appreciated as always!
Dominic
ria101.wordpress.com