Links to the issues causing this:
http://issues.apache.org/jira/browse/CASSANDRA-2670
http://issues.apache.org/jira/browse/CASSANDRA-2280
For anyone in this boat, my advice is:

1. Do a rolling restart immediately, starting with the node you were running
repair on (a rough sketch follows this list). If you don't do this, the other
nodes will keep streaming data to the node being repaired until it goes pop
(you run out of disk space!). Restarting the problem node also triggers it to
delete a bunch of tmp files, which helps too.

2. Get ready for a long wait, because the minor compactions now have to clean
out the data that has been dumped on your node. The compactions do seem to
work through the data progressively - we are down to 204G from 270G 5-6 hours
ago!

3. Make preparations for disruptions to your system. In our case at least we
are getting a lot of timeouts on operations, which is affecting users.
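Here is roughly what I mean by a rolling restart - just a sketch, assuming
password-less ssh to each node, nodetool on the PATH, and Cassandra installed
as an init.d service named "cassandra". The host list is made up; substitute
your own and put the node you were repairing first.

    # Rough sketch of a rolling restart. Assumptions: password-less ssh,
    # nodetool on the PATH, Cassandra running as init.d service "cassandra".
    import subprocess
    import time

    # Hypothetical addresses - list the node you were repairing first.
    NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

    def run(cmd):
        print("> " + " ".join(cmd))
        subprocess.check_call(cmd)

    for host in NODES:
        # Flush memtables and stop the node taking traffic before restarting.
        run(["nodetool", "-h", host, "drain"])
        run(["ssh", host, "sudo", "/etc/init.d/cassandra", "restart"])
        # Give the node time to come back up and rejoin the ring before
        # moving on. Checking `nodetool -h <host> ring` by hand is safer
        # than a fixed sleep.
        time.sleep(120)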
Hope this helps

On 25 May 2011 19:07, Dominic Williams <thedwilli...@gmail.com> wrote:
> Jeepers creepers, that's it Jeeves!!! Arrrrrghhhhh.
>
> Basically, once my repair hit a big column family the db size exploded
> until the node ran out of disk space.
>
> Firstly, any ideas for a quick fix? This is giving me big production
> problems. Write/read with QUORUM is reportedly producing unpredictable
> results (people have called support regarding monsters in my MMO
> appearing and disappearing magically) and many operations are just
> failing with SocketTimeoutException, I guess because of the continuing
> compactions over huge sstables. I'm going to have to try adjusting
> client timeout settings etc., but this feels like using a hanky to
> protect oneself from a downpour.
>
> Secondly, does anyone know if this is just a waiting game - will the
> node eventually correct itself and shrink back down? I'm down to 204G
> now from 270G.
>
> Thirdly, does anyone know if the problem is contagious, i.e. should I
> consider decommissioning the whole node and trying to rebuild from
> replicas?
>
> Thanks, Dominic
>
> On 25 May 2011 17:16, Daniel Doubleday <daniel.double...@gmx.net> wrote:
>
>> We are having problems with repair too.
>>
>> It sounds like yours are the same. From today:
>> http://permalink.gmane.org/gmane.comp.db.cassandra.user/16619
>>
>> On May 25, 2011, at 4:52 PM, Dominic Williams wrote:
>>
>> Hi,
>>
>> I've got a strange problem where the database on a node has inflated
>> 10X after running repair. This is not the result of receiving missed
>> data.
>>
>> I didn't perform repair within my usual 10-day cycle, so I followed the
>> recommended practice:
>>
>> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
>>
>> The sequence of events was like this:
>>
>> 1) set GCGraceSeconds to some huge value
>> 2) perform rolling upgrade from 0.7.4 to 0.7.6-2
>> 3) run nodetool repair on the first node in the cluster ~10pm. It has a
>> ~30G database
>> 4) 2.30am decide to leave it running all night; wake up 9am to find it
>> still running
>> 5) late-morning investigation shows that the db size has increased to
>> 370G. The snapshot folder accounts for only 30G
>> 6) node starts to run out of disk space http://pastebin.com/Sm0B7nfR
>> 7) decide to bail! Reset GCGraceSeconds to 864000 and restart the node
>> to stop the repair
>> 8) as the node restarts it deletes a bunch of tmp files, reducing db
>> size from 370G to 270G
>> 9) node now constantly performing minor compactions; du rises slightly,
>> then falls by a greater amount after each minor compaction deletes
>> sstables
>> 10) gradually disk usage is coming down. Currently at 254G (3pm)
>> 11) performance of the node is obviously not great!
>>
>> Investigation of the database reveals the main problem to have occurred
>> in a single column family, UserFights. This contains millions of fight
>> records from our MMO - in fact exactly the same number as the
>> MonsterFights cf. However, the comparative sizes are:
>>
>> Column Family: MonsterFights
>> SSTable count: 38
>> Space used (live): 13867454647
>> Space used (total): 13867454647 (13G)
>> Memtable Columns Count: 516
>> Memtable Data Size: 598770
>> Memtable Switch Count: 4
>> Read Count: 514
>> Read Latency: 157.649 ms.
>> Write Count: 4059
>> Write Latency: 0.025 ms.
>> Pending Tasks: 0
>> Key cache capacity: 200000
>> Key cache size: 183004
>> Key cache hit rate: 0.0023566218452145135
>> Row cache: disabled
>> Compacted row minimum size: 771
>> Compacted row maximum size: 943127
>> Compacted row mean size: 3208
>>
>> Column Family: UserFights
>> SSTable count: 549
>> Space used (live): 185355019679
>> Space used (total): 219489031691 (219G)
>> Memtable Columns Count: 483
>> Memtable Data Size: 560569
>> Memtable Switch Count: 8
>> Read Count: 2159
>> Read Latency: 2589.150 ms.
>> Write Count: 4080
>> Write Latency: 0.018 ms.
>> Pending Tasks: 0
>> Key cache capacity: 200000
>> Key cache size: 200000
>> Key cache hit rate: 0.03357770764288416
>> Row cache: disabled
>> Compacted row minimum size: 925
>> Compacted row maximum size: 12108970
>> Compacted row mean size: 503069
>>
>> These stats were taken at 3pm, and at 1pm UserFights was using 224G
>> total, so the overall size is gradually coming down.
>>
>> Another observation is the following appearing in the logs during the
>> minor compactions:
>> Compacting large row 536c69636b5061756c (121235810 bytes) incrementally
>>
>> The largest number of fights any user has performed on our MMO that I
>> can find is just short of 10,000, and each fight record is smaller than
>> 1K... so it looks like these rows have somehow grown 10X+.
>>
>> The size of UserFights on another replica node, which actually holds a
>> slightly higher proportion of the ring, is:
>>
>> Column Family: UserFights
>> SSTable count: 14
>> Space used (live): 17844982744
>> Space used (total): 17936528583 (18G)
>> Memtable Columns Count: 767
>> Memtable Data Size: 891153
>> Memtable Switch Count: 6
>> Read Count: 2298
>> Read Latency: 61.020 ms.
>> Write Count: 4261
>> Write Latency: 0.104 ms.
>> Pending Tasks: 0
>> Key cache capacity: 200000
>> Key cache size: 55172
>> Key cache hit rate: 0.8079570484581498
>> Row cache: disabled
>> Compacted row minimum size: 925
>> Compacted row maximum size: 12108970
>> Compacted row mean size: 846477
>> ...
>>
>> All ideas and suggestions greatly appreciated as always!
>>
>> Dominic
>> ria101.wordpress.com
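PS: to keep an eye on the bloated column family shrinking while the minor
compactions run, something like the following can poll nodetool cfstats for
the "Space used (total)" figure. Just a rough sketch - it assumes nodetool is
on the PATH and the 0.7-era cfstats layout quoted above; the host address and
column family name are only examples.

    # Rough sketch: scrape `nodetool -h <host> cfstats` for one column
    # family's "Space used (total)" value, in bytes.
    import re
    import subprocess

    def space_used_total(host, cf_name):
        out = subprocess.check_output(
            ["nodetool", "-h", host, "cfstats"]).decode()
        current_cf = None
        for line in out.splitlines():
            line = line.strip()
            if line.startswith("Column Family:"):
                current_cf = line.split(":", 1)[1].strip()
            elif current_cf == cf_name and line.startswith("Space used (total):"):
                return int(re.search(r"\d+", line).group(0))
        return None

    used = space_used_total("127.0.0.1", "UserFights")  # example host/CF
    if used is not None:
        print("UserFights space used (total): %.1fG" % (used / 1e9))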