Firstly, any ideas for a quick fix? This is giving me big production
problems. Write/read with QUORUM is reportedly producing unpredictable
results (people have called support regarding monsters in my MMO
appearing and disappearing magically) and many operations are just
failing with SocketTimeoutException, I guess because of the continuing
compactions over huge sstables. I'm going to have to try adjusting
client timeout settings etc., but this feels like using a hanky to
protect oneself from a downpour.
Monsters appear magically? That doesn't sound so bad ...
Sorry ... well, this is actually strange. You can get 'inconsistent'
results with QUORUM because of timed-out writes that land on one
replica only and not on the other two (given RF=3). Some reads will
keep returning the old value until one read eventually picks up the
newer 'failed' write. If you're really out of luck, read repair can
then fail too and you will get the old value again. But this is
eventually going to be ok.
Secondly, does anyone know if this is just a waiting game - will the
node eventually correct itself and shrink back down?
I'm down to 204G now from 270G.
One thing you have to make sure of is that the repair actually stopped.
If you killed the repairing node, the other nodes will continue to
retry sending data - I think 7 or 8 times, with increasing pauses.
Check the log files on the other nodes. If you're sure that you aren't
getting any more data, I would recommend a major (forced) compaction,
but this will take a couple of hours depending on your drives.
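Roughly what I mean, with host names, the log path and the keyspace
name as placeholders:

  # Check that the other replicas have stopped streaming repair data
  # to the node you killed (run against each neighbour, and grep
  # their logs too).
  nodetool -h node2 netstats
  grep -i stream /var/log/cassandra/system.log   # path depends on your install

  # Once nothing new is arriving, force a major compaction on the
  # bloated CF. Expect it to run for hours and to need plenty of free
  # disk while it works.
  nodetool -h node1 compact <YourKeyspace> UserFights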
If you're using the dynamic snitch, the one slow node should not kill
you. If it does, you can take it out of the ring. I think there was
some nodetool way but I can't remember. The easiest way is to configure
it to listen only on localhost and restart.
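If you go the localhost route, it's just the standard cassandra.yaml
keys on the slow node (config path and the restart command depend on
your install):

  # On the slow node only: bind gossip and Thrift to loopback, then
  # restart, so clients and the rest of the ring stop talking to it.
  sed -i 's/^listen_address:.*/listen_address: 127.0.0.1/' /etc/cassandra/conf/cassandra.yaml
  sed -i 's/^rpc_address:.*/rpc_address: 127.0.0.1/' /etc/cassandra/conf/cassandra.yaml
  service cassandra restart   # or however you (re)start Cassandra on your boxes

  # Put the original addresses back when you want it serving traffic again.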
Thirdly, does anyone know if the problem is contagious, i.e. should I
consider decommissioning the whole node and trying to rebuild from replicas?
No. That should not be necessary.
Good luck
Thanks, Dominic
On 25 May 2011 17:16, Daniel Doubleday <daniel.double...@gmx.net> wrote:
We are having problems with repair too.
It sounds like yours are the same. From today:
http://permalink.gmane.org/gmane.comp.db.cassandra.user/16619
On May 25, 2011, at 4:52 PM, Dominic Williams wrote:
Hi,
I've got a strange problem where the database on a node has inflated
10X after running repair. This is not the result of receiving missed
data.
I didn't perform repair within my usual 10-day cycle, so I followed
the recommended practice:
http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
The sequence of events was like this:
1) set GCGraceSeconds to some huge value
2) perform rolling upgrade from 0.7.4 to 0.7.6-2
3) run nodetool repair on the first node in the cluster at ~10pm. It
has a ~30G database
4) at 2.30am decide to leave it running all night; wake up at 9am to
find it still running
5) late-morning investigation shows that the db size has increased to
370G. The snapshot folder accounts for only 30G
6) node starts to run out of disk space http://pastebin.com/Sm0B7nfR
7) decide to bail! Reset GCGraceSeconds to 864000 and restart the
node to stop the repair
8) as the node restarts it deletes a bunch of tmp files, reducing db
size from 370G to 270G
9) node now constantly performs minor compactions; du rises slightly,
then falls by a greater amount after each minor compaction deletes
sstables
10) gradually disk usage is coming down. Currently at 254G (3pm)
11) performance of the node is obviously not great!
Investigation of the database reveals the main problem to have
occurred in a single column family, UserFights. This contains
millions of fight records from our MMO - in fact exactly the same
number as the MonsterFights cf. However, the comparative sizes are:
Column Family: MonsterFights
SSTable count: 38
Space used (live): 13867454647
Space used (total): 13867454647 (13G)
Memtable Columns Count: 516
Memtable Data Size: 598770
Memtable Switch Count: 4
Read Count: 514
Read Latency: 157.649 ms.
Write Count: 4059
Write Latency: 0.025 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 183004
Key cache hit rate: 0.0023566218452145135
Row cache: disabled
Compacted row minimum size: 771
Compacted row maximum size: 943127
Compacted row mean size: 3208
Column Family: UserFights
SSTable count: 549
Space used (live): 185355019679
Space used (total): 219489031691 (219G)
Memtable Columns Count: 483
Memtable Data Size: 560569
Memtable Switch Count: 8
Read Count: 2159
Read Latency: 2589.150 ms.
Write Count: 4080
Write Latency: 0.018 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 200000
Key cache hit rate: 0.03357770764288416
Row cache: disabled
Compacted row minimum size: 925
Compacted row maximum size: 12108970
Compacted row mean size: 503069
These stats were taken at 3pm, and at 1pm UserFights was using
224G total, so overall size is gradually coming down.
Another observation is the following appearing in the logs during
the minor compactions:
Compacting large row 536c69636b5061756c (121235810 bytes)
incrementally
The largest number of fights any user has performed in our MMO that I
can find is just short of 10,000, and each fight record is smaller
than 1K... so it looks like these rows have grown +10X somehow.
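For what it's worth, a quick back-of-envelope check on that log line
(using xxd and bc; the numbers are the ones above):

  echo 536c69636b5061756c | xxd -r -p; echo        # the row key is just hex-encoded ASCII
  echo 'scale=1; 121235810 / (10000 * 1024)' | bc  # ~11.8x a worst-case 10,000 x 1K row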
The size of UserFights on another replica node, which actually holds
a slightly higher proportion of the ring, is:
Column Family: UserFights
SSTable count: 14
Space used (live): 17844982744
Space used (total): 17936528583 (18G)
Memtable Columns Count: 767
Memtable Data Size: 891153
Memtable Switch Count: 6
Read Count: 2298
Read Latency: 61.020 ms.
Write Count: 4261
Write Latency: 0.104 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 55172
Key cache hit rate: 0.8079570484581498
Row cache: disabled
Compacted row minimum size: 925
Compacted row maximum size: 12108970
Compacted row mean size: 846477
...
All ideas and suggestions greatly appreciated as always!
Dominic
ria101.wordpress.com