Hi Sylvain,
Yes, this UserCompletions CF uses a composite comparator, and I do use
sstable compression.
What's the procedure to check whether a compressed sstable is corrupted?
If it is corrupted, what can I do to fix the issue with minimal impact
on cluster load?
Is there a way to delete all UserCompletions sstables on the problematic
node and then run repair on this CF only? For example: disable thrift,
drain the memtables so the node does not replay the commit log on
startup, then delete the sstables and start the node again. Will that work?
BUT: I saw this error on 3 nodes (and RF=3 too) in the
ValidationExecutor at almost the same time, on 3 different occasions.
This was probably due to my 3 attempts at rerunning "repair -pr
UserCompletions dsc2b.internal", which never returned from the blocked
nodetool command; each time a repair finished, the new sstables
triggered compactions on all involved nodes.
Can this mean that the sstable is not corrupted, but that some BAD
column name was inserted OK and yet cannot be read later by the
ValidationExecutor on any of the replica nodes?
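To illustrate the "bad column name" hypothesis: a composite column name is stored as a sequence of length-prefixed components, and a length field that runs past the end of the buffer makes ByteBuffer.limit() throw exactly the IllegalArgumentException seen in the traces below. This is only a simplified stand-in for the decoding a composite comparator does, not Cassandra's actual code:

```java
import java.nio.ByteBuffer;

public class CompositeDemo {
    // Read one length-prefixed component the way a composite comparator
    // would: a 2-byte length, then that many bytes, then one
    // end-of-component byte. A corrupt length that exceeds the remaining
    // data makes limit() throw IllegalArgumentException.
    static ByteBuffer readComponent(ByteBuffer name) {
        int length = name.getShort() & 0xFFFF;
        ByteBuffer component = name.duplicate();
        component.limit(component.position() + length); // throws if length runs past capacity
        name.position(name.position() + length + 1);    // skip component + end-of-component byte
        return component;
    }

    public static void main(String[] args) {
        // A well-formed single-component name: length 3, "foo", end-of-component 0.
        ByteBuffer ok = ByteBuffer.wrap(new byte[] {0, 3, 'f', 'o', 'o', 0});
        System.out.println(readComponent(ok).remaining()); // 3

        // Same bytes with the length field corrupted to 300 (0x012C):
        // limit() would exceed the buffer's capacity of 6.
        ByteBuffer bad = ByteBuffer.wrap(new byte[] {0x01, 0x2C, 'f', 'o', 'o', 0});
        try {
            readComponent(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("IllegalArgumentException, as in the ValidationExecutor traces");
        }
    }
}
```

So a column that was accepted at write time can still blow up at read/validation time if its bytes were damaged afterwards, since the length prefix is only interpreted when the name is deserialized.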
See the relevant Cassandra logs below:
dsc2b.internal/10.234.71.33
-----------------------
INFO [AntiEntropySessions:66] 2012-06-13 18:49:24,464
AntiEntropyService.java (line 658) [repair
#7ec142c0-b588-11e1-0000-f423231d3fff] new session: will sync
dsc2b.internal/10.234.71.33, /10.49.127.4, /10.58.249.118 on range
(85070591730234615865843651857942052864,113427455640312821154458202477256070485]
for PRODUCTION.[UserCompletions]
INFO [AntiEntropySessions:66] 2012-06-13 18:49:24,465
AntiEntropyService.java (line 837) [repair
#7ec142c0-b588-11e1-0000-f423231d3fff] requests for merkle tree sent for
UserCompletions (to [/10.49.127.4, /10.58.249.118,
dsc2b.internal/10.234.71.33])
INFO [ValidationExecutor:129] 2012-06-13 18:49:24,466
ColumnFamilyStore.java (line 705) Enqueuing flush of
Memtable-UserCompletions@843906517(9952311/21343163 serialized/live
bytes, 41801 ops)
INFO [FlushWriter:2563] 2012-06-13 18:49:24,467 Memtable.java (line
246) Writing Memtable-UserCompletions@843906517(9952311/21343163
serialized/live bytes, 41801 ops)
INFO [FlushWriter:2563] 2012-06-13 18:49:24,828 Memtable.java (line
283) Completed flushing
/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-515-Data.db
(1671566 bytes)
ERROR [ValidationExecutor:129] 2012-06-13 18:55:32,236
AbstractCassandraDaemon.java (line 139) Fatal exception in thread
Thread[ValidationExecutor:129,1,main]
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:249)
....
-----------------------
dsc1a.internal/10.49.127.4
-----------------------
INFO [ValidationExecutor:125] 2012-06-13 18:49:24,457
ColumnFamilyStore.java (line 705) Enqueuing flush of
Memtable-UserCompletions@266077104(9047552/76151840 serialized/live
bytes, 38000 ops)
INFO [FlushWriter:2670] 2012-06-13 18:49:24,466 Memtable.java (line
246) Writing Memtable-UserCompletions@266077104(9047552/76151840
serialized/live bytes, 38000 ops)
INFO [FlushWriter:2670] 2012-06-13 18:49:24,969 Memtable.java (line
283) Completed flushing
/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-1030-Data.db
(1508368 bytes)
INFO [CompactionExecutor:3299] 2012-06-13 18:49:24,971
CompactionTask.java (line 115) Compacting
[SSTableReader(path='/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-1027-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-1030-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-1028-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-1029-Data.db')]
INFO [CompactionExecutor:3299] 2012-06-13 18:50:03,554
CompactionTask.java (line 223) Compacted to
[/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-1031-Data.db,].
23,417,251 to 23,832,802 (~101% of original) bytes for 116,956 keys at
0.589102MB/s. Time: 38,582ms.
ERROR [ValidationExecutor:125] 2012-06-13 18:56:58,961
AbstractCassandraDaemon.java (line 139) Fatal exception in thread
Thread[ValidationExecutor:125,1,main]
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:249)
...
-------------------------
dsc2c.internal/10.58.249.118
-------------------------
INFO [ValidationExecutor:119] 2012-06-13 18:49:24,305
ColumnFamilyStore.java (line 705) Enqueuing flush of
Memtable-UserCompletions@1279460811(19014066/66201229 serialized/live
bytes, 79838 ops)
INFO [FlushWriter:2001] 2012-06-13 18:49:24,326 Memtable.java (line
246) Writing Memtable-UserCompletions@1279460811(19014066/66201229
serialized/live bytes, 79838 ops)
INFO [FlushWriter:2001] 2012-06-13 18:49:24,848 Memtable.java (line
283) Completed flushing
/var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-548-Data.db
(3177074 bytes)
ERROR [ValidationExecutor:119] 2012-06-13 18:55:50,387
AbstractCassandraDaemon.java (line 139) Fatal exception in thread
Thread[ValidationExecutor:119,1,main]
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:249)
...
-------------------------
Thanks for your help.
On 06/14/2012 11:09 AM, Sylvain Lebresne wrote:
On Thu, Jun 14, 2012 at 8:26 AM, Piavlo <lolitus...@gmail.com> wrote:
I started looking for similar messages on other nodes and saw a SINGLE
IllegalArgumentException in the ValidationExecutor on the same node and
on 2 other nodes (this is a 6-node cluster). It happened at almost the
same time on all nodes, while flushing the same UserCompletions CF
memtable, about 12 hours before the IllegalArgumentException in the
CompactionExecutor.
This actually does not happen during a flush but during a validation
compaction, which happens during a repair.
The exception is basically saying there is an invalid composite column
name (you do use a composite comparator, right?).
I guess that could result from some on-disk corruption. Are you using
sstable compression on UserCompletions? (I am asking because
compressed sstables have checksums.)
And an even bigger problem now is that running repairs on other CFs
against different nodes does not have any effect. For example, running
/usr/bin/nodetool -h dsc2b.internal -pr repair PRODUCTION UserDirectVendors
does not trigger any repair activity, and there is nothing in the logs to
indicate the start of a repair. And I have ~24 hours left to repair some
CFs before the gc grace period ends :(
Does that happen on every node?
What can happen is that a failed repair blocks others from starting.
One thing you can try is to run the method called
forceTerminateAllRepairSessions in JMX under
org.apache.cassandra.db->StorageService->Operations (I'm afraid there
is no nodetool hook, so you will have to use jconsole). After that, try
starting a repair again. If that doesn't work, it's worth trying to
restart the node.
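If jconsole is inconvenient, the same JMX operation can be invoked programmatically. This is only a sketch: the host name is taken from this thread, and the default Cassandra JMX port 7199 is an assumption (check your cassandra-env settings):

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class TerminateRepairs {
    // JMX name of Cassandra's StorageService MBean.
    static final String STORAGE_SERVICE = "org.apache.cassandra.db:type=StorageService";

    // Connect to the node's JMX port and invoke the no-argument
    // forceTerminateAllRepairSessions operation.
    static void terminateAllRepairSessions(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            mbs.invoke(new ObjectName(STORAGE_SERVICE),
                       "forceTerminateAllRepairSessions",
                       new Object[0], new String[0]);
        } finally {
            connector.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java TerminateRepairs <host>");
            return;
        }
        terminateAllRepairSessions(args[0], 7199); // assumed default JMX port
    }
}
```

Run it against each node that might have a stuck repair session, then retry the repair.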
--
Sylvain