Re: Repair failure under 0.8.6

Maxim Potekhin Mon, 05 Dec 2011 11:44:45 -0800

Basically I tweaked the phi, put in more verbose GC reporting anddecided to do a compaction before I proceed. I'm getting this on thenode where compaction is being run. And the system log for the other twonodes follows. It's obvious that the cluster is sick, but Ican't determine why -- there are no overwhelming GC evidence as far as Ican see. I didn't start compaction on node #3, somehow

it attempts to do it anyhow.


===============================================
Node #2 (compaction is being run):

INFO [CompactionExecutor:2] 2011-12-05 14:19:36,741CompactionManager.java (line 608) Compacted to/data/cassandra_data/data/system/LocationInfo-tmp-g-72-Data.db. 967 to561 (~58% of

original) bytes for 4 keys.  Time: 71ms.

INFO [main] 2011-12-05 14:19:36,941 Mx4jTool.java (line 67) mx4jsuccessfuly loadedINFO [GossipStage:1] 2011-12-05 14:19:36,943 Gossiper.java (line 715)Node /130.199.185.193 has restarted, now UP againINFO [GossipStage:1] 2011-12-05 14:19:36,943 Gossiper.java (line 683)InetAddress /130.199.185.193 is now UPINFO [GossipStage:1] 2011-12-05 14:19:36,971 StorageService.java (line819) Node /130.199.185.193 state jump to normalINFO [GossipStage:1] 2011-12-05 14:19:36,971 Gossiper.java (line 715)Node /130.199.185.195 has restarted, now UP againINFO [GossipStage:1] 2011-12-05 14:19:36,971 Gossiper.java (line 683)InetAddress /130.199.185.195 is now UPINFO [GossipStage:1] 2011-12-05 14:19:36,974 StorageService.java (line819) Node /130.199.185.195 state jump to normalINFO [main] 2011-12-05 14:19:37,003 CassandraDaemon.java (line 115)Binding thrift service to cassandra02.usatlas.bnl.gov/130.199.185.194:9160INFO [main] 2011-12-05 14:19:37,016 CassandraDaemon.java (line 124)Using TFastFramedTransport with a max frame size of 15728640 bytes.INFO [main] 2011-12-05 14:19:37,018 CassandraDaemon.java (line 151)Using synchronous/threadpool thrift server oncassandra02.usatlas.bnl.gov/130.199.185.194 : 9160INFO [Thread-6] 2011-12-05 14:19:37,019 CassandraDaemon.java (line203) Listening for thrift clients...INFO [GossipTasks:1] 2011-12-05 14:19:50,601 Gossiper.java (line 697)InetAddress /130.199.185.195 is now dead.ERROR [HintedHandoff:1] 2011-12-05 14:20:37,954AbstractCassandraDaemon.java (line 139) Fatal exception in threadThread[HintedHandoff:1,1,main]java.lang.RuntimeException: java.lang.RuntimeException: Could not reachschema agreement with /130.199.185.193 in 60000msatorg.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)atjava.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)atjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        at java.lang.Thread.run(Thread.java:662)

Caused by: java.lang.RuntimeException: Could not reach schema agreementwith /130.199.185.193 in 60000msatorg.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:293)atorg.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:304)atorg.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:89)atorg.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:397)atorg.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)

        ... 3 more

ERROR [HintedHandoff:1] 2011-12-05 14:20:37,956AbstractCassandraDaemon.java (line 139) Fatal exception in threadThread[HintedHandoff:1,1,main]java.lang.RuntimeException: java.lang.RuntimeException: Could not reachschema agreement with /130.199.185.193 in 60000msatorg.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)atjava.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)atjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        at java.lang.Thread.run(Thread.java:662)


=====================================================================
Node #1 (nothing run)

INFO [main] 2011-12-05 14:16:15,779 CassandraDaemon.java (line 115)Binding thrift service to cassandra01.usatlas.bnl.gov/130.199.185.193:9160INFO [main] 2011-12-05 14:16:15,782 CassandraDaemon.java (line 124)Using TFastFramedTransport with a max frame size of 15728640 bytes.INFO [main] 2011-12-05 14:16:15,784 CassandraDaemon.java (line 151)Using synchronous/threadpool thrift server oncassandra01.usatlas.bnl.gov/130.199.185.193 : 9160INFO [Thread-5] 2011-12-05 14:16:15,785 CassandraDaemon.java (line203) Listening for thrift clients...INFO [HintedHandoff:1] 2011-12-05 14:16:50,578HintedHandOffManager.java (line 323) Started hinted handoff for endpoint/130.199.185.195INFO [ScheduledTasks:1] 2011-12-05 14:16:53,220 GCInspector.java (line122) GC for ParNew: 284 ms for 1 collections, 1434456744 used; max is8178892800INFO [HintedHandoff:1] 2011-12-05 14:16:56,166HintedHandOffManager.java (line 379) Finished hinted handoff of 0 rowsto endpoint /130.199.185.195INFO [GossipTasks:1] 2011-12-05 14:17:19,310 Gossiper.java (line 697)InetAddress /130.199.185.194 is now dead.INFO [GossipTasks:1] 2011-12-05 14:17:50,326 Gossiper.java (line 697)InetAddress /130.199.185.195 is now dead.ERROR [HintedHandoff:1] 2011-12-05 14:17:57,173AbstractCassandraDaemon.java (line 139) Fatal exception in threadThread[HintedHandoff:1,1,main]java.lang.RuntimeException: java.lang.RuntimeException: Could not reachschema agreement with /130.199.185.194 in 60000msatorg.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)atjava.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)atjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        at java.lang.Thread.run(Thread.java:662)

Caused by: java.lang.RuntimeException: Could not reach schema agreementwith /130.199.185.194 in 60000msatorg.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:293)atorg.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:304)atorg.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:89)atorg.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:397)atorg.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)

        ... 3 more

ERROR [HintedHandoff:1] 2011-12-05 14:17:57,175AbstractCassandraDaemon.java (line 139) Fatal exception in threadThread[HintedHandoff:1,1,main]java.lang.RuntimeException: java.lang.RuntimeException: Could not reachschema agreement with /130.199.185.194 in 60000msatorg.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)atjava.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)atjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        at java.lang.Thread.run(Thread.java:662)

        ... 3 more

INFO [GossipStage:1] 2011-12-05 14:19:36,938 Gossiper.java (line 715)Node /130.199.185.194 has restarted, now UP againINFO [GossipStage:1] 2011-12-05 14:19:36,938 Gossiper.java (line 683)InetAddress /130.199.185.194 is now UPINFO [GossipStage:1] 2011-12-05 14:19:36,939 StorageService.java (line819) Node /130.199.185.194 state jump to normalERROR [HintedHandoff:2] 2011-12-05 14:20:37,944AbstractCassandraDaemon.java (line 139) Fatal exception in threadThread[HintedHandoff:2,1,main]java.lang.RuntimeException: java.lang.RuntimeException: Could not reachschema agreement with /130.199.185.194 in 60000ms


========================================================================================

Node #3 (nothing run):

INFO [GossipStage:1] 2011-12-05 14:23:17,762 StorageService.java (line819) Node /130.199.185.193 state jump to normalINFO [GossipStage:1] 2011-12-05 14:23:17,762 Gossiper.java (line 715)Node /130.199.185.194 has restarted, now UP againINFO [GossipStage:1] 2011-12-05 14:23:17,763 Gossiper.java (line 683)InetAddress /130.199.185.194 is now UPINFO [GossipStage:1] 2011-12-05 14:23:18,045 StorageService.java (line819) Node /130.199.185.194 state jump to normalINFO [CompactionExecutor:2] 2011-12-05 14:23:18,055CompactionManager.java (line 608) Compacted to/data/cassandra_data/data/system/LocationInfo-tmp-g-32-Data.db. 967 to561 (~58% of origina

l) bytes for 4 keys.  Time: 323ms.

INFO [main] 2011-12-05 14:23:18,074 CassandraDaemon.java (line 115)Binding thrift service to cassandra03.usatlas.bnl.gov/130.199.185.195:9160INFO [main] 2011-12-05 14:23:18,077 CassandraDaemon.java (line 124)Using TFastFramedTransport with a max frame size of 15728640 bytes.INFO [main] 2011-12-05 14:23:18,080 CassandraDaemon.java (line 151)Using synchronous/threadpool thrift server oncassandra03.usatlas.bnl.gov/130.199.185.195 : 9160INFO [Thread-6] 2011-12-05 14:23:18,080 CassandraDaemon.java (line203) Listening for thrift clients...INFO [HintedHandoff:1] 2011-12-05 14:23:24,535HintedHandOffManager.java (line 323) Started hinted handoff for endpoint/130.199.185.193INFO [HintedHandoff:1] 2011-12-05 14:23:24,537HintedHandOffManager.java (line 379) Finished hinted handoff of 0 rowsto endpoint /130.199.185.193ERROR [HintedHandoff:1] 2011-12-05 14:24:25,553AbstractCassandraDaemon.java (line 139) Fatal exception in threadThread[HintedHandoff:1,1,main]java.lang.RuntimeException: java.lang.RuntimeException: Could not reachschema agreement with /130.199.185.194 in 60000msatorg.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)atjava.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)atjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        at java.lang.Thread.run(Thread.java:662)

        ... 3 more

ERROR [HintedHandoff:1] 2011-12-05 14:24:25,555AbstractCassandraDaemon.java (line 139) Fatal exception in threadThread[HintedHandoff:1,1,main]java.lang.RuntimeException: java.lang.RuntimeException: Could not reachschema agreement with /130.199.185.194 in 60000msatorg.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)atjava.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)atjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        at java.lang.Thread.run(Thread.java:662)

        ... 3 more

INFO [CompactionExecutor:5] 2011-12-05 14:27:11,220CompactionManager.java (line 494) Nothing to compact in IndexInfo; useforceUserDefinedCompaction if you wish to force compaction of single

 sstables (e.g. for tombstone collection)

INFO [CompactionExecutor:6] 2011-12-05 14:27:11,221CompactionManager.java (line 494) Nothing to compact in LocationInfo;use forceUserDefinedCompaction if you wish to force compaction of sin

gle sstables (e.g. for tombstone collection)

INFO [CompactionExecutor:7] 2011-12-05 14:27:11,222CompactionManager.java (line 542) Compacting Major:[SSTableReader(path='/data/cassandra_data/data/system/Migrations-g-72-Data.db'),SSTabl

eReader(path='/data/cassandra_data/data/system/Migrations-g-71-Data.db')]

INFO [CompactionExecutor:7] 2011-12-05 14:27:11,258CompactionManager.java (line 608) Compacted to/data/cassandra_data/data/system/Migrations-tmp-g-73-Data.db. 398,726to 398,662 (~99% of o

riginal) bytes for 1 keys.  Time: 36ms.

INFO [CompactionExecutor:10] 2011-12-05 14:27:11,260CompactionManager.java (line 542) Compacting Major:[SSTableReader(path='/data/cassandra_data/data/system/Schema-g-72-Data.db'),SSTableRe

ader(path='/data/cassandra_data/data/system/Schema-g-71-Data.db')]

INFO [CompactionExecutor:10] 2011-12-05 14:27:11,291CompactionManager.java (line 608) Compacted to/data/cassandra_data/data/system/Schema-tmp-g-73-Data.db. 175,627 to175,518 (~99% of orig

inal) bytes for 54 keys.  Time: 30ms.

INFO [CompactionExecutor:12] 2011-12-05 14:27:11,293CompactionManager.java (line 494) Nothing to compact in indexRegistry;use forceUserDefinedCompaction if you wish to force compaction of s

ingle sstables (e.g. for tombstone collection)

INFO [CompactionExecutor:13] 2011-12-05 14:27:11,295StorageService.java (line 2228) requesting GC to free disk spaceWARN [CompactionExecutor:13] 2011-12-05 14:27:34,920CompactionManager.java (line 509) insufficient space to compact allrequested files SSTableReader(path='/data/cassandra_data/data/PANDA/jobs-g-54-Data.db'),SSTableReader(path='/data/cassandra_data/data/PANDA/jobs-g-55-Data.db')ERROR [CompactionExecutor:13] 2011-12-05 14:27:34,920CompactionManager.java (line 513) insufficient space to compact even thetwo smallest files, aborting



On 12/4/2011 7:17 PM, Peter Schuller wrote:

As a side effect of the failed repair (so it seems) the disk usage on the
affected node prevents compaction from working. It still works on
the remaining nodes (we have 3 total).
Is there a way to scrub the extraneous data?

This is one of the reasons why killing an in-process repair is a bad thing :(

If you do not have enough disk space for any kind of compaction to
work, then no, unfortunately there is no easy way to get rid of the
data.

You can go to extra trouble such as moving the entire node to some
other machine (e.g. firewalled from the cluster) with more disk and
run compaction there and then "move it back", but that is kind of
painful to do. Another option is to decommission the node and replace
it. However, be aware that (1) that leaves the ring with less capacity
for a while, and (2) when you decommission, the data you stream from
that node to others would be artificially inflated due to the repair
so there is some risk of "infecting" the other nodes with a large data
set.

I should mention that if you have no traffic running against the
cluster, one way is to just remove all the data and then run repair
afterwards. But that implies that you're trusting that (1) no reads
are going to the cluster (else you might serve reads based on missing
data) and (2) that you are comfortable with loss of the data on the
node. (2) might be okay if you're e.g. writing at QUORUM at all times
and have RF>= 3 (basically, this is as if the node would have been
lost due to hardware breakage).

A faster way to reconstruct the node would be to delete the data from
your keyspaces (except the system keyspace), start the node (now
missing data), and run 'nodetool rebuild' after
https://issues.apache.org/jira/browse/CASSANDRA-3483 is done. The
patch attached to that ticket should work for 0.8.6 I suspect (but no
guarantees). This also assumes you have no reads running against the
cluster.

Re: Repair failure under 0.8.6

Reply via email to