3. How do we rebuild System keyspace?

Wipe this node and start it all over.
hth

jason

On Tue, Jul 7, 2015 at 12:16 AM, Shashi Yachavaram <shashi...@gmail.com> wrote:

> When we reboot the problematic node, we see the following errors in
> system.log.
>
> 1. Does this mean hints column family is corrupted?
> 2. Can we scrub system column family on problematic node and its
>    replication partners?
> 3. How do we rebuild System keyspace?
>
> ==================================================================
> ERROR [CompactionExecutor:950] 2015-06-27 20:11:44,595 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:950,1,main]
> java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
>     at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
>     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
>     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>     at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> ERROR [HintedHandoff:552] 2015-06-27 20:11:44,595 CassandraDaemon.java (line 191) Exception in thread Thread[HintedHandoff:552,1,main]
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
>     at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:436)
>     at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
>     at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
>     at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:502)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
>     at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
>     at java.util.concurrent.FutureTask.get(Unknown Source)
>     at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:432)
>     ... 6 more
> Caused by: java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
>     at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
>     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
>     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>     at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
> ==================================================================
>
>
> On Wed, Jul 1, 2015 at 11:59 AM, Shashi Yachavaram <shashi...@gmail.com> wrote:
>
>> We have a 28 node cluster, out of which only one node is experiencing timeouts.
>> We thought it was the raid, but there are two other nodes on the same raid without
>> any problem. Also The problem goes away if we reboot the node, and then reappears
>> after seven days. The following hinted hand-off timeouts are seen on the node
>> experiencing the timeouts. Also we did not notice any gossip errors.
>>
>> I was wondering if anyone has seen this issue and how they resolved it.
>>
>> Cassandra Version: 1.2.15.1
>> OS: Linux cm 2.6.32-504.8.1.el6.x86_64 #1 SMP Fri Dec 19 12:09:25 EST 2014 x86_64 x86_64 x86_64 GNU/Linux
>> java version "1.6.0_85"
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------------------
>> INFO [HintedHandoff:2] 2015-06-17 22:52:08,130 HintedHandOffManager.java (line 296) Started hinted handoff for host: 4fe86051-6bca-4c28-b09c-1b0f073c1588 with IP: /192.168.1.122
>> INFO [HintedHandoff:1] 2015-06-17 22:52:08,131 HintedHandOffManager.java (line 296) Started hinted handoff for host: bbf0878b-b405-4518-b649-f6cf7c9a6550 with IP: /192.168.1.119
>> INFO [HintedHandoff:2] 2015-06-17 22:52:17,634 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.122; aborting (0 delivered)
>> INFO [HintedHandoff:2] 2015-06-17 22:52:17,635 HintedHandOffManager.java (line 296) Started hinted handoff for host: f7b7ab10-4d42-4f0c-af92-2934a075bee3 with IP: /192.168.1.108
>> INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.119; aborting (0 delivered)
>> INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 296) Started hinted handoff for host: ddb79f35-3e2b-4be8-84d8-7942086e2b73 with IP: /192.168.1.104
>> INFO [HintedHandoff:2] 2015-06-17 22:52:27,143 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.108; aborting (0 delivered)
>> INFO [HintedHandoff:2] 2015-06-17 22:52:27,144 HintedHandOffManager.java (line 296) Started hinted handoff for host: 6a2fa431-4a51-44cb-af19-1991c960e075 with IP: /192.168.1.117
>> INFO [HintedHandoff:1] 2015-06-17 22:52:27,153 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.104; aborting (0 delivered)
>> INFO [HintedHandoff:1] 2015-06-17 22:52:27,154 HintedHandOffManager.java (line 296) Started hinted handoff for host: cf03174a-533c-44d6-a679-e70090ad2bc5 with IP: /192.168.1.107
>> ------------------------------------------------------------------------------------------------------------------------------------
>>
>> Thanks
>> -shashi..
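
For anyone triaging a similar system.log, it may help to tally the "Timed out replaying hints" lines per endpoint, to see whether the timeouts hit every peer or only a few. What follows is a minimal Python sketch and not something from the original thread: the function name `count_hint_timeouts` and the regular expression are assumptions for illustration, based only on the line format in the quoted log.

```python
import re
from collections import Counter

# Assumed pattern, derived from the quoted log lines, e.g.
# "(line 422) Timed out replaying hints to /192.168.1.122; aborting (0 delivered)"
TIMEOUT_RE = re.compile(
    r"Timed out replaying hints to /(\S+?); aborting \(\d+\s+delivered\)"
)


def count_hint_timeouts(log_text):
    """Return a Counter mapping endpoint IP to number of hint replay timeouts."""
    # Collapse whitespace and mail quote markers (">") into single spaces so
    # that log lines wrapped by a mail client still match the pattern.
    normalized = re.sub(r"[\s>]+", " ", log_text)
    return Counter(m.group(1) for m in TIMEOUT_RE.finditer(normalized))


if __name__ == "__main__":
    import sys

    # Usage: python count_hint_timeouts.py /path/to/system.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ip, n in count_hint_timeouts(fh.read()).most_common():
            print(ip, n)
```

If the counts are spread evenly over all peers, that points at the local node rather than at any single remote endpoint, which matches the symptom described in the quoted message.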