First, check your node for I/O errors; you have some bad data there. When you restart Cassandra, it may identify which SSTables are corrupt. You can then stop the node and remove them.
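The log check described above can be sketched as a small script. This is only an illustration: the function name, the regular expression, and the list of marker strings are assumptions based on the stack traces quoted further down in this thread, not anything shipped with Cassandra. It reports which ERROR entries are followed by a corruption-style exception; it does not identify the SSTable files themselves.

```python
import re

# Assumed log header shape, e.g.
# "ERROR [CompactionExecutor:281] 2013-02-06 23:56:16,183 ..."
HEADER = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}"
)

# Assumed markers, most specific first (taken from the traces in this thread).
MARKERS = ("CorruptColumnException", "java.io.EOFException", "java.io.IOError")


def find_corruption_errors(lines):
    """Return (timestamp, thread, marker) for each ERROR entry whose
    header or following lines mention one of the markers."""
    hits = []
    current = None  # (timestamp, thread) of an ERROR entry not yet matched
    for line in lines:
        header = HEADER.match(line)
        if header:
            # A new log entry starts; only track it if it is an ERROR.
            if header.group("level") == "ERROR":
                current = (header.group("ts"), header.group("thread"))
            else:
                current = None
        if current is None:
            continue
        for marker in MARKERS:
            if marker in line:
                hits.append((current[0], current[1], marker))
                current = None  # report each ERROR entry once
                break
    return hits
```

To use it, pass the lines of the node's log file, for example `find_corruption_errors(open(path_to_system_log))`, where the path depends on your installation.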
You will then need to run repair to replace the missing data.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/02/2013, at 1:21 PM, Terry Cumaranatunge <cumar...@gmail.com> wrote:

> I may have found a trigger that is causing these problems. Anyone seen these
> compaction problems in 1.1? I did run scrub on all my 1.0 data to convert it
> to 1.1 and fix level-manifest problems before I started running 1.1.
>
> 1 node:
> ERROR [CompactionExecutor:281] 2013-02-06 23:56:16,183 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[CompactionExecutor:281,1,main]
> java.io.IOError: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
>     at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
>     at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
>     at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
>     at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
>     at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
>     at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
>     at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>     at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>     at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
>     at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
>     at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:98)
>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
>     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234)
>     at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112)
>     ... 21 more
>
> 2nd node:
> ERROR [CompactionExecutor:266] 2013-02-06 23:51:35,181 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[CompactionExecutor:266,1,main]
> java.io.IOError: java.io.EOFException
>     at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
>     at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
>     at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
>     at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
>     at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
>     at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
>     at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>     at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>     at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
>     at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
>     at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.EOFException
>     at java.io.RandomAccessFile.readFully(Unknown Source)
>     at java.io.RandomAccessFile.readFully(Unknown Source)
>     at org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95)
>     at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
>     at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363)
>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:120)
>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
>
> On Wed, Feb 6, 2013 at 11:32 AM, Terry Cumaranatunge <cumar...@gmail.com> wrote:
> I've gotten timeouts on clients when using Cassandra 1.1.8 in a cluster of 12
> nodes, but I don't see the same behavior when using Cassandra 1.0.10. So, to
> do a controlled experiment, the following was tried:
>
> 1. Started with Cassandra 1.0.10. Built a database and ran our test tools
>    against it to build a database
> 2. Ran workload to ensure no timeout problems were seen. Stopped the load
> 3. Upgraded only 2 of the nodes in the cluster to 1.1.8. In the cluster of 12
>    nodes. Ran scrub afterwards as document states to convert sstables to 1.1
>    format and to fix level-manifest problems.
> 4. Started load back up
> 5. After some time, started seeing timeouts on the client for requests that
>    go to the 1.1.8 nodes (i.e. requests sent to those nodes as the coordinator
>    node)
>
> There appears to be a pattern in these timeouts in that a large burst of them
> occur every 10 minutes (on the 10 minute boundary of the hour, like 10:10:XX,
> 10:20:YY, 10:30:ZZ etc.). All clients see the timeouts from those two 1.1.8
> nodes at the same exact time.
> The workload is not I/O bound at this point, and requests are not being
> dropped either, based on tpstats output. I don't see hinted handoff messages
> either, as I believe that happens every 10 minutes. Key cache size is set to
> 2.7GB and memtable size is 1/3 of heap (2.7GB). The key cache memory usage is
> the same as in 1.0.10 based on the heap size calculator. There are no GC
> pauses or any type of heap pressure messages in the logs. This is with Java
> 1.6.0.38.
>
> Does anyone know of a periodic task in Cassandra 1.1 that runs every 10
> minutes and could explain this problem, or have any other ideas?
>
> Thanks
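One way to confirm the 10-minute pattern described in the quoted message is to bucket client timeout timestamps by their offset from the last 10-minute boundary. The sketch below is only an illustration: the function name is hypothetical, and it assumes the client-side timeout times are already available as `datetime` objects.

```python
from collections import Counter
from datetime import datetime


def offsets_within_window(timestamps, window_minutes=10):
    """Count events by how many whole minutes past the most recent
    window boundary (e.g. :00, :10, :20 for a 10-minute window) they occurred."""
    counts = Counter()
    for ts in timestamps:
        counts[ts.minute % window_minutes] += 1
    return counts


# Example with made-up timestamps (not data from this thread).
timeouts = [
    datetime(2013, 2, 6, 10, 10, 5),
    datetime(2013, 2, 6, 10, 20, 30),
    datetime(2013, 2, 6, 10, 30, 59),
    datetime(2013, 2, 6, 10, 14, 0),
]
histogram = offsets_within_window(timeouts)
```

If nearly all of the counts land in offset 0, the bursts line up with the 10-minute boundary, which points toward a scheduled task rather than load-driven behavior.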