Re: nodetool repair caused high disk space usage
Péter,
In our case they get created exclusively during repairs. Compactionstats showed a huge number of sstable build compactions.

On Aug 20, 2011 1:23 AM, "Peter Schuller" wrote:
>> Is there any chance that the entire file from source node got streamed to
>> destination node even though only a small amount of data in the file from
>> source node is supposed to be streamed to the destination node?
>
> Yes, but the thing that's annoying me is that even if so - you should
> not be seeing a 40 gb -> hundreds of gig increase even if all
> neighbors sent all their data.
>
> Can you check system.log for references to these sstables to see when
> and under what circumstances they got written?
>
> --
> / Peter Schuller (@scode on twitter)
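For anyone following along, checking the log the way Peter suggests is just a grep. The sstable name, log path and data directory below are placeholders based on the defaults, so adjust them to your install:

    # rough sketch: find when/why a suspicious sstable got written
    grep "MyCF-f-1234" /var/log/cassandra/system.log

    # and see what the compaction executor is working on right now
    nodetool -h localhost compactionstats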
Re: node restart taking too long
any suggestion? thanks! On Fri, Aug 19, 2011 at 10:26 PM, Yan Chunlu wrote: > the log file shows as follows, not sure what does 'Couldn't find cfId=1000' > means(google just returned useless results): > > > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) > Found table data in data directories. Consider using JMX to call > org.apache.cassandra.service.StorageService.loadSchemaFromYaml(). > INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50) > Creating new commitlog segment > /cassandra/commitlog/CommitLog-1313670197705.log > INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying > /cassandra/commitlog/CommitLog-1313670030512.log > INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished > reading /cassandra/commitlog/CommitLog-1313670030512.log > INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay > complete > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364) > Cassandra version: 0.7.4 > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift > API version: 19.4.0 > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading > persisted ring state > INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) > Starting up server gossip > INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048) > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations) > INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157) > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations) > INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164) > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80 > bytes) > INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java > (line 396) Compacting > [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')] > INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using > saved token 113427455640312821154458202477256070484 > INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048) > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations) > INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157) > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations) > ERROR [MutationStage:28] 2011-08-18 07:23:18,246 > RowMutationVerbHandler.java (line 86) Error in row mutation > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find > cfId=1000 > at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117) > at > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node > /node1 has restarted, now UP again > ERROR [ReadStage:1] 2011-08-18 07:23:18,254 > DebuggableThreadPoolExecutor.java (line 103) 
Error in ThreadPoolExecutor > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in > keyspace prjkeyspace > at > org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966) > at > org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388) > at > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93) > at > org.apache.cassandra.db.SliceByNamesReadCommand.(SliceByNamesReadCommand.java:44) > at > org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110) > at > org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122) > at > org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67) > > > > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton wrote: > >> Look in the logs to work find out why the migration did not get to node2. >> >> Otherwise yes you can drop those files. >> >> Cheers >> >> - >> Aaron Morton >> Freelance Cassandra Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote: >> >> just found out that changes via cassandra-cli, the schema change didn't >> reach node2. and node2 became unreachable >> >> I did as this document: >> http://wiki.apache.org/cassandra/FAQ#schema_disagreement >> >> but after that I just got two
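For anyone hitting the same schema disagreement, the FAQ procedure linked above boils down to roughly the following on the node with the wrong schema (a sketch only; the data path is taken from the logs above, and you should snapshot/back up before touching anything):

    # stop cassandra on the disagreeing node first, then:
    cd /cassandra/data/system
    mkdir -p /tmp/schema-backup
    mv Schema* Migrations* /tmp/schema-backup/
    # restart cassandra; the node should re-learn the schema from the
    # rest of the cluster via gossip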
Re: 0.7.4: Replication assertion error after removetoken, removetoken force and a restart
0.7.4 / 3-node cluster / RF=3 / quorum read/write.

After I re-introduced a corrupted node, I followed the process listed on the operations wiki to handle failures (thanks to folks on the mailing list for helping me). Still doing a cleanup on one node at this point. But I noticed that I am seeing this same exception appear 10-12 times a minute, on an existing node (not the new one). I think it started around the removetoken. How do I solve this? Should I just restart this node? Any other cleanups/resets I need to do?

Thanks

On Thu, Apr 28, 2011 at 2:26 AM, aaron morton wrote:
> I *think* that code is used when one node tells others via gossip it is
> removing a token that is not its own. The node that receives information in
> gossip does some work and then replies to the first node with a
> REPLICATION_FINISHED message, which is the node I assume the error is
> happening on.
>
> Have you been doing any moves / removes or additions of tokens/nodes?
>
> Thanks
> Aaron
>
> On 28 Apr 2011, at 08:39, Alexis Lê-Quôc wrote:
>
> > Hi,
> >
> > I've been getting the following lately, every few seconds.
> >
> > 2011-04-27T20:21:18.299885+00:00 10.202.61.193 [MiscStage: 97] Error
> > in ThreadPoolExecutor
> > 2011-04-27T20:21:18.299885+00:00 10.202.61.193 java.lang.AssertionError
> > 2011-04-27T20:21:18.300038+00:00 10.202.61.193 10.202.61.193 at
> > org.apache.cassandra.service.StorageService.confirmReplication(StorageService.java:1872)
> > 2011-04-27T20:21:18.300038+00:00 10.202.61.193 10.202.61.193 at
> > org.apache.cassandra.streaming.ReplicationFinishedVerbHandler.doVerb(ReplicationFinishedVerbHandler.java:38)
> > 2011-04-27T20:21:18.300047+00:00 10.202.61.193 10.202.61.193 at
> > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> > 2011-04-27T20:21:18.300047+00:00 10.202.61.193 10.202.61.193 at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > 2011-04-27T20:21:18.300055+00:00 10.202.61.193 10.202.61.193 at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > 2011-04-27T20:21:18.300055+00:00 10.202.61.193 10.202.61.193 at
> > java.lang.Thread.run(Thread.java:636)
> > 2011-04-27T20:21:18.300555+00:00 10.202.61.193 [MiscStage: 97] Fatal
> > exception in thread Thread[MiscStage:97,5,main]
> >
> > I see it coming from
> > 32 public class ReplicationFinishedVerbHandler implements IVerbHandler
> > 33 {
> > 34     private static Logger logger =
> >            LoggerFactory.getLogger(ReplicationFinishedVerbHandler.class);
> > 35
> > 36     public void doVerb(Message msg, String id)
> > 37     {
> > 38         StorageService.instance.confirmReplication(msg.getFrom());
> > 39         Message response =
> >                msg.getInternalReply(ArrayUtils.EMPTY_BYTE_ARRAY);
> > 40         if (logger.isDebugEnabled())
> > 41             logger.debug("Replying to " + id + "@" + msg.getFrom());
> > 42         MessagingService.instance().sendReply(response, id, msg.getFrom());
> > 43     }
> > 44 }
> >
> > Before I dig deeper in the code, has anybody dealt with this before?
> >
> > Thanks,
> >
> > --
> > Alexis Lê-Quôc
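In case it is useful to anyone else who sees this assertion, a quick way to check whether the ring state is still consistent across nodes (host names below are placeholders) is simply:

    # compare how each node currently sees the ring; a lingering token
    # removal usually shows up as an entry stuck in "Leaving"
    for h in node1 node2 node3; do
        echo "== $h =="
        nodetool -h "$h" ring
    done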
Re: node restart taking too long
> the log file shows as follows, not sure what does 'Couldn't find cfId=1000' > means(google just returned useless results): Those should be the indication that the schema is wrong on the node. Reads and writes are being received from other nodes pertaining to column families it does not know about. I don't know, without investigation, why the instructions from the wiki don't work though. You did the procedure of restarting the node with the migrations/schema removed, right? -- / Peter Schuller (@scode on twitter)
Re: node restart taking too long
Can you post the complete Cassandra log starting with the initial start-up of the node after having removed schema/migrations? -- / Peter Schuller (@scode on twitter)
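If it saves a step, something like the following pulls out everything since the most recent start-up (a sketch, assuming the default log location and the "Cassandra version" start-up banner shown earlier in this thread):

    LOG=/var/log/cassandra/system.log
    # line number of the most recent start-up banner
    START=$(grep -n "Cassandra version" "$LOG" | tail -1 | cut -d: -f1)
    # everything logged from that point on
    tail -n +"$START" "$LOG"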
Re: Occasionally getting old data back with ConsistencyLevel.ALL
> Do you mean the cassandra log, or just logging in the script itself?

The script itself. I.e., some "independent" verification that the line of code after the insert is in fact running, just in case there's some kind of silent failure. Sounds like you've tried to address that already with the e-mails, though.

I suppose it boils down to: either there is something wrong in your environment/code, or Cassandra does have a bug. If the latter, it would probably be helpful if you could try to reproduce it in your environment in a way which can be shared - such as a script that does writes and then reads them back to confirm the write made it. Or maybe just add more explicit logging to your script (even if it causes some log flooding) to "prove" that a write truly happened.

--
/ Peter Schuller (@scode on twitter)
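For what it's worth, one cheap way to get an independent check that a write really landed, without touching the application, is to do the write/read-back by hand in cassandra-cli. The keyspace, column family and key names below are made up, and this doesn't exercise ConsistencyLevel.ALL the way the script does - it's only a sanity check that the value is visible:

    # start the cli with: cassandra-cli -host <some-node> -port 9160
    use MyKeyspace;
    set MyCF['canary']['col'] = 'value-1';
    get MyCF['canary']['col'];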
Re: nodetool repair caused high disk space usage
> In our case they get created exclusively during repairs. Compactionstats > showed a huge number of sstable build compactions Do you have an indication that at least the disk space is in fact consistent with the amount of data being streamed between the nodes? I think you had 90 -> ~ 450 gig with RF=3, right? Still sounds like a lot assuming repairs are not running concurrently (and compactions are able to run after a repair before the next repair of a neighbor starts). -- / Peter Schuller (@scode on twitter)
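A quick way to sanity-check that, for anyone in the same spot (paths assume the data directory used earlier in these threads):

    # what cassandra reports as live load per node
    nodetool -h localhost ring

    # versus what is actually sitting on disk, including sstables left
    # behind by streaming/repair that haven't been compacted away yet
    du -sh /cassandra/data/*
    df -h /cassandra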
Re: node restart taking too long
This means you should upgrade, because we've fixed bugs about ignoring deleted CFs since 0.7.4. On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu wrote: > the log file shows as follows, not sure what does 'Couldn't find cfId=1000' > means(google just returned useless results): > > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found > table data in data directories. Consider using JMX to call > org.apache.cassandra.service.StorageService.loadSchemaFromYaml(). > INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50) > Creating new commitlog segment > /cassandra/commitlog/CommitLog-1313670197705.log > INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying > /cassandra/commitlog/CommitLog-1313670030512.log > INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished > reading /cassandra/commitlog/CommitLog-1313670030512.log > INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay > complete > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364) > Cassandra version: 0.7.4 > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift > API version: 19.4.0 > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading > persisted ring state > INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) Starting > up server gossip > INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048) > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations) > INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157) > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations) > INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164) > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80 > bytes) > INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java > (line 396) Compacting > [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')] > INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using > saved token 113427455640312821154458202477256070484 > INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048) > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations) > INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157) > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations) > ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java > (line 86) Error in row mutation > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find > cfId=1000 > at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117) > at > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node > /node1 has restarted, now UP again > ERROR [ReadStage:1] 
2011-08-18 07:23:18,254 > DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in > keyspace prjkeyspace > at > org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966) > at > org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388) > at > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93) > at > org.apache.cassandra.db.SliceByNamesReadCommand.(SliceByNamesReadCommand.java:44) > at > org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110) > at > org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122) > at > org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67) > > > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton > wrote: >> >> Look in the logs to work find out why the migration did not get to node2. >> Otherwise yes you can drop those files. >> Cheers >> - >> Aaron Morton >> Freelance Cassandra Developer >> @aaronmorton >> http://www.thelastpickle.com >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote: >> >> just found out that changes via cassandra-cli, the schema change didn't >> reach node2. and node2 became unreachable >> I did as this >> document:http://wiki.apache.org/cassandra/FAQ#sche
Re: Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process
Thanks for the help, this seems to have worked. Except that while adding the new node we assigned the same token to a different IP (an operational script goof-up) and brought the node up, so the other nodes just logged that a new IP had taken over the token.

- So we brought it down, fixed it, and it all came up fine.
- Ran removetoken; it did not finish.
- So ran removetoken force; that seemed to work.
- Cleaned up the nodes.
- Everything from the ring perspective appeared OK on all nodes - except for the error message (which, based on some thread, it seemed would go away) reported in this thread => http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/0-7-4-Replication-assertion-error-after-removetoken-removetoken-force-and-a-restart-td6311082.html
- So I restarted the one node that was complaining (this was not the node that was replaced).
- But once this node was restarted, the ring command on it showed the old single-token IP (the one we removed).
- So I am running the removetoken again; it has been running for about 2-3 hours now.

The ring shows:

                                        113427455640312821154458202477256070485
10.xxx.0.184  Up    Normal   829.73 GB  33.33%  0
10.xxx.0.185  Up    Normal   576.09 GB  33.33%  56713727820156410577229101238628035241
10.xxx.0.189  Down  Leaving  139.73 KB  0.00%   56713727820156410577229101238628035242
10.xxx.0.188  Up    Normal   697.41 GB  33.33%  113427455640312821154458202477256070485

What are my choices here? How do I clean up the ring? The other 2 nodes show the ring fine (they are not even aware of .189).

Thanks
Anand

On Fri, Aug 19, 2011 at 11:53 AM, Anand Somani wrote:
> ok I will go with the IP change strategy and keep you posted. Not going to
> manually copy any data, just bring up the node and let it bootstrap.
>
> Thanks
>
> On Fri, Aug 19, 2011 at 11:46 AM, Peter Schuller <
> peter.schul...@infidyne.com> wrote:
>
>> > (Yes, this should definitely be easier. Maybe the most generally
>> > useful fix would be for Cassandra to support a node joining the ring
>> > in "write-only" mode. This would be useful in other cases, such as
>> > when you're trying to temporarily off-load a node by disabling
>> > gossip).
>>
>> I knew I had read discussions before:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2568
>>
>> --
>> / Peter Schuller (@scode on twitter)
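For reference, the cleanup being attempted here is just the same commands already used earlier in the thread, pointed at a node that still lists the stale entry (the host below is a placeholder taken from the ring output above):

    # see which node(s) still show 10.xxx.0.189 as "Down Leaving"
    nodetool -h 10.xxx.0.184 ring

    # then, from a live node, push the stuck removal through with the
    # same "removetoken force" used earlier in this thread
    nodetool -h 10.xxx.0.184 removetoken force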
Re: node restart taking too long
that could be the reason, I did nodetool repair(unfinished, data size increased 6 times bigger 30G vs 170G) and there should be some unclean sstables on that node. however upgrade it a tough work for me right now. could the nodetool scrub help? or decommission the node and join it again? On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis wrote: > This means you should upgrade, because we've fixed bugs about ignoring > deleted CFs since 0.7.4. > > On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu wrote: > > the log file shows as follows, not sure what does 'Couldn't find > cfId=1000' > > means(google just returned useless results): > > > > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) > Found > > table data in data directories. Consider using JMX to call > > org.apache.cassandra.service.StorageService.loadSchemaFromYaml(). > > INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50) > > Creating new commitlog segment > > /cassandra/commitlog/CommitLog-1313670197705.log > > INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying > > /cassandra/commitlog/CommitLog-1313670030512.log > > INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished > > reading /cassandra/commitlog/CommitLog-1313670030512.log > > INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay > > complete > > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364) > > Cassandra version: 0.7.4 > > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) > Thrift > > API version: 19.4.0 > > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) > Loading > > persisted ring state > > INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) > Starting > > up server gossip > > INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048) > > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 > operations) > > INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157) > > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations) > > INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164) > > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80 > > bytes) > > INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 > CompactionManager.java > > (line 396) Compacting > > > [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')] > > INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using > > saved token 113427455640312821154458202477256070484 > > INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048) > > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 > operations) > > INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157) > > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations) > > ERROR [MutationStage:28] 2011-08-18 07:23:18,246 > RowMutationVerbHandler.java > > (line 86) Error in row mutation > > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't > find > > cfId=1000 > > at > > > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117) > > at > > > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380) > > at > > > 
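For reference, the two options being weighed here are both plain nodetool invocations (host and keyspace below are placeholders; and, as Jonathan points out in the next message, neither may actually be necessary if the errors are only about a dropped column family):

    # rewrite this node's sstables in place for one keyspace
    nodetool -h node2 scrub prjkeyspace

    # or remove the node from the ring entirely and re-bootstrap it later
    nodetool -h node2 decommission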
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50) > > at > > > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72) > > at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > > at java.lang.Thread.run(Thread.java:636) > > INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) > Node > > /node1 has restarted, now UP again > > ERROR [ReadStage:1] 2011-08-18 07:23:18,254 > > DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor > > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in > > keyspace prjkeyspace > > at > > > org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966) > > at > > > org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388) > > at > > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93) > > at > > > org.apache.cassandra.db.SliceByNamesReadCommand.(SliceByNamesReadCommand.java:44) > > at > > > org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110) > > at > > > org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122) > > at > > org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67) > >
Re: node restart taking too long
I'm not sure what problem you're trying to solve. The exception you pasted should stop once your clients are no longer trying to use the dropped CF. On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu wrote: > that could be the reason, I did nodetool repair(unfinished, data size > increased 6 times bigger 30G vs 170G) and there should be some unclean > sstables on that node. > however upgrade it a tough work for me right now. could the nodetool scrub > help? or decommission the node and join it again? > > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis wrote: >> >> This means you should upgrade, because we've fixed bugs about ignoring >> deleted CFs since 0.7.4. >> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu wrote: >> > the log file shows as follows, not sure what does 'Couldn't find >> > cfId=1000' >> > means(google just returned useless results): >> > >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) >> > Found >> > table data in data directories. Consider using JMX to call >> > org.apache.cassandra.service.StorageService.loadSchemaFromYaml(). >> > INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50) >> > Creating new commitlog segment >> > /cassandra/commitlog/CommitLog-1313670197705.log >> > INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying >> > /cassandra/commitlog/CommitLog-1313670030512.log >> > INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished >> > reading /cassandra/commitlog/CommitLog-1313670030512.log >> > INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log >> > replay >> > complete >> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364) >> > Cassandra version: 0.7.4 >> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) >> > Thrift >> > API version: 19.4.0 >> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) >> > Loading >> > persisted ring state >> > INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) >> > Starting >> > up server gossip >> > INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048) >> > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 >> > operations) >> > INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157) >> > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations) >> > INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164) >> > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80 >> > bytes) >> > INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 >> > CompactionManager.java >> > (line 396) Compacting >> > >> > [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')] >> > INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) >> > Using >> > saved token 113427455640312821154458202477256070484 >> > INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048) >> > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 >> > operations) >> > INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157) >> > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations) >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246 >> > RowMutationVerbHandler.java >> > (line 86) Error in row mutation >> > org.apache.cassandra.db.UnserializableColumnFamilyException: 
Couldn't >> > find >> > cfId=1000 >> > at >> > >> > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117) >> > at >> > >> > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380) >> > at >> > >> > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50) >> > at >> > >> > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72) >> > at >> > >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >> > at >> > >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >> > at java.lang.Thread.run(Thread.java:636) >> > INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) >> > Node >> > /node1 has restarted, now UP again >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254 >> > DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in >> > keyspace prjkeyspace >> > at >> > >> > org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966) >> > at >> > >> > org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388) >> > at >> > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93) >> > at >> > >> > org.apache.cassandra.db.Sl
Re: nodetool repair caused high disk space usage
> Do you have an indication that at least the disk space is in fact
> consistent with the amount of data being streamed between the nodes? I
> think you had 90 -> ~ 450 gig with RF=3, right? Still sounds like a
> lot assuming repairs are not running concurrently (and compactions are
> able to run after a repair before the next repair of a neighbor
> starts).

Hi Peter,

When a repair was running on the 40GB keyspace I'd usually see range repairs for up to a couple of thousand ranges for each CF. If range = #keys, then that's a very small amount of data being moved around. However, at the time I hadn't noticed that there were multiple repairs running concurrently on the same nodes and on their neighbors, so I suppose my experience is invalid for finding a possible bug. But I suspect it will still help someone along the way, because they'll have multiple repairs going on too, and I have a much better understanding of what's going on myself.

I've now reloaded all my data in the cluster; the load is 140GB on each node, and I've been able to run a repair on each CF that comes out almost 100% consistent. I'm now starting to run the daily repair crons again to see if they go out of whack or not.
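For anyone setting up similar scheduled repairs, the cron entries can be as simple as the following (a sketch; the keyspace and column family names are placeholders, and per Peter's earlier point the times should be staggered so repairs on neighboring nodes don't overlap):

    # /etc/cron.d/cassandra-repair  (hypothetical)
    # m h dom mon dow  user       command
    0  3  *   *   *    cassandra  nodetool -h localhost repair MyKeyspace MyCF1
    0  5  *   *   *    cassandra  nodetool -h localhost repair MyKeyspace MyCF2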