That could be the reason. I ran nodetool repair (it did not finish, and the data size grew about six times, from 30 GB to 170 GB), so there are probably some unclean SSTables on that node.
However, upgrading is tough for me right now. Could nodetool scrub help? Or should I decommission the node and have it rejoin?

On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

> This means you should upgrade, because we've fixed bugs about ignoring
> deleted CFs since 0.7.4.
>
> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <springri...@gmail.com> wrote:
> > The log file shows the following. I'm not sure what "Couldn't find cfId=1000"
> > means (Google just returned useless results):
> >
> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
> > INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50) Creating new commitlog segment /cassandra/commitlog/CommitLog-1313670197705.log
> > INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying /cassandra/commitlog/CommitLog-1313670030512.log
> > INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished reading /cassandra/commitlog/CommitLog-1313670030512.log
> > INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay complete
> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364) Cassandra version: 0.7.4
> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift API version: 19.4.0
> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading persisted ring state
> > INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) Starting up server gossip
> > INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
> > INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157) Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
> > INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164) Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80 bytes)
> > INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java (line 396) Compacting [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
> > INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using saved token 113427455640312821154458202477256070484
> > INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> > INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157) Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java (line 86) Error in row mutation
> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=1000
> >     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
> >     at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
> >     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
> >     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >     at java.lang.Thread.run(Thread.java:636)
> > INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node /node1 has restarted, now UP again
> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in keyspace prjkeyspace
> >     at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
> >     at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
> >     at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
> >     at org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
> >     at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
> >     at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
> >     at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
> >
> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
> >>
> >> Look in the logs to find out why the migration did not get to node2.
> >> Otherwise, yes, you can drop those files.
> >> Cheers
> >> -----------------
> >> Aaron Morton
> >> Freelance Cassandra Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
> >>
> >> I just found out that the schema change I made via cassandra-cli didn't
> >> reach node2, and node2 became unreachable...
> >> I followed this document: http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> >> but afterwards I still get two schema versions:
> >>
> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
> >>
> >> Is it enough to delete the Schema* and Migrations* SSTables and restart the node?
> >>
> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <springri...@gmail.com> wrote:
> >>>
> >>> Thanks a lot for all the help!
> >>> I have gone through the steps and successfully brought up node2 :)
> >>>
> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yulin...@gmail.com> wrote:
> >>> > Because the file only preserves the keys of the records, not the whole
> >>> > records. The records for those saved keys are loaded into Cassandra
> >>> > during its startup.
> >>> >
> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <springri...@gmail.com> wrote:
> >>> >>
> >>> >> But the data in saved_caches is relatively small.
> >>> >> Will that cause the load problem?
> >>> >>
> >>> >> ls -lh /cassandra/saved_caches/
> >>> >> total 32M
> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
> >>> >>
> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aa...@thelastpickle.com> wrote:
> >>> >> > If you have a node that cannot start up due to issues loading the saved
> >>> >> > cache, delete the files in the saved_caches directory before starting it.
> >>> >> >
> >>> >> > The settings to save the row and key caches are per CF. You can change
> >>> >> > them with an update column family statement via the CLI when attached
> >>> >> > to any node. You may then want to check the saved_caches directory and
> >>> >> > delete any files that are left (not sure if they are automatically
> >>> >> > deleted).
> >>> >> >
> >>> >> > I would recommend:
> >>> >> > - stop node 2
> >>> >> > - delete its saved_caches
> >>> >> > - make the schema change via another node
> >>> >> > - start up node 2
> >>> >> >
> >>> >> > Cheers
> >>> >> >
> >>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
> >>> >> >
> >>> >> >> Does this need to be cluster-wide, or could I just modify the caches
> >>> >> >> on one node? I can't connect to the node with cassandra-cli; it says
> >>> >> >> "connection refused":
> >>> >> >>
> >>> >> >> [default@unknown] connect node2/9160;
> >>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
> >>> >> >>
> >>> >> >> So if I change the cache size via other nodes, how will node2 be
> >>> >> >> notified of the change? Would killing Cassandra and starting it again
> >>> >> >> make it update the schema?
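Aaron's recommended sequence (stop node2, delete its saved caches, make the schema change via another node, start node2) can be partly scripted. The sketch below is a minimal Python helper, assuming the node is already stopped and that the directory is `/cassandra/saved_caches` as in the `ls` listing above; `clear_saved_caches` is a hypothetical name, not part of Cassandra or nodetool. It defaults to a dry run and only touches regular files.

```python
from pathlib import Path


def clear_saved_caches(cache_dir, dry_run=True, prefix=None):
    """Select (and optionally delete) saved key/row cache files.

    Hypothetical helper; run it only while the Cassandra node is stopped.
    Only regular files are considered; subdirectories are left alone.
    If `prefix` is given (e.g. "cass-CommentSortsCache-"), only files whose
    names start with it are selected. With dry_run=True nothing is deleted.
    Returns the sorted names of the selected files.
    """
    selected = []
    for entry in sorted(Path(cache_dir).iterdir(), key=lambda p: p.name):
        if not entry.is_file():
            continue
        if prefix is not None and not entry.name.startswith(prefix):
            continue
        selected.append(entry.name)
        if not dry_run:
            entry.unlink()  # irreversible, but these files only hold cache state
    return selected
```

Calling `clear_saved_caches("/cassandra/saved_caches")` first lists what would be removed; passing `dry_run=False` deletes it. Because the saved files only describe which keys were cached, the cost of removing them is a cold cache after restart rather than lost data.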
> >>> >> >>
> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <thol...@wetafx.co.nz> wrote:
> >>> >> >>> Hi,
> >>> >> >>>
> >>> >> >>> Yes, we saw exactly the same messages. We got rid of them by doing
> >>> >> >>> the following:
> >>> >> >>>
> >>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
> >>> >> >>> * Kill Cassandra
> >>> >> >>> * Remove all files in the saved_caches directory
> >>> >> >>> * Start Cassandra
> >>> >> >>> * Slowly bring back row & key caches (if desired; we left them off)
> >>> >> >>>
> >>> >> >>> Cheers,
> >>> >> >>>
> >>> >> >>> T.
> >>> >> >>>
> >>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
> >>> >> >>>>
> >>> >> >>>> I saw a lot of SliceQueryFilter messages after changing the log level
> >>> >> >>>> to DEBUG. I'd have thought even bringing up a new node would be
> >>> >> >>>> faster than starting the old one... it is weird.
> >>> >> >>>>
> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
> >>> >> >>>>
> >>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springri...@gmail.com> wrote:
> >>> >> >>>>
> >>> >> >>>> But it seems the row cache setting is cluster-wide; how will
> >>> >> >>>> changing the row cache affect read speed?
> >>> >> >>>>
> >>> >> >>>> On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>> >> >>>>
> >>> >> >>>> Or leave row cache enabled but disable cache saving (and remove
> >>> >> >>>> the one already on disk).
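Teijo's first step and Jonathan's alternative both come down to per-CF `update column family` statements in cassandra-cli, which Aaron notes can be issued while attached to any node. The Python sketch below only builds the statement text. It assumes the 0.7-era CLI attribute names `rows_cached`, `keys_cached` and `row_cache_save_period`, and the keyspace name `cass` is inferred from saved-cache file names such as `cass-CommentSortsCache-RowCache`; `cache_statements` is a hypothetical helper. Check the attribute names against the CLI's `help` output for your version before pasting anything.

```python
def cache_statements(keyspace, column_families, disable_caches=False):
    """Build cassandra-cli statements that change per-CF cache settings.

    ASSUMPTION: attribute names follow the 0.7-era cassandra-cli syntax;
    verify them with the CLI's help before use.
    disable_caches=True  -> turn row and key caches off entirely.
    disable_caches=False -> keep the row cache but stop saving it to disk
                            (row_cache_save_period=0).
    """
    if disable_caches:
        attrs = "rows_cached=0 and keys_cached=0"
    else:
        attrs = "row_cache_save_period=0"
    statements = [f"use {keyspace};"]
    for cf in column_families:
        statements.append(f"update column family {cf} with {attrs};")
    return statements
```

For example, `print("\n".join(cache_statements("cass", ["CommentSortsCache"])))` prints `use cass;` followed by `update column family CommentSortsCache with row_cache_save_period=0;`. As Jonathan says, the cache file already on disk still has to be removed separately.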
> >>> >> >>>>
> >>> >> >>>> On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aa...@thelastpickle.com> wrote:
> >>> >> >>>> > INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> >>> >> >>>> >
> >>> >> >>>> > It's taking 29 minutes to load 200,000 rows in the row cache. That's
> >>> >> >>>> > a pretty big row cache; I would suggest reducing or disabling it.
> >>> >> >>>> > Background: http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >>> >> >>>> >
> >>> >> >>>> > and the server could not handle the load and crashed. It has been
> >>> >> >>>> > more than 96 hours since the restart, and node3 is still not back online.
> >>> >> >>>> >
> >>> >> >>>> > Crashed how?
> >>> >> >>>> > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
> >>> >> >>>> > Watch nodetool compactionstats to see when the Merkle tree build
> >>> >> >>>> > finishes, and nodetool netstats to see which CFs are streaming.
> >>> >> >>>> > Cheers
> >>> >> >>>> > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> >>> >> >>>> >
> >>> >> >>>> > I have 3 nodes and RF=3. When I was repairing node3, it seemed a lot
> >>> >> >>>> > of data was generated, and the server could not handle the load and
> >>> >> >>>> > crashed.
> >>> >> >>>> > It has been more than 96 hours since the restart, and node3 is
> >>> >> >>>> > still not back online.
> >>> >> >>>> >
> >>> >> >>>> > For 34 GB of data, node2 could restart and be back online within 1 hour.
> >>> >> >>>> >
> >>> >> >>>> > I am not sure what's wrong with node3. Should I restart it again?
> >>> >> >>>> > Thanks!
> >>> >> >>>> >
> >>> >> >>>> > Address  Status  State   Load       Owns    Token
> >>> >> >>>> >                                             113427455640312821154458202477256070484
> >>> >> >>>> > node1    Up      Normal  34.11 GB   33.33%  0
> >>> >> >>>> > node2    Up      Normal  31.44 GB   33.33%  56713727820156410577229101238628035242
> >>> >> >>>> > node3    Down    Normal  177.55 GB  33.33%  113427455640312821154458202477256070484
> >>> >> >>>> >
> >>> >> >>>> > The log shows it is still going on; I'm not sure why it is so slow:
> >>> >> >>>> >
> >>> >> >>>> > INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
> >>> >> >>>> > INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >>> >> >>>> > INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> >>> >> >>>> > INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >>> >> >>>> > INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
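Two figures in Yan's original report can be checked with a little arithmetic. The ring tokens are exactly i * floor(2**127 / 3) for i = 0, 1, 2 (2**127 being the RandomPartitioner token range), so all three nodes own equal ranges and node3's extra data does not come from an uneven token assignment. The cache log line gives 1744370 ms for 200000 keys, about 29 minutes or roughly 8.7 ms per key, while saving the same cache took 2535 ms; that gap fits Boris's explanation that the saved file holds only keys and each row has to be read back at startup. The Python sketch below reproduces both numbers; the function names are hypothetical and the regex assumes the exact log wording shown above.

```python
import re

# Matches e.g. "completed loading (1744370 ms; 200000 keys) row cache for COMMENT"
LOAD_RE = re.compile(r"completed loading \((\d+) ms; (\d+) keys\)")


def balanced_tokens(node_count, ring_size=2**127):
    """Evenly spaced RandomPartitioner tokens: i * floor(ring_size / node_count)."""
    step = ring_size // node_count
    return [i * step for i in range(node_count)]


def cache_load_rate(log_line):
    """Return (minutes, ms_per_key) parsed from a row-cache 'completed loading' line."""
    match = LOAD_RE.search(log_line)
    if match is None:
        raise ValueError("not a cache-load log line: " + log_line)
    elapsed_ms, keys = int(match.group(1)), int(match.group(2))
    return elapsed_ms / 60000, elapsed_ms / keys
```

`balanced_tokens(3)` returns `[0, 56713727820156410577229101238628035242, 113427455640312821154458202477256070484]`, matching the ring output, and `cache_load_rate` on the 09:24:52 line returns about `(29.07, 8.72)`.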
> >>> >> >>>>
> >>> >> >>>> --
> >>> >> >>>> Jonathan Ellis
> >>> >> >>>> Project Chair, Apache Cassandra
> >>> >> >>>> co-founder of DataStax, the source for professional Cassandra support
> >>> >> >>>> http://www.datastax.com
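When Yan reported two schema versions on Aug 18 (ddcada52-c96a-11e0-99af-3bd951658d61 on node1 and node3, 2127b2ef-6998-11e0-b45b-3bd951658d61 on node2), both strings had version digit 1 at the start of the third group ("11e0"), so they are time-based UUIDs. Assuming the embedded timestamp reflects when each schema version was created, the Python sketch below picks out the lagging node and decodes the dates; by that reading node2 is still on a schema from April 2011, while node1 and node3 carry the Aug 18 change. `schema_report` and `schema_time` are hypothetical helpers, and the input dict simply mirrors the version-to-nodes listing Yan pasted.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUIDs count 100-ns intervals from 1582-10-15 00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def schema_time(version):
    """Decode the timestamp embedded in a time-based (version 1) UUID."""
    u = uuid.UUID(version)
    if u.version != 1:
        raise ValueError(version + " is not a time-based UUID")
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


def schema_report(versions):
    """Summarise a {schema_version: [nodes]} map.

    Returns (majority_version, lagging_nodes, newest_version): the majority is
    the version reported by the most nodes, lagging_nodes are all the others.
    """
    majority = max(versions, key=lambda v: len(versions[v]))
    lagging = sorted(
        node for v, nodes in versions.items() if v != majority for node in nodes
    )
    newest = max(versions, key=schema_time)
    return majority, lagging, newest
```

With Yan's two versions, `schema_report` names ddcada52-c96a-11e0-99af-3bd951658d61 as both the majority and the newest version and reports `['node2']` as lagging, which is consistent with Aaron's advice to find out why the migration did not reach node2.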