Any suggestions? Thanks!

On Fri, Aug 19, 2011 at 10:26 PM, Yan Chunlu <springri...@gmail.com> wrote:
> The log file shows the following. I'm not sure what 'Couldn't find cfId=1000'
> means (Google just returned useless results):
>
> INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
> INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50) Creating new commitlog segment /cassandra/commitlog/CommitLog-1313670197705.log
> INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying /cassandra/commitlog/CommitLog-1313670030512.log
> INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished reading /cassandra/commitlog/CommitLog-1313670030512.log
> INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay complete
> INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364) Cassandra version: 0.7.4
> INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift API version: 19.4.0
> INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading persisted ring state
> INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) Starting up server gossip
> INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
> INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157) Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
> INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164) Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80 bytes)
> INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java (line 396) Compacting [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
> INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using saved token 113427455640312821154458202477256070484
> INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157) Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java (line 86) Error in row mutation
> org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=1000
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>         at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
>         at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:636)
> INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node /node1 has restarted, now UP again
> ERROR [ReadStage:1] 2011-08-18 07:23:18,254 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in keyspace prjkeyspace
>         at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
>         at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
>         at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
>         at org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>         at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
>         at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
>         at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>
> On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> Look in the logs to find out why the migration did not get to node2.
>>
>> Otherwise, yes, you can drop those files.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>>
>> I just found out that the schema change made via cassandra-cli didn't
>> reach node2, and node2 became unreachable...
>>
>> I followed this document:
>> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>>
>> but after that I still have two schema versions:
>>
>> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>>
>> Is it enough to delete the Schema* and Migrations* sstables and restart
>> the node?
>>
>> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <springri...@gmail.com> wrote:
>>
>>> Thanks a lot for all the help! I have gone through the steps and
>>> successfully brought up node2 :)
>>>
>>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yulin...@gmail.com> wrote:
>>> > Because the file only preserves the keys of the records, not the whole
>>> > records. The records for those saved keys are loaded into Cassandra
>>> > during startup.
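Yan reported two schema versions, with node2 alone on one of them. That is the situation the linked FAQ entry on schema disagreement covers. The sketch below parses a listing of that shape and flags the nodes that are not on the majority version.

It is a minimal sketch under two assumptions:
* Each line has the `version: [node, node]` shape quoted above.
* The majority version is the one to keep.

`parse_schema_versions` and `find_schema_outliers` are hypothetical helpers, not part of cassandra-cli or nodetool.

```python
import re
from collections import Counter

# Matches lines like "ddcada52-...: [node1, node3]" (shape assumed from the
# listing quoted in this thread).
LINE_RE = re.compile(r"^\s*([0-9a-f-]+):\s*\[(.*)\]\s*$")


def parse_schema_versions(text):
    """Map each node name to the schema version it reports."""
    node_to_version = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip headers and blank lines
        version, nodes = m.group(1), m.group(2)
        for node in nodes.split(","):
            node = node.strip()
            if node:
                node_to_version[node] = version
    return node_to_version


def find_schema_outliers(text):
    """Return (majority_version, sorted list of nodes that disagree with it)."""
    node_to_version = parse_schema_versions(text)
    if not node_to_version:
        return None, []
    majority, _ = Counter(node_to_version.values()).most_common(1)[0]
    outliers = sorted(n for n, v in node_to_version.items() if v != majority)
    return majority, outliers


listing = """\
ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
"""
print(find_schema_outliers(listing))
# ('ddcada52-c96a-11e0-99af-3bd951658d61', ['node2'])
```

Here the outlier is node2. That is the node whose Schema* and Migrations* sstables Aaron said could be dropped.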
>>> >
>>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <springri...@gmail.com> wrote:
>>> >>
>>> >> But the data in saved_caches is relatively small. Could that cause
>>> >> the load problem?
>>> >>
>>> >> ls -lh /cassandra/saved_caches/
>>> >> total 32M
>>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
>>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
>>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
>>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
>>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
>>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
>>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
>>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>>> >>
>>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aa...@thelastpickle.com>
>>> >> wrote:
>>> >> > If you have a node that cannot start up due to issues loading the saved
>>> >> > cache, delete the files in the saved_caches directory before starting it.
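Aaron's advice, and Teijo's steps later in the thread, come down to clearing the saved_caches directory while the node is stopped. The sketch below covers that step only.

It makes these assumptions:
* Files follow the `<keyspace>-<ColumnFamily>-KeyCache` / `-RowCache` naming visible in the `ls` listing above.
* You run it only against a stopped node.

`saved_cache_files` and `clear_saved_caches` are hypothetical helpers. They default to a dry run and do not touch the per-CF cache settings, which are changed through cassandra-cli. The optional `kind` filter covers Jonathan's variant of removing only the saved row cache.

```python
import os
import tempfile


def saved_cache_files(directory, kind=None):
    """List saved-cache files, optionally only '-KeyCache' or '-RowCache' ones.

    Naming is assumed from the listing in this thread, e.g. 'cass-cache-RowCache'.
    """
    suffixes = ("-KeyCache", "-RowCache") if kind is None else (f"-{kind}",)
    return [n for n in sorted(os.listdir(directory)) if n.endswith(suffixes)]


def clear_saved_caches(directory, kind=None, dry_run=True):
    """Delete saved-cache files; with dry_run=True only report what would go.

    Only run this while Cassandra is stopped on that node.
    """
    targets = saved_cache_files(directory, kind)
    for name in targets:
        path = os.path.join(directory, name)
        if dry_run:
            print(f"would delete {path}")
        else:
            os.remove(path)
    return targets


# Usage against a throwaway directory that mimics a few names from the listing.
with tempfile.TemporaryDirectory() as d:
    for name in ("cass-cache-KeyCache", "cass-cache-RowCache",
                 "cass-CommentSortsCache-RowCache", "system-Schema-KeyCache"):
        open(os.path.join(d, name), "w").close()
    print(clear_saved_caches(d, kind="RowCache", dry_run=False))
    print(saved_cache_files(d))
```

Passing `kind=None` removes both key and row caches, which matches Teijo's "remove all files" step.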
>>> >> >
>>> >> > The settings to save the row and key caches are per CF. You can change
>>> >> > them with an update column family statement via the CLI when attached
>>> >> > to any node. You may then want to check the saved_caches directory and
>>> >> > delete any files that are left (I'm not sure whether they are deleted
>>> >> > automatically).
>>> >> >
>>> >> > I would recommend:
>>> >> > - stop node 2
>>> >> > - delete its saved caches
>>> >> > - make the schema change via another node
>>> >> > - start up node 2
>>> >> >
>>> >> > Cheers
>>> >> >
>>> >> > -----------------
>>> >> > Aaron Morton
>>> >> > Freelance Cassandra Developer
>>> >> > @aaronmorton
>>> >> > http://www.thelastpickle.com
>>> >> >
>>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>>> >> >
>>> >> >> Does this need to be cluster-wide, or could I just modify the caches
>>> >> >> on one node? I ask because I cannot connect to the node with
>>> >> >> cassandra-cli; it says "connection refused":
>>> >> >>
>>> >> >> [default@unknown] connect node2/9160;
>>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
>>> >> >>
>>> >> >> So if I change the cache size via other nodes, how will node2 be
>>> >> >> notified of the change? Will killing Cassandra and starting it again
>>> >> >> make it update the schema?
>>> >> >>
>>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <thol...@wetafx.co.nz>
>>> >> >> wrote:
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> yes, we saw exactly the same messages. We got rid of them by doing
>>> >> >>> the following:
>>> >> >>>
>>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>>> >> >>> * Kill Cassandra
>>> >> >>> * Remove all files in the saved_caches directory
>>> >> >>> * Start Cassandra
>>> >> >>> * Slowly bring back row & key caches (if desired; we left them off)
>>> >> >>>
>>> >> >>> Cheers,
>>> >> >>>
>>> >> >>> T.
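The "Connection refused" from cassandra-cli means nothing was accepting TCP connections on node2's Thrift port (9160) at that moment. That may simply mean the node had not finished starting. The sketch below polls for that condition using only the Python standard library.

`thrift_port_open` is a hypothetical helper. Port 9160 is used only because it is the port in this thread. The check opens a plain TCP socket and does not speak the Thrift protocol.

```python
import socket


def thrift_port_open(host, port=9160, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port.

    A refused, unresolvable or timed-out connection returns False. This only
    checks the socket; it does not speak the Thrift protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError and socket.timeout
        return False
```

For example, `thrift_port_open("node2")` returns False while the node is still refusing connections. Once it returns True, `connect node2/9160;` in cassandra-cli should at least get past the TCP connection.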
>>> >> >>>
>>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>>> >> >>>>
>>> >> >>>> I saw a lot of SliceQueryFilter messages after changing the log level
>>> >> >>>> to DEBUG. I'd think even bringing up a new node would be faster than
>>> >> >>>> starting the old one... it is weird.
>>> >> >>>>
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
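Each DEBUG line above appears to report one column collected while rows are read back during startup. Two parts of each line can be decoded directly:
* The hex name `76616c7565` is the ASCII string `value`.
* 2147483647 is 2^31 - 1 (Java's `Integer.MAX_VALUE`), which suggests an effectively unlimited slice count.

The decoder below rests on two assumptions, both labelled in the comments:
* The column is rendered as `name:isDeleted:valueLength@timestamp`. This is our reading of the line shape, not quoted from the Cassandra source.
* Timestamps are microseconds since the epoch.

`parse_slice_debug` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Shape assumed from the DEBUG lines in this thread:
#   collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
DEBUG_RE = re.compile(
    r"collecting (\d+) of (\d+): ([0-9a-f]+):(true|false):(\d+)@(\d+)"
)


def parse_slice_debug(line):
    """Decode one SliceQueryFilter DEBUG line into a dict, or None if no match."""
    m = DEBUG_RE.search(line)
    if m is None:
        return None
    _, limit, name_hex, deleted, size, ts = m.groups()
    return {
        "limit": int(limit),  # 2147483647 == 2**31 - 1
        "name": bytes.fromhex(name_hex).decode("ascii", "replace"),
        "deleted": deleted == "true",
        "value_bytes": int(size),  # assumed: length of the column value
        # Assumed: client-supplied timestamps in microseconds since the epoch.
        "written_at": datetime.fromtimestamp(int(ts) / 1_000_000, tz=timezone.utc),
    }


line = ("DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) "
        "collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382")
print(parse_slice_debug(line))
```

Under these assumptions, the first line is a live column named `value` with a 225-byte value, written on 2011-08-11 (UTC).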
>>> >> >>>>
>>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springri...@gmail.com> wrote:
>>> >> >>>>
>>> >> >>>>     But it seems the row cache is cluster-wide. How will changing the
>>> >> >>>>     row cache affect read speed?
>>> >> >>>>
>>> >> >>>>     On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> >> >>>>
>>> >> >>>>         Or leave the row cache enabled but disable cache saving (and
>>> >> >>>>         remove the one already on disk).
>>> >> >>>>
>>> >> >>>>         On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>> >> >>>>         > INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>>> >> >>>>         >
>>> >> >>>>         > It's taking 29 minutes to load 200,000 rows into the row cache.
>>> >> >>>>         > That's a pretty big row cache; I would suggest reducing or
>>> >> >>>>         > disabling it. Background:
>>> >> >>>>         > http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>>> >> >>>>         >
>>> >> >>>>         > and the server could not handle the load and crashed. After
>>> >> >>>>         > restarting, node3 has not come back for more than 96 hours
>>> >> >>>>         >
>>> >> >>>>         > Crashed how?
>>> >> >>>>         > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
>>> >> >>>>         > Watch nodetool compactionstats to see when the Merkle tree build
>>> >> >>>>         > finishes, and nodetool netstats to see which CFs are streaming.
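Aaron's 29-minute figure comes straight from the `completed loading (1744370 ms; 200000 keys)` log line. 1,744,370 ms is about 29.1 minutes, or roughly 115 keys per second.

The sketch below pulls those numbers out of such a line. The regex is written against the 0.7.x line quoted here and may not match other Cassandra versions. `row_cache_load_stats` is a hypothetical helper.

```python
import re

# Written against the 0.7.x log line quoted in this thread; other versions may differ.
LOAD_RE = re.compile(
    r"completed loading \((\d+) ms; (\d+) keys\) row cache for (\S+)"
)


def row_cache_load_stats(line):
    """Return (column_family, minutes, keys_per_second), or None if no match."""
    m = LOAD_RE.search(line)
    if m is None:
        return None
    millis, keys, cf = int(m.group(1)), int(m.group(2)), m.group(3)
    seconds = millis / 1000
    return cf, seconds / 60, keys / seconds


line = ("INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) "
        "completed loading (1744370 ms; 200000 keys) row cache for COMMENT")
cf, minutes, rate = row_cache_load_stats(line)
print(f"{cf}: {minutes:.1f} min, {rate:.0f} keys/s")
# COMMENT: 29.1 min, 115 keys/s
```

The same log shows the cache being *saved* in 2535 ms. So the cost is in reloading the rows at startup, not in writing the saved keys.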
>>> >> >>>>         > Cheers
>>> >> >>>>         > -----------------
>>> >> >>>>         > Aaron Morton
>>> >> >>>>         > Freelance Cassandra Developer
>>> >> >>>>         > @aaronmorton
>>> >> >>>>         > http://www.thelastpickle.com
>>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>>> >> >>>>         >
>>> >> >>>>         > I have 3 nodes with RF=3. When I was repairing node3, a lot of
>>> >> >>>>         > data seemed to be generated, and the server could not handle the
>>> >> >>>>         > load and crashed. After restarting, node3 has not come back for
>>> >> >>>>         > more than 96 hours.
>>> >> >>>>         >
>>> >> >>>>         > With 34 GB of data, node2 could restart and be back online within
>>> >> >>>>         > 1 hour.
>>> >> >>>>         >
>>> >> >>>>         > I am not sure what's wrong with node3. Should I restart node3 again?
>>> >> >>>>         > Thanks!
>>> >> >>>>         >
>>> >> >>>>         > Address  Status  State   Load       Owns    Token
>>> >> >>>>         >                                             113427455640312821154458202477256070484
>>> >> >>>>         > node1    Up      Normal  34.11 GB   33.33%  0
>>> >> >>>>         > node2    Up      Normal  31.44 GB   33.33%  56713727820156410577229101238628035242
>>> >> >>>>         > node3    Down    Normal  177.55 GB  33.33%  113427455640312821154458202477256070484
>>> >> >>>>         >
>>> >> >>>>         > The log shows it is still going on; I'm not sure why it is so slow:
>>> >> >>>>         >
>>> >> >>>>         > INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
>>> >> >>>>         > INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>>> >> >>>>         > INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>>> >> >>>>         > INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>>> >> >>>>         > INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>>> >> >>>>
>>> >> >>>>         --
>>> >> >>>>         Jonathan Ellis
>>> >> >>>>         Project Chair, Apache Cassandra
>>> >> >>>>         co-founder of DataStax, the source for professional Cassandra support
>>> >> >>>>         http://www.datastax.com
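The tokens in the ring output are exactly the evenly spaced tokens `i * (2**127 // 3)` for a three-node RandomPartitioner ring. That is why each node shows 33.33% ownership. The extra load on node3 (177.55 GB against roughly 31 to 34 GB on the others) is therefore unlikely to come from an unbalanced token assignment.

The check below assumes two things:
* The RandomPartitioner token space is 0 to 2**127.
* Each node owns the range from the previous token up to its own.

`balanced_tokens` and `ownership` are hypothetical helpers.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def balanced_tokens(node_count):
    """Evenly spaced initial tokens: i * (2**127 // node_count)."""
    step = RING_SIZE // node_count
    return [i * step for i in range(node_count)]


def ownership(tokens):
    """Fraction of the ring each token owns, i.e. (previous token, token]."""
    ordered = sorted(tokens)
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # for i == 0 this wraps around to the last token
        shares[tok] = ((tok - prev) % RING_SIZE) / RING_SIZE
    return shares


ring = {  # tokens copied from the nodetool ring output above
    "node1": 0,
    "node2": 56713727820156410577229101238628035242,
    "node3": 113427455640312821154458202477256070484,
}
print(balanced_tokens(3) == sorted(ring.values()))  # True
shares = ownership(ring.values())
for node, tok in ring.items():
    print(node, f"{shares[tok]:.2%}")  # 33.33% for each node
```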