Re: node restart taking too long

Yan Chunlu Thu, 18 Aug 2011 02:14:39 -0700

Thanks a lot for all the help! I have gone through the steps and
successfully brought up node2 :)
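
For anyone reading the archive later, here is a minimal sketch of the "delete the saved caches" step recommended further down the thread. It is not the exact procedure anyone here ran. The function name clear_saved_caches and its dry_run flag are hypothetical; the -KeyCache / -RowCache file suffixes are taken from the ls listing quoted below. It assumes Cassandra is already stopped on that node.

```python
from pathlib import Path


def clear_saved_caches(saved_caches_dir, dry_run=True):
    """List saved cache files in a directory; delete them unless dry_run is True.

    Hypothetical helper: only files whose names end in -KeyCache or -RowCache
    (the naming seen in this thread's ls output) are considered.
    """
    directory = Path(saved_caches_dir)
    cache_files = sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.name.endswith(("-KeyCache", "-RowCache"))
    )
    if not dry_run:
        for path in cache_files:
            path.unlink()  # remove the saved cache so it is not reloaded at startup
    return cache_files
```

Calling clear_saved_caches("/cassandra/saved_caches") only reports what would be removed; passing dry_run=False deletes the files. The directory path is the one from this thread and will differ on other installs.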


On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yulin...@gmail.com> wrote:
> Because the file only preserves the "key" of each record, not the whole record.
> The records for those saved keys are loaded into Cassandra during
> startup.
>
> On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <springri...@gmail.com> wrote:
>>
>> But the files in the saved_caches directory are relatively small (listing
>> below).
>> Would they really cause the load problem?
>>
>>  ls  -lh  /cassandra/saved_caches/
>> total 32M
>> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
>> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
>> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
>> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
>> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
>> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
>> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
>> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>>
>> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>> > If you have a node that cannot start up due to issues loading the saved
>> > cache, delete the files in the saved_caches directory before starting it.
>> >
>> > The settings to save the row and key cache are per CF. You can change
>> > them with an update column family statement via the CLI when attached to
>> > any node. You may then want to check the saved_caches directory and delete
>> > any files that are left (not sure if they are automatically deleted).
>> >
>> > I would recommend:
>> > - stop node 2
>> > - delete its saved_caches files
>> > - make the schema change via another node
>> > - startup node 2
>> >
>> > Cheers
>> >
>> > -----------------
>> > Aaron Morton
>> > Freelance Cassandra Developer
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> >
>> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>> >
>> >> Does this need to be cluster-wide, or could I just modify the caches
>> >> on one node? I ask because I cannot connect to the node with
>> >> cassandra-cli; it says "connection refused".
>> >>
>> >>
>> >> [default@unknown] connect node2/9160;
>> >> Exception connecting to node2/9160. Reason: Connection refused.
>> >>
>> >>
>> >> So if I change the cache size via other nodes, how will node2 be
>> >> notified of the change? Would killing Cassandra and starting it again
>> >> make it update the schema?
>> >>
>> >>
>> >>
>> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <thol...@wetafx.co.nz>
>> >> wrote:
>> >>> Hi,
>> >>>
>> >>> yes, we saw exactly the same messages. We got rid of these by doing the
>> >>> following:
>> >>>
>> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>> >>> * Kill Cassandra
>> >>> * Remove all files in the saved_caches directory
>> >>> * Start Cassandra
>> >>> * Slowly bring back row & key caches (if desired, we left them off)
>> >>>
>> >>> Cheers,
>> >>>
>> >>>        T.
>> >>>
>> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>> >>>>
>> >>>> I saw a lot of SliceQueryFilter messages after changing the log level
>> >>>> to DEBUG. It seems that even bringing up a new node would be faster
>> >>>> than starting the old one... it is weird.
>> >>>>
>> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springri...@gmail.com
>> >>>> <mailto:springri...@gmail.com>> wrote:
>> >>>>
>> >>>>    But it seems the row cache is cluster-wide. How will changing the
>> >>>>    row cache affect read speed?
>> >>>>
>> >>>>
>> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbel...@gmail.com
>> >>>>    <mailto:jbel...@gmail.com>> wrote:
>> >>>>
>> >>>>        Or leave row cache enabled but disable cache saving (and remove
>> >>>>        the one already on disk).
>> >>>>
>> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>> >>>> <aa...@thelastpickle.com
>> >>>>        <mailto:aa...@thelastpickle.com>> wrote:
>> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
>> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>> >>>>         >
>> >>>>         > It's taking 29 minutes to load 200,000 rows in the row cache.
>> >>>>         > That's a pretty big row cache, I would suggest reducing or
>> >>>>         > disabling it.
>> >>>>         > Background:
>> >>>>         > http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>> >>>>         >
>> >>>>         > and the server could not handle the load and crashed. Since
>> >>>>         > then, node 3 has not come back up for more than 96 hours
>> >>>>         >
>> >>>>         > Crashed how?
>> >>>>         > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
>> >>>>         > Watch nodetool compactionstats to see when the Merkle tree build
>> >>>>         > finishes and nodetool netstats to see which CFs are streaming.
>> >>>>         > Cheers
>> >>>>         > -----------------
>> >>>>         > Aaron Morton
>> >>>>         > Freelance Cassandra Developer
>> >>>>         > @aaronmorton
>> >>>>         > http://www.thelastpickle.com
>> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>> >>>>         >
>> >>>>         >
>> >>>>         > I have 3 nodes with RF=3. While I was repairing node3, it seems
>> >>>>         > a lot of data was generated, and the server could not handle the
>> >>>>         > load and crashed. Since then, node 3 has not come back up for
>> >>>>         > more than 96 hours.
>> >>>>         >
>> >>>>         > With 34GB of data, node 2 could restart and be back online
>> >>>>         > within 1 hour.
>> >>>>         >
>> >>>>         > I am not sure what's wrong with node3. Should I restart node 3
>> >>>>         > again? Thanks!
>> >>>>         >
>> >>>>         > Address   Status State   Load        Owns    Token
>> >>>>         >                                             113427455640312821154458202477256070484
>> >>>>         > node1     Up     Normal  34.11 GB    33.33%  0
>> >>>>         > node2     Up     Normal  31.44 GB    33.33%  56713727820156410577229101238628035242
>> >>>>         > node3     Down   Normal  177.55 GB   33.33%  113427455640312821154458202477256070484
>> >>>>         >
>> >>>>         >
>> >>>>         > the log shows it is still going on, not sure why it is so slow:
>> >>>>         >
>> >>>>         >
>> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
>> >>>>         >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>> >>>>         >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>> >>>>         >
>> >>>>         >
>> >>>>         >
>> >>>>         >
>> >>>>         >
>> >>>>         >
>> >>>>
>> >>>>
>> >>>>
>> >>>>        --
>> >>>>        Jonathan Ellis
>> >>>>        Project Chair, Apache Cassandra
>> >>>>        co-founder of DataStax, the source for professional Cassandra support
>> >>>>        http://www.datastax.com
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >
>> >
>
>
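
As a footnote to the thread, here is a rough sketch of how the "29 minutes" figure follows from the "completed loading" log line quoted above. It is an illustration only: the function name cache_load_stats and the regular expression are hypothetical, and they assume other cache-loading lines follow the same format as the one in this thread.

```python
import re

# Log line copied from the thread (Cassandra startup, row cache for COMMENT).
LOG_LINE = (
    "INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) "
    "completed loading (1744370 ms; 200000 keys) row cache for COMMENT"
)

# Assumed format: "completed loading (<ms> ms; <n> keys) <type> cache for <CF>".
PATTERN = re.compile(
    r"completed loading \((\d+) ms; (\d+) keys\) (\w+) cache for (\S+)"
)


def cache_load_stats(line):
    """Return load time in minutes and per-key cost in ms, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    elapsed_ms = int(match.group(1))
    keys = int(match.group(2))
    return {
        "column_family": match.group(4),
        "cache": match.group(3),
        "minutes": elapsed_ms / 60000,
        "ms_per_key": elapsed_ms / keys if keys else None,
    }


stats = cache_load_stats(LOG_LINE)
print(stats)
```

For the line above this works out to roughly 29 minutes in total, or a little under 9 ms per key. That fits the explanation that the saved file holds only keys and each row still has to be read at startup.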
