The reset interval clears the latency tracked for each node, so a bad node will 
be read from again. After that, the scores for each node are updated every 
100ms (the default) using the last 100 responses from that node. 

How long does the bad performance last?

What CL are you reading at? At QUORUM with RF 4 the read request will be sent 
to 3 nodes, ordered by proximity and wellness according to the dynamic snitch. 
(For background, see this recent discussion of the dynamic snitch: 
http://www.mail-archive.com/user@cassandra.apache.org/msg12089.html)
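
To spell out where the "3 nodes" comes from: a quorum is a majority of the 
replicas, i.e. RF / 2 (rounded down) + 1. A minimal sketch of that arithmetic; 
the function name is only for illustration and is not a Cassandra API:

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority (QUORUM) for a given RF."""
    return replication_factor // 2 + 1


# RF 4 (DC1: 2 + DC2: 2) needs 3 replicas, so only one replica is left out
# of each QUORUM read.
for rf in (3, 4, 5):
    print(f"RF {rf} -> quorum {quorum(rf)}")
```

With RF 4 only one replica can be skipped per read, so the ordering produced 
by the snitch decides whether the slow node is the one left out.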

You can take a look at the weights and timings used by the dynamic snitch in 
JConsole under o.a.c.db.DynamicEndpointSnitch. Also, at DEBUG log level you 
will be able to see which nodes the request is sent to. 

My guess is the dynamic snitch is doing the right thing and the slowdown is a 
node with a problem getting back into the list of nodes used for your read. 
It's then moved down the list as its bad performance is noticed.
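
To illustrate that guess, here is a toy model, not the real snitch code. The 
assumptions are mine: a node's score is the mean of its last 100 latencies, a 
node with no samples scores 0, the ranking is refreshed every few reads 
(standing in for the 100ms update), and a read is as slow as the slowest of 
the 3 nodes it is sent to. All names and latencies are made up.

```python
from collections import deque

WINDOW = 100  # samples kept per node ("last 100 responses")

# Hypothetical per-node latencies in ms; "slow" is the node with a problem.
LATENCY = {"slow": 500, "n1": 10, "n2": 10, "n3": 10}


class ToySnitch:
    def __init__(self, nodes):
        self.samples = {n: deque(maxlen=WINDOW) for n in nodes}

    def record(self, node, latency_ms):
        self.samples[node].append(latency_ms)

    def score(self, node):
        s = self.samples[node]
        # No samples means score 0.0, so a freshly reset node looks healthy.
        return sum(s) / len(s) if s else 0.0

    def ranked(self):
        # Lower score is better; sorted() is stable, so ties keep node order.
        return sorted(self.samples, key=self.score)

    def reset(self):
        for s in self.samples.values():
            s.clear()


def simulate(reads, reset_every, update_every=5, targets_per_read=3):
    """Return the latency seen by each read (max over the nodes it used)."""
    snitch = ToySnitch(list(LATENCY))
    order = snitch.ranked()
    observed = []
    for i in range(reads):
        if i and i % reset_every == 0:
            snitch.reset()           # forget all latencies
            order = snitch.ranked()  # the slow node is back in contention
        elif i % update_every == 0:
            order = snitch.ranked()  # periodic score update
        targets = order[:targets_per_read]
        for node in targets:
            snitch.record(node, LATENCY[node])
        observed.append(max(LATENCY[n] for n in targets))
    return observed


def slow_reads(observed, threshold_ms=100):
    return [i for i, ms in enumerate(observed) if ms > threshold_ms]


print(slow_reads(simulate(reads=40, reset_every=20)))
# [0, 1, 2, 3, 4, 20, 21, 22, 23, 24]
```

In this model the slow reads show up right after each reset and stop at the 
next score update, and doubling reset_every doubles the gap between spikes, 
which is the pattern described in the original message.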

Hope that helps
Aaron
 

On 12 Apr 2011, at 01:28, shimi wrote:

> I finally upgraded from 0.6.x to 0.7.4. The nodes have been running the new 
> version for several days across 2 data centers.
> I noticed that the read time on some of the nodes increases by 50-60x every 
> ten minutes.
> There was no indication in the logs of anything happening at the same 
> time. The only thing that I know runs every 10 minutes is the 
> dynamic snitch reset.
> So I changed dynamic_snitch_reset_interval_in_ms to 20 minutes and now I have 
> the problem once every 20 minutes.
> 
> I am running all nodes with:
> replica_placement_strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>       strategy_options:
>         DC1 : 2
>         DC2 : 2
>       replication_factor: 4
> 
> (DC1 and DC2 are taken from the IPs)
> Is anyone familiar with this kind of behavior?
> 
> Shimi
> 
