On Tue, Apr 12, 2011 at 12:26 AM, aaron morton <aa...@thelastpickle.com> wrote:

> The reset interval clears the latency tracked for each node so a bad node
> will be read from again. The scores for each node are then updated every
> 100ms (default) using the last 100 responses from a node.
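A toy model may make the reset behaviour described above easier to picture. This is a minimal sketch, not Cassandra's DynamicEndpointSnitch. The class and method names are hypothetical. Scoring a node by its mean latency is also an assumption, because the thread does not describe the real scoring formula. The sketch keeps the last 100 response times per node and recomputes scores on demand (every 100 ms by default, per the thread). It forgets all history on reset, which is why a previously slow node can start receiving reads again.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy latency-based snitch (hypothetical; NOT Cassandra's implementation).
public class ToyLatencySnitch {
    private static final int WINDOW_SIZE = 100; // keep only the last 100 responses per node
    private final Map<String, Deque<Double>> windows = new HashMap<>();
    private final Map<String, Double> scores = new HashMap<>();

    // Record one response latency, evicting the oldest sample once the window is full.
    public void receiveTiming(String node, double latencyMs) {
        Deque<Double> window = windows.computeIfAbsent(node, k -> new ArrayDeque<>());
        if (window.size() == WINDOW_SIZE) {
            window.removeFirst();
        }
        window.addLast(latencyMs);
    }

    // Periodic update: score = mean latency over the current window (assumed formula).
    public void updateScores() {
        scores.clear();
        for (Map.Entry<String, Deque<Double>> e : windows.entrySet()) {
            double sum = 0;
            for (double v : e.getValue()) {
                sum += v;
            }
            if (!e.getValue().isEmpty()) {
                scores.put(e.getKey(), sum / e.getValue().size());
            }
        }
    }

    // Reset interval: forget every node's history, so a "bad" node is tried again.
    public void reset() {
        windows.clear();
    }

    public Double score(String node) {
        return scores.get(node);
    }

    public static void main(String[] args) {
        ToyLatencySnitch snitch = new ToyLatencySnitch();
        for (int i = 0; i < 150; i++) {
            snitch.receiveTiming("10.0.0.1", i < 50 ? 1000.0 : 10.0); // 50 slow, then 100 fast responses
        }
        snitch.updateScores();
        System.out.println(snitch.score("10.0.0.1")); // 10.0: the slow samples fell out of the window
        snitch.reset();
        snitch.updateScores();
        System.out.println(snitch.score("10.0.0.1")); // null: no history until new responses arrive
    }
}
```

Right after a reset in this model, no node has a score, so the node ordering briefly carries no latency information until fresh responses arrive.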
>
> How long does the bad performance last for?
>
Only a few seconds, but there are a lot of read requests during this time.

>
> What CL are you reading at ? At Quorum with RF 4 the read request will be
> sent to 3 nodes, ordered by proximity and wellness according to the dynamic
> snitch. (for background recent discussion on dynamic snitch
> http://www.mail-archive.com/user@cassandra.apache.org/msg12089.html)
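The "sent to 3 nodes" figure follows from quorum being a strict majority of the replicas, floor(RF / 2) + 1. The sketch below shows that arithmetic; the class name is hypothetical.

```java
// Quorum = strict majority of replicas: floor(RF / 2) + 1 (class name is hypothetical).
public class QuorumMath {
    public static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1; // integer division floors for non-negative RF
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            System.out.println("RF " + rf + " -> QUORUM needs " + quorum(rf) + " replica(s)");
        }
        // RF 4 -> QUORUM needs 3 replica(s), matching the example above
    }
}
```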
>
I am reading at CL ONE, with read_repair_chance=0.33, RackInferringSnitch,
and keys_cached = rows_cached = 0.
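RackInferringSnitch is documented as deriving topology from a node's IPv4 address. It uses the second octet as the data center and the third octet as the rack. The sketch below only mimics that convention for illustration. The helper and its method names are hypothetical and are not Cassandra's API.

```java
// Hypothetical helper mimicking the RackInferringSnitch convention:
// second octet = data center, third octet = rack.
public class OctetTopology {
    public static String dataCenter(String ipv4) {
        return octet(ipv4, 1);
    }

    public static String rack(String ipv4) {
        return octet(ipv4, 2);
    }

    // Split a dotted-quad address and return the requested octet as a string.
    private static String octet(String ipv4, int index) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted IPv4 address: " + ipv4);
        }
        return parts[index];
    }

    public static void main(String[] args) {
        String node = "10.20.30.40";
        System.out.println("DC " + dataCenter(node) + ", rack " + rack(node)); // DC 20, rack 30
    }
}
```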

>
> You can take a look at the weights and timings used by the DynamicSnitch in
> JConsole under o.a.c.db.DynamicSnitchEndpoint . Also at DEBUG log level you
> will be able to see which nodes the request is sent to.
>
Everything looks OK. The weights are around 3 for the nodes in the same data
center and around 5 for the others. I will turn on the DEBUG level to see if
I can find more info.

>
> My guess is the DynamicSnitch is doing the right thing and the slow down is
> a node with a problem getting back into the list of nodes used for your
> read. It's then moved down the list as its bad performance is noticed.
>
Looking at the DynamicSnitch MBean, I don't see any problems with any of the
nodes. My guess is that right after the reset some reads are sent
to the other data center.

>
> Hope that helps
> Aaron
>

Shimi


>
> On 12 Apr 2011, at 01:28, shimi wrote:
>
> I finally upgraded from 0.6.x to 0.7.4. The nodes have been running the new
> version for several days across 2 data centers.
> I noticed that the read time on some of the nodes increases 50-60x every
> ten minutes.
> There was no indication in the logs of anything happening at the same
> time. The only thing I know of that runs every 10 minutes is
> the dynamic snitch reset.
> So I changed dynamic_snitch_reset_interval_in_ms to 20 minutes and now I
> have the problem once every 20 minutes.
>
> I am running all nodes with:
>       replica_placement_strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>       strategy_options:
>         DC1 : 2
>         DC2 : 2
>       replication_factor: 4
>
> (DC1 and DC2 are taken from the ips)
> Is anyone familiar with this kind of behavior?
>
> Shimi
>
>
>
