Looks like the read timeouts were a result of a bug that will be fixed in 2.0.3.
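For anyone searching the archives for the same symptom: in our case the timeouts surfaced on the client side as read timeout exceptions from the driver, not as errors in the server logs. Below is a minimal sketch of catching and inspecting such a timeout with the 2.0.x Java driver; the contact points, keyspace, and table names are placeholders, not our actual configuration:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.exceptions.ReadTimeoutException;

    public class ReadTimeoutCheck {
        public static void main(String[] args) {
            // Placeholder node addresses and keyspace, for illustration only.
            Cluster cluster = Cluster.builder()
                    .addContactPoints("10.0.0.1", "10.0.0.2", "10.0.0.3")
                    .build();
            Session session = cluster.connect("my_keyspace");
            try {
                session.execute("SELECT * FROM my_table WHERE id = 1");
            } catch (ReadTimeoutException e) {
                // Report how many replicas answered versus how many the
                // consistency level required, and whether any data came back.
                System.err.printf("Read timed out at %s: %d/%d replicas responded, data retrieved: %b%n",
                        e.getConsistencyLevel(),
                        e.getReceivedAcknowledgements(),
                        e.getRequiredAcknowledgements(),
                        e.wasDataRetrieved());
            } finally {
                cluster.close(); // shutdown() on pre-2.0 driver versions
            }
        }
    }

The received/required counts are useful here because they distinguish a coordinator that heard from no replicas (suggesting inter-node trouble) from one that was simply slow.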
I found this question on the Datastax Java Driver mailing list:
https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/ao1ohSLpjRM
which led me to: https://issues.apache.org/jira/browse/CASSANDRA-6299

I built and deployed a 2.0.3 snapshot this morning, which includes this fix, and my cluster is now behaving normally (no read timeouts so far).

On Tue, Nov 19, 2013 at 4:55 PM, Steven A Robenalt <srobe...@stanford.edu> wrote:

> It seems that with NTP properly configured, the replication is now working as expected, but there are still a lot of read timeouts. The troubleshooting continues...
>
> On Tue, Nov 19, 2013 at 8:53 AM, Steven A Robenalt <srobe...@stanford.edu> wrote:
>
>> Thanks Michael, I will try that out.
>>
>> On Tue, Nov 19, 2013 at 5:28 AM, Laing, Michael <michael.la...@nytimes.com> wrote:
>>
>>> We had a similar problem when our nodes could not sync using ntp due to VPC ACL settings. -ml
>>>
>>> On Mon, Nov 18, 2013 at 8:49 PM, Steven A Robenalt <srobe...@stanford.edu> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am attempting to bring up our new app on a 3-node cluster and am having problems with frequent read timeouts and slow inter-node replication. Initially, these errors were mostly occurring in our app server, affecting 0.02%-1.0% of our queries in an otherwise unloaded cluster. No exceptions were logged on the servers in this case, and reads in a single-node environment with the same code and client driver virtually never see exceptions like this, so I suspect problems with the inter-node communication within the cluster.
>>>>
>>>> The 3 nodes are deployed in a single AWS VPC and are all in a common subnet. The Cassandra version is 2.0.2, following an upgrade this past weekend due to NPEs in a secondary index that were affecting certain queries under 2.0.1. The servers are m1.large instances running AWS Linux and Oracle JDK7u40. The first 2 nodes in the cluster are the seed nodes. All database contents are CQL tables with a replication factor of 3, and the application is Java-based, using the latest Datastax 2.0.0-rc1 Java Driver.
>>>>
>>>> In testing with the application, I noticed this afternoon that the contents of the 3 nodes differed in their respective copies of the same table for newly written data, for time periods exceeding several minutes, as reported by cqlsh on each node. Specifying different hosts from the same server using cqlsh also exhibited timeouts on multiple attempts to connect and on executing some queries, though they eventually succeeded in all cases, and eventually the data in all nodes was fully replicated.
>>>>
>>>> The AWS servers have a security group with only ports 22, 7000, 9042, and 9160 open.
>>>>
>>>> At this time, it seems that either I am still missing something in my cluster configuration, or maybe there are other ports that are needed for inter-node communication.
>>>>
>>>> Any advice/suggestions would be appreciated.
--
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobe...@stanford.edu
http://highwire.stanford.edu
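Regarding the slow replication described in the quoted thread: a minimal sketch (the node addresses, keyspace, table, and key below are hypothetical, not from our deployment) of one way to probe whether newly written data has reached all three replicas is to read the same row at consistency ONE and then ALL with the Java driver. An ALL read that fails while ONE succeeds points at replicas that have not caught up or cannot be reached on the inter-node port.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class ReplicationProbe {
        public static void main(String[] args) {
            // Placeholder node addresses and keyspace, for illustration only.
            Cluster cluster = Cluster.builder()
                    .addContactPoints("10.0.0.1", "10.0.0.2", "10.0.0.3")
                    .build();
            Session session = cluster.connect("my_keyspace");

            // With RF=3, a read at ALL must touch every replica; if it fails
            // while a read at ONE succeeds, the row exists but has not yet
            // reached all nodes, or a node is unreachable (e.g. on port 7000).
            String cql = "SELECT * FROM my_table WHERE id = 42"; // placeholder table/key
            for (ConsistencyLevel cl : new ConsistencyLevel[] { ConsistencyLevel.ONE, ConsistencyLevel.ALL }) {
                Statement stmt = new SimpleStatement(cql).setConsistencyLevel(cl);
                try {
                    ResultSet rs = session.execute(stmt);
                    System.out.printf("CL=%s: %s%n", cl, rs.one() != null ? "row found" : "no row");
                } catch (Exception e) {
                    System.out.printf("CL=%s: failed (%s)%n", cl, e.getClass().getSimpleName());
                }
            }
            cluster.close(); // shutdown() on pre-2.0 driver versions
        }
    }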