Mike,

Is that the version you've bisected the problem back to?

I'll see what I can do, but doubt it, since we've long since upgraded prod
to 2.2.4 (and stage before that) and the tests I'm running were for a new
feature.

On Fri, 4 Mar 2016 03:54 Mike Heffner, <m...@librato.com> wrote:

> Emils,
>
> I realize this may be a big downgrade, but are your timeouts reproducible
> under Cassandra 2.1.4?
>
> Mike
>
> On Thu, Feb 25, 2016 at 10:34 AM, Emīls Šolmanis <emils.solma...@gmail.com>
> wrote:
>
>> Having had a read through the archives, I missed this at first, but this
>> seems to be *exactly* like what we're experiencing.
>>
>> http://www.mail-archive.com/user@cassandra.apache.org/msg46064.html
>>
>> Only difference is we're getting this for reads and using CQL, but the
>> behaviour is identical.
>>
>> On Thu, 25 Feb 2016 at 14:55 Emīls Šolmanis <emils.solma...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> We're having a problem with concurrent requests. It seems that whenever
>>> we try resolving more than ~ 15 queries at the same time, one or two get
>>> a read timeout and then succeed on a retry.
>>>
>>> We're running Cassandra 2.2.4 accessed via the 2.1.9 Datastax driver on
>>> AWS.
>>>
>>> What we've found while investigating:
>>>
>>>  * this is not db-wide. When we try the same pattern against another
>>> table, everything works fine.
>>>  * it fails 1 or 2 requests regardless of how many are executed in
>>> parallel, i.e., it's still 1 or 2 when we ramp it up to ~ 120 concurrent
>>> requests and doesn't seem to scale up.
>>>  * the problem is consistently reproducible. It happens both under
>>> heavier load and when just firing off a single batch of requests for
>>> testing.
>>>  * tracing the faulty requests says everything is great. An example
>>> trace: https://gist.github.com/emilssolmanis/41e1e2ecdfd9a0569b1a
>>>  * the only peculiar thing in the logs is there's no acknowledgement of
>>> the request being accepted by the server, as seen in
>>> https://gist.github.com/emilssolmanis/242d9d02a6d8fb91da8a
>>>  * there's nothing funny in the timed out Cassandra node's logs around
>>> that time as far as I can tell, not even in the debug logs.
>>>
>>> Any ideas about what might be causing this, pointers to server config
>>> options, or how else we might debug this would be much appreciated.
>>>
>>> Kind regards,
>>> Emils
>>>
>>>
>
>
> --
>
>   Mike Heffner <m...@librato.com>
>   Librato, Inc.
>
>
