On further analysis, this issue occurs on only one table in the keyspace:
the one with the highest read volume.

@Atul, I will look at system health, but I didn't see anything standing out
in the GC logs (using JDK 1.8.0_92 with G1GC).
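
To double-check the GC logs for long pauses around the timeout window, a
rough sketch like the one below could help. It assumes the JVM is started
with -XX:+PrintGCApplicationStoppedTime, so that the "Total time for which
application threads were stopped" lines are present. The function name, the
0.5 s threshold and the sample lines are all hypothetical; the sample lines
are made up for illustration and are not from our logs.

```python
import re

# Matches the pause line that JDK 8 prints when
# -XX:+PrintGCApplicationStoppedTime is enabled (assumed here).
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_s=0.5):
    """Return (line_number, pause_seconds) for every pause above threshold_s."""
    hits = []
    for n, line in enumerate(lines, start=1):
        m = STOPPED.search(line)
        if m and float(m.group(1)) > threshold_s:
            hits.append((n, float(m.group(1))))
    return hits


# Made-up sample lines, for illustration only.
sample = [
    "2016-08-29T10:14:01.100+0000: 10.1: Total time for which application "
    "threads were stopped: 0.0123456 seconds, Stopping threads took: 0.0000456 seconds",
    "2016-08-29T10:14:05.200+0000: 14.2: Total time for which application "
    "threads were stopped: 1.2500000 seconds, Stopping threads took: 0.0000789 seconds",
]
print(long_pauses(sample))
```

Pauses that approach the read timeout on a replica would be the ones worth
correlating with the client-side timeout timestamps.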

@Patrick, could you please elaborate on the "mismatch on node count + RF"
part?
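
For reference, this is the quorum arithmetic as I understand it, as a
minimal sketch. LOCAL_QUORUM needs floor(RF / 2) + 1 replicas in the local
DC to respond, so with RF 3 that is 2, which matches the "2 responses were
required" in the error. The function names are just illustrative.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM / LOCAL_QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def tolerable_slow_replicas(rf: int) -> int:
    """How many replicas can be down or slow before the quorum is missed."""
    return rf - quorum(rf)


local_rf = 3  # RF in the local DC, as reported in this thread
print("required responses:", quorum(local_rf))
print("replicas that may be slow or down:", tolerable_slow_replicas(local_rf))
```

So with RF 3, a single slow or unavailable replica is tolerated, but a
second one for the same key would produce this timeout.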

On Tue, Aug 30, 2016 at 5:35 PM, Atul Saroha <atul.sar...@snapdeal.com>
wrote:

> There could be many reasons for this if it is intermittent. Check CPU usage
> and I/O wait. Since reads are I/O intensive, your IOPS requirement should be
> met under the load at that time. It could be a heap issue if the CPU is busy
> only with GC. Network health could also be the reason. So it is better to
> look at system health during the time when the timeouts occur.
>
> Atul Saroha
>
> On Tue, Aug 30, 2016 at 5:10 PM, Joseph Tech <jaalex.t...@gmail.com>
> wrote:
>
>> Hi Patrick,
>>
>> nodetool status shows all nodes up and normal now. The OpsCenter
>> "Event Log" reports some nodes going down/up during the timeframe of
>> the timeouts, but these are Search workload nodes in the remote
>> (non-local) DC. The RF is 3 and there are 9 nodes per DC.
>>
>> Thanks,
>> Joseph
>>
>> On Mon, Aug 29, 2016 at 11:07 PM, Patrick McFadin <pmcfa...@gmail.com>
>> wrote:
>>
>>> You aren't achieving quorum on your reads, as the error explains. That
>>> means you either have some nodes down or your topology is not matching up.
>>> The fact that you are using LOCAL_QUORUM might point to a datacenter
>>> mismatch on node count + RF.
>>>
>>> What does your nodetool status look like?
>>>
>>> Patrick
>>>
>>> On Mon, Aug 29, 2016 at 10:14 AM, Joseph Tech <jaalex.t...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We recently started getting intermittent timeouts on primary key
>>>> queries (select * from table where key=<key>)
>>>>
>>>> The error is: com.datastax.driver.core.exceptions.ReadTimeoutException:
>>>> Cassandra timeout during read query at consistency LOCAL_QUORUM (2
>>>> responses were required but only 1 replica responded)
>>>>
>>>> The same query would work fine when tried directly from cqlsh. There
>>>> are no indications in system.log for the table in question, though there
>>>> were compactions in progress for tables in another keyspace which is more
>>>> frequently accessed.
>>>>
>>>> My understanding is that the chance of primary key queries timing out
>>>> is minimal. Please share possible reasons for this issue and ways to
>>>> debug it.
>>>>
>>>> We are using Cassandra 2.1 (DSE 4.8.7).
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
