On further analysis, this issue happens on only one table in the keyspace: the one with the most reads.

@Atul, I will look at system health, but I didn't see anything standing out in the GC logs (using JDK 1.8_92 with G1GC).

@Patrick, could you please elaborate on the "mismatch on node count + RF" part?

On Tue, Aug 30, 2016 at 5:35 PM, Atul Saroha <atul.sar...@snapdeal.com> wrote:

> There could be many reasons for this if it is intermittent: CPU usage and
> I/O wait. Reads are I/O intensive, so your IOPS capacity must keep up with
> the load at that time. It could be a heap issue if the CPU is busy only with
> GC. Network health could also be the reason. So it is better to look at
> system health at the time the timeouts occur.
>
> Atul Saroha
> *Lead Software Engineer*
>
> On Tue, Aug 30, 2016 at 5:10 PM, Joseph Tech <jaalex.t...@gmail.com>
> wrote:
>
>> Hi Patrick,
>>
>> The nodetool status shows all nodes up and normal now. In the OpsCenter
>> "Event Log", some nodes were reported as going down/up during the
>> timeframe of the timeouts, but these are Search workload nodes in the
>> remote (non-local) DC. The RF is 3 and there are 9 nodes per DC.
>>
>> Thanks,
>> Joseph
>>
>> On Mon, Aug 29, 2016 at 11:07 PM, Patrick McFadin <pmcfa...@gmail.com>
>> wrote:
>>
>>> You aren't achieving quorum on your reads, as the error explains. That
>>> means you either have some nodes down or your topology is not matching up.
>>> The fact that you are using LOCAL_QUORUM might point to a datacenter
>>> mismatch on node count + RF.
>>>
>>> What does your nodetool status look like?
>>> >>> Patrick >>> >>> On Mon, Aug 29, 2016 at 10:14 AM, Joseph Tech <jaalex.t...@gmail.com> >>> wrote: >>> >>>> Hi, >>>> >>>> We recently started getting intermittent timeouts on primary key >>>> queries (select * from table where key=<key>) >>>> >>>> The error is : com.datastax.driver.core.exceptions.ReadTimeoutException: >>>> Cassandra timeout during read query at consistency LOCAL_QUORUM (2 >>>> responses were required but only 1 replica >>>> a responded) >>>> >>>> The same query would work fine when tried directly from cqlsh. There >>>> are no indications in system.log for the table in question, though there >>>> were compactions in progress for tables in another keyspace which is more >>>> frequently accessed. >>>> >>>> My understanding is that the chances of primary key queries timing out >>>> is very minimal. Please share the possible reasons / ways to debug this >>>> issue. >>>> >>>> We are using Cassandra 2.1 (DSE 4.8.7). >>>> >>>> Thanks, >>>> Joseph >>>> >>>> >>>> >>>> >>> >> >