1) Is it a typo, or did you really make a giant leap from C* 1.x to 3.4, with all the C* 2.0 and C* 2.1 upgrades in between? (By the way, if I were you, I would use the latest 3.0.x.)

2) Regarding the NTR "All Time Blocked" count (e.g. 26070160 from the logs), have a look at the patch "max_queued_ntr_property.txt" attached to https://issues.apache.org/jira/browse/CASSANDRA-11363, then set -Dcassandra.max_queued_native_transport_requests=XXX to a value that works for you.

3) Regarding write timeouts:
- Are your writes idempotent? If so, you can retry when a WriteTimeoutException is caught, see IdempotenceAwareRetryPolicy (a minimal driver sketch is appended after this message).
- We can see hints in the logs => do you monitor the frequency/number of hints? Do you see some UnavailableException at the driver level? That would mean some nodes are unreachable, and even though this should normally trigger an UnavailableException, it can also surface as a WriteTimeoutException if the coordinator of a request does not yet know that the node is unreachable (see the failure detector).
- 4 GB of heap is very small and you have 19 tables. Add 40 system tables to this and you have 59 tables sharing 4 GB.
- You are using batches for one or more tables, right? Is it really required? Is it the most used table?
- What are the values of:
  * memtable_cleanup_threshold
  * batch_size_warn_threshold_in_kb
- What is the IO wait status on the nodes? Try to correlate timeout exceptions with IO wait load.
- Are commitlog and data on separate devices?
- What are the values of the following MBean attributes on each node (a small JMX reader sketch is also appended after this message)?
  * org.apache.cassandra.metrics:type=CommitLog,name=WaitingOnCommit - Count
  * org.apache.cassandra.metrics:type=CommitLog,name=WaitingOnSegmentAllocation - Mean, 99thPercentile, Max
- Do you see MemtableFlushWriter blocked tasks on the nodes? I see 0 in the logs, but the node may have been restarted (e.g. 18 hours of uptime in the nodetool info).

4) Did you notice that you have tombstone warnings? e.g.:

    WARN [SharedPool-Worker-48] 2016-09-01 06:53:19,453 ReadCommand.java:481 - Read 5000 live rows and 10000 tombstone cells for query SELECT * FROM pc_object_data_beta.vsc_data WHERE rundate, vscid = 1472653906000, 111034565 LIMIT 5000 (see tombstone_warn_threshold)

The chances are high that your data model is not optimal. You should *really* fix this.

Best,

Romain
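PS (retry on write timeouts): here is a minimal sketch of the idempotent-retry idea from point 3, assuming the DataStax Java driver 3.x. The contact point, keyspace/table names and the retry count are placeholders to adapt, not your actual configuration:

    // Minimal sketch, assuming the DataStax Java driver 3.x.
    // Contact point, keyspace/table and retry count are placeholders.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;
    import com.datastax.driver.core.exceptions.WriteTimeoutException;
    import com.datastax.driver.core.policies.DefaultRetryPolicy;
    import com.datastax.driver.core.policies.IdempotenceAwareRetryPolicy;

    public class IdempotentWriteRetry {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1") // placeholder contact point
                    // The wrapper only lets the child policy retry statements that
                    // were explicitly marked idempotent; other writes are not retried.
                    .withRetryPolicy(new IdempotenceAwareRetryPolicy(DefaultRetryPolicy.INSTANCE))
                    .build();
            Session session = cluster.connect();

            // Mark the write as idempotent: replaying it cannot change the outcome.
            Statement insert = new SimpleStatement(
                    "INSERT INTO my_ks.my_table (id, value) VALUES (?, ?)", 42, "hello")
                    .setIdempotent(true);

            // Application-level retry when a WriteTimeoutException is caught,
            // which is only safe because the statement is idempotent.
            for (int attempt = 1; ; attempt++) {
                try {
                    session.execute(insert);
                    break;
                } catch (WriteTimeoutException e) {
                    if (attempt >= 3) { // placeholder retry limit
                        throw e;
                    }
                }
            }
            cluster.close();
        }
    }

Marking the statements idempotent is the important part: both the application-level retry and the IdempotenceAwareRetryPolicy wrapper are only safe because replaying the write cannot change the result.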
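PS (CommitLog MBeans): if you don't have a monitoring stack wired to JMX yet, a tiny client like the sketch below can read the attributes listed in point 3. It assumes Cassandra's default JMX port 7199 without authentication; host, port and credentials are assumptions to adapt per node:

    // Minimal sketch using the standard JMX remote API; assumes the default
    // unauthenticated Cassandra JMX port 7199. Adapt host/port/credentials per node.
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class CommitLogMetrics {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "127.0.0.1"; // run once per node
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();

                ObjectName waitingOnCommit = new ObjectName(
                        "org.apache.cassandra.metrics:type=CommitLog,name=WaitingOnCommit");
                System.out.println("WaitingOnCommit.Count = "
                        + mbs.getAttribute(waitingOnCommit, "Count"));

                ObjectName waitingOnAlloc = new ObjectName(
                        "org.apache.cassandra.metrics:type=CommitLog,name=WaitingOnSegmentAllocation");
                for (String attr : new String[] {"Mean", "99thPercentile", "Max"}) {
                    System.out.println("WaitingOnSegmentAllocation." + attr + " = "
                            + mbs.getAttribute(waitingOnAlloc, attr));
                }
            }
        }
    }

Run it against each node and compare; if the WaitingOnSegmentAllocation percentiles keep growing, it usually points at commitlog I/O pressure, which is worth correlating with the IO wait question above.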
On Tuesday, 6 September 2016 at 06:47, "adeline....@thomsonreuters.com" <adeline....@thomsonreuters.com> wrote:

From: Pan, Adeline (TR Technology & Ops)
Sent: Tuesday, September 06, 2016 12:34 PM
To: 'user@cassandra.apache.org'
Cc: Yang, Ling (TR Technology & Ops)
Subject: FW: WriteTimeoutException with LOCAL_QUORUM

Hi All,

I hope you are doing well today, and I need your help. We were using Cassandra 1 before, and now we are upgrading to Cassandra 3.4. During the integration test, we encountered "WriteTimeoutException" very frequently (about every other minute); the exception message is below, and the full exception trace is in the attached file.

    Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica were required but only 1 acknowledged the write)

Here is some information:

1. It is a six-node cluster with two data centers and three nodes in each datacenter. The consistency level we are using is LOCAL_QUORUM.
2. The node info:

    [BETA:xxxx@xxxx:/local/java/cassandra3/current]$ bin/nodetool -hlocalhost info
    ID                      : ad077318-6531-498e-bf5a-14ac339d1a45
    Gossip active           : true
    Thrift active           : false
    Native Transport active : true
    Load                    : 23.47 GB
    Generation No           : 1473065408
    Uptime (seconds)        : 67180
    Heap Memory (MB)        : 1679.57 / 4016.00
    Off Heap Memory (MB)    : 10.34
    Data Center             : dc1
    Rack                    : rack1
    Exceptions              : 0
    Key Cache               : entries 32940, size 3.8 MB, capacity 100 MB, 2124114 hits, 2252348 requests, 0.943 recent hit rate, 14400 save period in seconds
    Row Cache               : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
    Counter Cache           : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
    Token                   : (invoke with -T/--tokens to see all 256 tokens)

3. We have increased write_request_timeout_in_ms to 40000, which didn't work.
4. The memtable size is 4 GB.
5. memtable_allocation_type: heap_buffers
6. In the Cassandra server log, we found that there are Native-Transport-Requests pending from time to time. (The server log piece is in the attached file.)

    INFO [ScheduledTasks:1] 2016-09-01 10:08:47,036 StatusLogger.java:52 - Pool Name                    Active   Pending    Completed   Blocked  All Time Blocked
    INFO [ScheduledTasks:1] 2016-09-01 10:08:47,043 StatusLogger.java:56 - Native-Transport-Requests       128       134    300721823         6          28211424

What I also need to mention is that there is always one node that acknowledges the write.

Could you please help me with this situation? Any clue will be appreciated. Thank you very much!

-------------------------------------------------------------------------------------------
Adeline Pan
Senior Software Engineer
Thomson Reuters
Phone: 62674654
adeline....@thomsonreuters.com