1) Is it a typo, or did you really make a giant leap from C* 1.x to 3.4 with all
the C* 2.0 and C* 2.1 upgrades? (By the way, if I were you, I would use the
latest 3.0.X.)

2) Regarding Native-Transport-Requests (NTR) "All Time Blocked" (e.g. 26070160
in the logs), have a look at the patch "max_queued_ntr_property.txt":
https://issues.apache.org/jira/browse/CASSANDRA-11363
Then set -Dcassandra.max_queued_native_transport_requests=XXX to a value that
works for you.

3) Regarding write timeouts:
  - Are your writes idempotent? If so, you can retry when a
    WriteTimeoutException is caught; see IdempotenceAwareRetryPolicy.
  - We can see hints in the logs. Do you monitor the frequency/number of
    hints? Do you see any UnavailableException at the driver level?
    Hints mean that some nodes are unreachable. Even though that should
    trigger an UnavailableException, it may also raise a WriteTimeoutException
    if the coordinator of a request doesn't know yet that the node is
    unreachable (see the failure detector).
  - 4 GB of heap is very small and you have 19 tables. Add 40 system tables to
    this and you have 59 tables that share 4 GB.
  - You are using batches for one or some of the tables, right? Is it really
    required? Is it the most used table?
  - What are the values of
       * memtable_cleanup_threshold
       * batch_size_warn_threshold_in_kb
  - What is the IO wait status on the nodes? Try to correlate the timeout
    exceptions with IO wait load.
  - Are the commitlog and the data on separate devices?
  - What are the values of the following MBean attributes on each node?
       * org.apache.cassandra.metrics:type=CommitLog,name=WaitingOnCommit
           - Count
       * org.apache.cassandra.metrics:type=CommitLog,name=WaitingOnSegmentAllocation
           - Mean
           - 99thPercentile
           - Max
  - Do you see blocked MemtableFlushWriter tasks on the nodes? I see 0 in the
    logs, but the node may have been restarted (e.g. 18 hours of uptime in the
    nodetool info).

4) Did you notice that you have tombstone warnings? E.g.:
    WARN  [SharedPool-Worker-48] 2016-09-01 06:53:19,453 ReadCommand.java:481 -
    Read 5000 live rows and 10000 tombstone cells for query SELECT * FROM
    pc_object_data_beta.vsc_data WHERE rundate, vscid = 1472653906000, 111034565
    LIMIT 5000 (see tombstone_warn_threshold)
The chances are high that your data model is not optimal. You should *really*
fix this.

Best,
Romain
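
The retry advice in point 3 can be sketched generically as follows. This is a minimal illustration, not the driver's IdempotenceAwareRetryPolicy: the `WriteTimeout` class, the `execute_with_retry` helper, and its parameters are all hypothetical names, and the sketch assumes the write is idempotent, since a timed-out write may already have been applied on some replicas.

```python
import time


class WriteTimeout(Exception):
    """Hypothetical stand-in for a driver's write-timeout exception."""


def execute_with_retry(write, max_attempts=3, base_delay=0.1):
    """Call write() and retry on WriteTimeout, up to max_attempts in total.

    Only safe when write() is idempotent: a timed-out write may still have
    been applied on some replicas, so a retry can apply it a second time.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write()
        except WriteTimeout:
            if attempt == max_attempts:
                raise  # out of attempts: surface the timeout to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

A caller would wrap the statement execution in a zero-argument function and pass it as `write`. Non-idempotent writes (counter updates, list appends) should not go through a helper like this.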

    On Tuesday, September 6, 2016 at 6:47, "adeline....@thomsonreuters.com"
<adeline....@thomsonreuters.com> wrote:
 

From: Pan, Adeline (TR Technology & Ops)
Sent: Tuesday, September 06, 2016 12:34 PM
To: 'user@cassandra.apache.org'
Cc: Yang, Ling (TR Technology & Ops)
Subject: FW: WriteTimeoutException with LOCAL_QUORUM

Hi All,

I hope you are doing well today. I need your help.

We were using Cassandra 1 before, and we are now upgrading to Cassandra 3.4.
During the integration test, we encountered “WriteTimeoutException” very
frequently (about every other minute). The exception message is below; the
exception trace is in the attached file.

    Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException:
    Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica
    were required but only 1 acknowledged the write)
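
The "2 replica were required" figure in this message follows from the quorum formula, floor(RF / 2) + 1, applied to the replication factor of the local data center. The sketch below works through that arithmetic. The replication factor of 3 is an assumption (the message does not state it; a local replication factor of 2 would also give a quorum of 2), and the function names are illustrative only.

```python
def quorum(replication_factor):
    """Replicas that must acknowledge for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can miss the write while a quorum is still reached."""
    return replication_factor - quorum(replication_factor)


local_rf = 3  # assumption: replication factor in the local data center
print(f"LOCAL_QUORUM with RF={local_rf} needs {quorum(local_rf)} acknowledgements")
print(f"and tolerates {tolerated_failures(local_rf)} slow or unreachable replica(s)")
```

Under that assumption, a write acknowledged by only 1 replica means 2 of the 3 local replicas did not answer within the write timeout.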

Here is some information:

1. It is a six-node cluster with two data centers and three nodes in each data
   center. The consistency level we are using is LOCAL_QUORUM.
2. The node info:

    [BETA:xxxx@xxxx:/local/java/cassandra3/current]$ bin/nodetool -hlocalhost info
    ID                     : ad077318-6531-498e-bf5a-14ac339d1a45
    Gossip active          : true
    Thrift active          : false
    Native Transport active: true
    Load                   : 23.47 GB
    Generation No          : 1473065408
    Uptime (seconds)       : 67180
    Heap Memory (MB)       : 1679.57 / 4016.00
    Off Heap Memory (MB)   : 10.34
    Data Center            : dc1
    Rack                   : rack1
    Exceptions             : 0
    Key Cache              : entries 32940, size 3.8 MB, capacity 100 MB, 2124114 hits, 2252348 requests, 0.943 recent hit rate, 14400 save period in seconds
    Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
    Counter Cache          : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
    Token                  : (invoke with -T/--tokens to see all 256 tokens)

3. We have increased write_request_timeout_in_ms to 40000, which didn’t help.
4. The memtable size is 4 GB.
5. memtable_allocation_type: heap_buffers
6. In the Cassandra server log, we found Native-Transport-Requests pending
   from time to time. (The relevant piece of the server log is in the attached
   file.)

    INFO  [ScheduledTasks:1] 2016-09-01 10:08:47,036 StatusLogger.java:52 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
    INFO  [ScheduledTasks:1] 2016-09-01 10:08:47,043 StatusLogger.java:56 - Native-Transport-Requests       128       134      300721823         6          28211424
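
To put the counters in this row in proportion, the sketch below parses the Native-Transport-Requests line and computes the share of blocked requests relative to completed ones, which comes to roughly 9% for the values shown. The `parse_pool_row` helper is hypothetical and assumes the column layout of the header line above (pool name followed by five integer counters).

```python
def parse_pool_row(line):
    """Split a StatusLogger pool row into its name and five counters."""
    # Everything after the first " - " is the table row itself.
    row = line.split(" - ", 1)[1].split()
    # The last five fields are the counters; whatever precedes them is the name.
    active, pending, completed, blocked, all_time_blocked = map(int, row[-5:])
    return {
        "name": " ".join(row[:-5]),
        "active": active,
        "pending": pending,
        "completed": completed,
        "blocked": blocked,
        "all_time_blocked": all_time_blocked,
    }


line = (
    "INFO  [ScheduledTasks:1] 2016-09-01 10:08:47,043 StatusLogger.java:56 - "
    "Native-Transport-Requests       128       134      300721823"
    "         6          28211424"
)
stats = parse_pool_row(line)
ratio = stats["all_time_blocked"] / stats["completed"]
print(f"{stats['name']}: {ratio:.1%} blocked relative to completed requests")
```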

It is also worth mentioning that there is always one node that acknowledges
the write. Could you please help me with this situation? Any clue will be
appreciated. Thank you very much!
-------------------------------------------------------------------------------------------
 Adeline Pan Senior Software Engineer    Thomson Reuters Phone: 62674654    
adeline....@thomsonreuters.com    

   
