Replication delays due to inter-node communication issues across multiple data centers; hints are piling up

2017-10-22 Thread vbhang...@gmail.com
This is for Cassandra 2.1.13. At times there are replication delays across 
multiple regions. Data is available (it can be queried from the command line) in 
one region but is not seen in the other region(s). This is not consistent. The 
cluster spans multiple data centers with more than 30 nodes in total. The keyspace 
is configured to be replicated to all the data centers.

Hints are piling up in the source region. This happens especially for large 
data payloads (approx. 1 KB to a few MB blobs). Network-level congestion or 
saturation does not seem to be an issue. There is no memory/CPU pressure on 
individual nodes.
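
For reference, here is a minimal sketch (DataStax Python driver; the DC names, contact points, keyspace and table are placeholders, not real names from this cluster) of one way to check whether a given key is visible in each data center: pin the driver to a single DC at a time and read at LOCAL_ONE.

=
# Sketch: check per-DC visibility of a key by pinning reads to one data center
# at a time with LOCAL_ONE. All names below are placeholders.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy
from cassandra.query import SimpleStatement

CONTACT_POINTS = {"DC1": ["10.0.1.1"], "DC2": ["10.0.2.1"]}  # one entry per DC

for dc, hosts in CONTACT_POINTS.items():
    cluster = Cluster(hosts,
                      load_balancing_policy=DCAwareRoundRobinPolicy(local_dc=dc))
    session = cluster.connect("my_keyspace")
    stmt = SimpleStatement(
        "SELECT key FROM my_table WHERE key = %s",
        consistency_level=ConsistencyLevel.LOCAL_ONE,  # stays inside this DC
    )
    rows = list(session.execute(stmt, ("problem-key",)))
    print(dc, "has the row" if rows else "does not have the row")
    cluster.shutdown()
=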

I am sharing cassandra.yaml below; any pointers on what can be tuned are highly 
appreciated. Let me know if you need any other info.

On one of the nodes we tried bumping up hinted_handoff_throttle_in_kb to 30720 
and max_hints_delivery_threads to 12 to see if that speeds up hints delivery; 
there was some improvement, but not a whole lot.

Thanks

=
# Cassandra storage config YAML

# NOTE:
#   See http://wiki.apache.org/cassandra/StorageConfiguration for
#   full explanations of configuration directives
# /NOTE

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: "central"

# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
# that this node will store. You probably want all nodes to have the same number
# of tokens assuming they have equal hardware capability.
#
# If you leave this unspecified, Cassandra will use the default of 1 token
# for legacy compatibility, and will use the initial_token as described below.
#
# Specifying initial_token will override this setting on the node's initial start,
# on subsequent starts, this setting will apply even if initial token is set.
#
# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
#num_tokens: 256

# initial_token allows you to specify tokens manually.  While you can use it with
# vnodes (num_tokens > 1, above) -- in which case you should provide a
# comma-separated list -- it's primarily used when adding nodes to legacy clusters
# that do not have vnodes enabled.
# initial_token:

initial_token: 

# See http://wiki.apache.org/cassandra/HintedHandoff
# May either be "true" or "false" to enable globally, or contain a list
# of data centers to enable per-datacenter.
# hinted_handoff_enabled: DC1,DC2
hinted_handoff_enabled: true
# this defines the maximum amount of time a dead host will have hints
# generated.  After it has been dead this long, new hints for it will not be
# created until it has been seen alive and gone down again.
max_hint_window_in_ms: 10800000 # 3 hours
# Maximum throttle in KBs per second, per delivery thread.  This will be
# reduced proportionally to the number of nodes in the cluster.  (If there
# are two nodes in the cluster, each delivery thread will use the maximum
# rate; if there are three, each will throttle to half of the maximum,
# since we expect two nodes to be delivering hints simultaneously.)
hinted_handoff_throttle_in_kb: 1024
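# (Illustration only, not part of the original file: per the comment above, each
#  delivery thread is throttled to roughly hinted_handoff_throttle_in_kb / (nodes - 1).
#  With the 48-node cluster mentioned later in this thread, that is about
#  1024 / 47 ~ 22 KB/s per thread here, or 30720 / 47 ~ 650 KB/s with the bumped
#  value that was tried, which is slow for hinted mutations that are several MB each.)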
# Number of threads with which to deliver hints;
# Consider increasing this number when you have multi-dc deployments, since
# cross-dc handoff tends to be slower
max_hints_delivery_threads: 6

# Maximum throttle in KBs per second, total. This will be
# reduced proportionally to the number of nodes in the cluster.
batchlog_replay_throttle_in_kb: 1024

# Authentication backend, implementing IAuthenticator; used to identify users
# Out of the box, Cassandra provides
# org.apache.cassandra.auth.{AllowAllAuthenticator, PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
#   users. It keeps usernames and hashed passwords in system_auth.credentials table.
#   Please increase system_auth keyspace replication factor if you use this
#   authenticator.
authenticator: AllowAllAuthenticator

# Authorization backend, implementing IAuthorizer; used to limit access/provide
# permissions.
# Out of the box, Cassandra provides
# org.apache.cassandra.auth.{AllowAllAuthorizer, CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable
#   authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
#   increase system_auth keyspace replication factor if you use this authorizer.
authorizer: AllowAllAuthorizer

# Validity period for permissions cache (fetching permissions can be an
# expensive operation depending on the authorizer, CassandraAuthorizer is
# one example). Defaults to 2000, set to 0 to disable.
# Will be disabled automatically for AllowAllAuthorizer.
permissions_validity_in_m

[no subject]

2017-10-22 Thread vbhang...@gmail.com
-- Consistency level on writes: LOCAL_QUORUM (LQ); see the sketch below.
-- It started happening approximately a couple of months back. The issue is very inconsistent and can't be reproduced. It used to happen rarely earlier (over the last few years).
-- There are very few GC pauses, but they don't coincide with the issue.
-- 99th percentile write latency is less than 80 ms and the 75th percentile is less than 5 ms.
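
For context, a minimal sketch of what a LOCAL_QUORUM write means for cross-DC visibility (DataStax Python driver; keyspace, table and values are placeholders): the coordinator only waits for a quorum of replicas in its own data center, so remote data centers are updated asynchronously, and via hints when their replicas are down or slow, which matches data being visible in one region before the others.

=
# Sketch: a write at LOCAL_QUORUM. Placeholder keyspace/table/values.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])          # a contact point in the local DC
session = cluster.connect("my_keyspace")

insert = SimpleStatement(
    "INSERT INTO my_table (key, col, value) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)

# With RF=3 per DC, this returns after 2 acks from the local DC only; replicas
# in remote DCs get the mutation asynchronously, or later via hinted handoff
# if they were down or did not respond in time.
session.execute(insert, ("some-key", "some-col", b"payload"))
cluster.shutdown()
=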

- Vedant
On 2017-10-22 21:29, Jeff Jirsa  wrote: 
> What consistency level do you use on writes?
> Did this just start or has it always happened ?
> Are you seeing GC pauses at all?
> 
> What’s your 99% write latency? 
> 
> -- 
> Jeff Jirsa
> 
> 
> > On Oct 22, 2017, at 9:21 PM, "vbhang...@gmail.com" wrote:
> > 
> > [original message quoted above; trimmed]

[no subject]

2017-10-23 Thread vbhang...@gmail.com
It is RF=3, with 12 nodes in each of 3 regions and 6 in each of the other 2, so 
48 nodes in total. Are you suggesting a forced read repair by reading at 
consistency ONE, or by bumping up read_repair_chance?

We have tried from the command line with ONE, but that times out.
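
For reference, a minimal sketch of the kind of forced read repair Kishore clarifies later in the thread, i.e. a select at consistency level ALL (DataStax Python driver; keyspace, table and key are placeholders, and the client-side timeout is raised because reads at ONE already time out). At ALL, every replica in every data center must answer, and any digest mismatch triggers a read repair of that partition.

=
# Sketch: force a read repair by reading at ConsistencyLevel.ALL.
# Placeholder names; server-side timeouts in cassandra.yaml may also need raising.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])
session = cluster.connect("my_keyspace")
session.default_timeout = 120            # seconds, client-side request timeout

stmt = SimpleStatement(
    "SELECT key, col FROM my_table WHERE key = %s",
    consistency_level=ConsistencyLevel.ALL,  # all replicas, all DCs
    fetch_size=100,                          # page through a wide partition
)

# Iterating pulls every page; replicas with mismatching data are repaired as
# the pages are read.
for _ in session.execute(stmt, ("problem-key",)):
    pass
cluster.shutdown()
=
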
On 2017-10-23 10:18, "Mohapatra, Kishore"  wrote: 
> What is your RF for the keyspace and how many nodes are there in each DC ?
> 
> Did you force a Read Repair to see, if you are getting the data or getting an 
> error ?
> 
> Thanks
> 
> Kishore Mohapatra
> Principal Operations DBA
> Seattle, WA
> Email : kishore.mohapa...@nuance.com
> 
> [rest of the quoted thread trimmed]

[no subject]

2017-11-04 Thread vbhang...@gmail.com
Kishore, here are the table definition and cfstats output ---
===
CREATE TABLE ks1.table1 (
    key text,
    column1 'org.apache.cassandra.db.marshal.DynamicCompositeType(org.apache.cassandra.db.marshal.UTF8Type)',
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.1
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'sstable_size_in_mb': '256', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 86400
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.1
    AND speculative_retry = '99.0PERCENTILE';
==
SSTable count: 261
SSTables in each level: [0, 6, 40, 215, 0, 0, 0, 0, 0]
Space used (live): 129255873809
Space used (total): 129255873809
Space used by snapshots (total): 0
Off heap memory used (total): 20977830
SSTable Compression Ratio: 0.7879224917729545
Number of keys (estimate): 71810
Memtable cell count: 2010
Memtable data size: 226253
Memtable off heap memory used: 1327192
Memtable switch count: 47
Local read count: 11688546
Local read latency: 0.195 ms
Local write count: 225262
Local write latency: 0.055 ms
Pending flushes: 0
Bloom filter false positives: 146072
Bloom filter false ratio: 0.01543
Bloom filter space used: 35592
Bloom filter off heap memory used: 33504
Index summary off heap memory used: 26686
Compression metadata off heap memory used: 19590448
Compacted partition minimum bytes: 25
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 2334776
Average live cells per slice (last five minutes): 4.346574725773759
Maximum live cells per slice (last five minutes): 2553.0
Average tombstones per slice (last five minutes): 0.3096773382165276
Maximum tombstones per slice (last five minutes): 804.0
=

On 2017-10-24 14:39, "Mohapatra, Kishore"  wrote: 
> Hi Vedant,
>   I was actually referring to a command line select query
> with consistency level = ALL. This will force a read repair in the background.
> But as I can see, you have tried with consistency level = ONE and it is
> still timing out. So what error do you see in the system.log?
> A streaming error?
> 
> Can you also check how many sstables there are for that table. It seems like
> your compaction may not be working.
> Is your repair job running fine?
> 
> Thanks
> 
> Kishore Mohapatra
> Principal Operations DBA
> Seattle, WA
> Ph : 425-691-6417 (cell)
> Email : kishore.mohapa...@nuance.com
> 
> [rest of the quoted thread trimmed]