Replication delays due to inter-node communication issues across multiple data centers; hints are piling up
This is for Cassandra 2.1.13. At times there are replication delays across multiple regions: data is available (it can be queried from the command line) in one region but is not seen in the other region(s). This is not consistent. The cluster spans multiple data centers with more than 30 nodes in total, and the keyspace is configured to be replicated in all the data centers.

Hints are piling up in the source region. This happens especially for large payloads (approx. 1 KB to a few MB blobs). Network-level congestion or saturation does not seem to be an issue, and there is no memory/CPU pressure on individual nodes.

I am sharing cassandra.yaml below; any pointers on what can be tuned are highly appreciated. Let me know if you need any other info.

We tried bumping hinted_handoff_throttle_in_kb to 30720 and max_hints_delivery_threads to 12 on one of the nodes to see if it speeds up hints delivery; there was some improvement, but not a whole lot.

Thanks

=
# Cassandra storage config YAML

# NOTE:
#   See http://wiki.apache.org/cassandra/StorageConfiguration for
#   full explanations of configuration directives
# /NOTE

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: "central"

# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
# that this node will store. You probably want all nodes to have the same number
# of tokens assuming they have equal hardware capability.
#
# If you leave this unspecified, Cassandra will use the default of 1 token for legacy compatibility,
# and will use the initial_token as described below.
#
# Specifying initial_token will override this setting on the node's initial start,
# on subsequent starts, this setting will apply even if initial token is set.
#
# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
#num_tokens: 256

# initial_token allows you to specify tokens manually. While you can use
# it with vnodes (num_tokens > 1, above) -- in which case you should provide a
# comma-separated list -- it's primarily used when adding nodes to legacy
# clusters that do not have vnodes enabled.
# initial_token:
initial_token:

# See http://wiki.apache.org/cassandra/HintedHandoff
# May either be "true" or "false" to enable globally, or contain a list
# of data centers to enable per-datacenter.
# hinted_handoff_enabled: DC1,DC2
hinted_handoff_enabled: true

# this defines the maximum amount of time a dead host will have hints
# generated. After it has been dead this long, new hints for it will not be
# created until it has been seen alive and gone down again.
max_hint_window_in_ms: 1080 # 3 hours

# Maximum throttle in KBs per second, per delivery thread. This will be
# reduced proportionally to the number of nodes in the cluster. (If there
# are two nodes in the cluster, each delivery thread will use the maximum
# rate; if there are three, each will throttle to half of the maximum,
# since we expect two nodes to be delivering hints simultaneously.)
hinted_handoff_throttle_in_kb: 1024

# Number of threads with which to deliver hints;
# Consider increasing this number when you have multi-dc deployments, since
# cross-dc handoff tends to be slower
max_hints_delivery_threads: 6

# Maximum throttle in KBs per second, total. This will be
# reduced proportionally to the number of nodes in the cluster.
batchlog_replay_throttle_in_kb: 1024

# Authentication backend, implementing IAuthenticator; used to identify users
# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
#   users. It keeps usernames and hashed passwords in system_auth.credentials table.
#   Please increase system_auth keyspace replication factor if you use this authenticator.
authenticator: AllowAllAuthenticator

# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthorizer,
# CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
#   increase system_auth keyspace replication factor if you use this authorizer.
authorizer: AllowAllAuthorizer

# Validity period for permissions cache (fetching permissions can be an
# expensive operation depending on the authorizer, CassandraAuthorizer is
# one example). Defaults to 2000, set to 0 to disable.
# Will be disabled automatically for AllowAllAuthorizer.
permissions_validity_in_m
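For anyone following along, the hint backlog and the handoff throttle can be inspected and adjusted at runtime on a 2.1 node roughly as sketched below; the throttle value is only an illustration, not a recommendation:

    # rough pending-hint count on this node; 2.1.x keeps hints in the
    # system.hints table, and this count can itself time out if the
    # backlog is very large
    cqlsh -e "SELECT count(*) FROM system.hints;"

    # raise the per-thread handoff throttle on a single node without a restart
    # (KB per second; 10240 is just an example value to test delivery speed)
    nodetool sethintedhandoffthrottlekb 10240

    # temporarily stop/restart hint delivery while investigating
    nodetool pausehandoff
    nodetool resumehandoff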
[no subject]
-- Consistency level: LQ (LOCAL_QUORUM)
-- It started happening approximately a couple of months back. The issue is very inconsistent and can't be reproduced. It happened only rarely before that (over the last few years).
-- There are very few GC pauses, but they don't coincide with the issue.
-- 99% latency is less than 80 ms and 75% is less than 5 ms.

- Vedant

On 2017-10-22 21:29, Jeff Jirsa wrote:
> What consistency level do you use on writes?
> Did this just start or has it always happened?
> Are you seeing GC pauses at all?
>
> What's your 99% write latency?
>
> --
> Jeff Jirsa
>
> > On Oct 22, 2017, at 9:21 PM, "vbhang...@gmail.com" wrote:
> >
> > This is for Cassandra 2.1.13. At times there are replication delays across
> > multiple regions. Data is available (getting queried from command line) in
> > 1 region but not seen in other region(s). [...]
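The latency and GC figures above can be gathered per node with standard nodetool commands; a rough sketch (the keyspace/table names here match the schema shared later in the thread):

    # coordinator-level read/write latency percentiles (the 99%/75% numbers above)
    nodetool proxyhistograms

    # per-table latency distribution and sstable counts for the affected table
    nodetool cfhistograms ks1 table1

    # pending and dropped mutations; dropped MUTATION messages often
    # accompany hints piling up
    nodetool tpstats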
[no subject]
It is RF=3, with 12 nodes in each of 3 regions and 6 in each of the other 2, so 48 nodes in total. Are you suggesting a forced read repair by reading at consistency ONE, or by bumping up read_repair_chance? We have tried from the command line with ONE, but that times out.

On 2017-10-23 10:18, "Mohapatra, Kishore" wrote:
> What is your RF for the keyspace and how many nodes are there in each DC?
>
> Did you force a Read Repair to see if you are getting the data or getting an error?
>
> Thanks
>
> Kishore Mohapatra
> Principal Operations DBA
> Seattle, WA
> Email : kishore.mohapa...@nuance.com
>
>
> -----Original Message-----
> From: vbhang...@gmail.com [mailto:vbhang...@gmail.com]
> Sent: Sunday, October 22, 2017 11:31 PM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL]
>
> [...]
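For reference, the two read-repair routes being discussed look roughly like this from cqlsh (a sketch only; the keyspace/table names are taken from the schema shared later in the thread, and the key value is a placeholder):

    -- read at a consistency level that involves the remote DCs; replicas
    -- that return mismatching digests are repaired as part of the read
    CONSISTENCY ALL;
    SELECT key, column1 FROM ks1.table1 WHERE key = 'example-key';

    -- the probabilistic route: raise read_repair_chance so a fraction of
    -- all reads also checks replicas in other data centers (0.2 is just
    -- an example value)
    ALTER TABLE ks1.table1 WITH read_repair_chance = 0.2;

An anti-entropy repair of the table is the heavier but more complete option, run node by node:

    nodetool repair -pr ks1 table1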
[no subject]
Kishore, here is the table definition and cfstats output:

===
CREATE TABLE ks1.table1 (
    key text,
    column1 'org.apache.cassandra.db.marshal.DynamicCompositeType(org.apache.cassandra.db.marshal.UTF8Type)',
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.1
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'sstable_size_in_mb': '256', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 86400
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.1
    AND speculative_retry = '99.0PERCENTILE';
==
SSTable count: 261
SSTables in each level: [0, 6, 40, 215, 0, 0, 0, 0, 0]
Space used (live): 129255873809
Space used (total): 129255873809
Space used by snapshots (total): 0
Off heap memory used (total): 20977830
SSTable Compression Ratio: 0.7879224917729545
Number of keys (estimate): 71810
Memtable cell count: 2010
Memtable data size: 226253
Memtable off heap memory used: 1327192
Memtable switch count: 47
Local read count: 11688546
Local read latency: 0.195 ms
Local write count: 225262
Local write latency: 0.055 ms
Pending flushes: 0
Bloom filter false positives: 146072
Bloom filter false ratio: 0.01543
Bloom filter space used: 35592
Bloom filter off heap memory used: 33504
Index summary off heap memory used: 26686
Compression metadata off heap memory used: 19590448
Compacted partition minimum bytes: 25
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 2334776
Average live cells per slice (last five minutes): 4.346574725773759
Maximum live cells per slice (last five minutes): 2553.0
Average tombstones per slice (last five minutes): 0.3096773382165276
Maximum tombstones per slice (last five minutes): 804.0
=

On 2017-10-24 14:39, "Mohapatra, Kishore" wrote:
> Hi Vedant,
> I was actually referring to a command-line select query with consistency level = ALL. This will force a read repair in the background.
> But as I can see, you have tried with consistency level = ONE and it is still timing out. So what error do you see in the system.log? Streaming error?
>
> Can you also check how many sstables are there for that table. Seems like your compaction may not be working.
> Is your repair job running fine?
>
> Thanks
>
> Kishore Mohapatra
> Principal Operations DBA
> Seattle, WA
> Ph : 425-691-6417 (cell)
> Email : kishore.mohapa...@nuance.com
>
>
> -----Original Message-----
> From: vbhang...@gmail.com [mailto:vbhang...@gmail.com]
> Sent: Monday, October 23, 2017 6:59 PM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL]
>
> [...]
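Kishore's questions about sstable counts, compaction, and streaming errors can be checked per node with something like the following (a sketch; the log path shown is the common package-install default and may differ on your hosts):

    # sstable count and partition sizes for the table (the output pasted above)
    nodetool cfstats ks1.table1

    # is compaction keeping up on the nodes that own this table?
    nodetool compactionstats

    # look for streaming or hint-replay errors around the time of the delay
    grep -iE "stream|hint" /var/log/cassandra/system.log | tail -50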