Cassandra Upgrade with Different Protocol Version

2018-07-05 Thread Anuj Wadehra
Hi, I would like to know how people are doing rolling upgrades of Cassandra clusters when there is a change in native protocol version, say from 2.1 to 3.11. During a rolling upgrade, if the client application is restarted on nodes, the client driver may first contact an upgraded Cassandra node with v4 and

Re: Upgrade to v3.11.3

2019-01-16 Thread Anuj Wadehra
Hi Shalom, Just a suggestion. Before upgrading to 3.11.3, make sure you are not impacted by any open critical defects, especially those related to RT which may cause data loss, e.g. 14861. Please find my response below: The upgrade process that I know of is from 2.0.14 to 2.1.x (higher than 2.1.9 I thin

ApacheCon Europe 2019

2019-05-13 Thread Anuj Wadehra
Hi, Do we have any plans for a dedicated Apache Cassandra track or sessions at ApacheCon Berlin in Oct 2019? CFP closes 26 May, 2019. Thanks Anuj Wadehra

Efficient Paging Option in Wide Rows

2016-04-22 Thread Anuj Wadehra
Hi, I have a wide row index table so that I can fetch all row keys corresponding to a column value.  Row of index_table will look like: ColValue1:bucket1 >> rowkey1, rowkey2.. rowkeyn..ColValue1:bucketn>> rowkey1, rowkey2.. rowkeyn We will have buckets to avoid hotspots. Row keys of main tabl

Re: Efficient Paging Option in Wide Rows

2016-04-23 Thread Anuj Wadehra
, Anuj Wadehra wrote: Hi, I have a wide row index table so that I can fetch all row keys corresponding to a column value.  Row of index_table will look like: ColValue1:bucket1 >> rowkey1, rowkey2.. rowkeyn..ColValue1:bucketn>> rowkey1, rowkey2.. rowkeyn We will have buckets to av

Re: Efficient Paging Option in Wide Rows

2016-04-23 Thread Anuj Wadehra
Hi, Can anyone take this question? Thanks Anuj Sent from Yahoo Mail on Android On Sat, 23 Apr, 2016 at 2:30 PM, Anuj Wadehra wrote: I think I complicated the question, so I am trying to put the question crisply. We have a table defined with clustering key/column. We have 5 different

Upgrading to SSD

2016-04-23 Thread Anuj Wadehra
Hi We have a 3 node cluster of 2.0.14. We use Read/Write Quorum and RF is 3. We want to move data and commitlog directory from a SATA HDD to SSD. We have planned to do a rolling upgrade. We plan to run repair -pr on all nodes to sync data upfront and then execute the following steps on each server

Re: StatusLogger is logging too many information

2016-04-25 Thread Anuj Wadehra
Hi, You can set the property gc_warn_threshold_in_ms in the yaml. For example, if your application is ok with a 2000ms pause, you can set the value to 2000 so that only GC pauses greater than 2000ms will trigger GC and status logging. Please refer to https://issues.apache.org/jira/plugins/servlet/mobile#i

Re: Unable to reliably count keys on a thrift CF

2016-04-25 Thread Anuj Wadehra
Hi Carlos, Please check if the JIRA https://issues.apache.org/jira/browse/CASSANDRA-11467 fixes your problem. We had been facing a row count issue with thrift cf / compact storage and this fixed it. The above is fixed in the latest 2.1.14. It's a two-line fix. So, you can also prepare a custom jar and ch

Inconsistent Reads after Restoring Snapshot

2016-04-25 Thread Anuj Wadehra
Hi, We have 2.0.14. We use RF=3 and read/write at Quorum. Moreover, we don't use incremental backups. As per the documentation at https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html , if I need to restore a snapshot on a SINGLE node in a cluster, I wou

Re: Upgrading to SSD

2016-04-25 Thread Anuj Wadehra
obey's tuning guide are great resources. https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html ) Clint On Apr 23, 2016 3:05 PM, "Anuj Wadehra" wrote: Hi We have a 3 node cluster of 2.0.14. We use Read/Write Quorum and RF is 3. We want to move data and commit

Re: Inconsistent Reads after Restoring Snapshot

2016-04-26 Thread Anuj Wadehra
a.join_ring=false and then run a repair on it. Have a look at https://issues.apache.org/jira/browse/CASSANDRA-6961 Best, Romain On Tuesday 26 April 2016 at 4:26, Anuj Wadehra wrote: Hi, We have 2.0.14. We use RF=3 and read/write at Quorum. Moreover, we don't use incrementa

RE: Inconsistent Reads after Restoring Snapshot

2016-04-27 Thread Anuj Wadehra
? Sean Durity From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in] Sent: Monday, April 25, 2016 10:26 PM To: User Subject: Inconsistent Reads after Restoring Snapshot Hi, We have 2.0.14. We use RF=3 and read/write at Quorum. Moreover, we don't use incremental backups

RE: Inconsistent Reads after Restoring Snapshot

2016-04-28 Thread Anuj Wadehra
: https://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configLogArchive_t.html   Sean Durity   From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in] Sent: Wednesday, April 27, 2016 10:44 PM To: user@cassandra.apache.org Subject: RE: Inconsistent Reads after Restoring Snapshot

Production Ready/Stable DataStax Java Driver

2016-05-08 Thread Anuj Wadehra
Hi, Which DataStax Java Driver release is most stable (production ready) for Cassandra 2.1? Thanks Anuj

Re: Production Ready/Stable DataStax Java Driver

2016-05-08 Thread Anuj Wadehra
/groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/tOWZm4RVbm4/5E_aDAc8IAAJ On Sun, May 8, 2016 at 7:39 AM, Anuj Wadehra wrote: Hi, Which DataStax Java Driver release is most stable (production ready) for Cassandra 2.1? Thanks Anuj -- Bests, Alex Popescu | @al3xandru Sen. Produc

Evict Tombstones with STCS

2016-05-28 Thread Anuj Wadehra
Hi, We are using C* 2.0.x. What options are available if the disk is too full to compact the huge sstables formed by STCS (created long ago but not getting compacted due to min_compaction_threshold being 4)? We suspect that huge space will be released when the 2 largest sstables get c

Re: (C)* stable version after 3.5

2016-07-13 Thread Anuj Wadehra
Hi Alain, This caught my attention: "Also I am not sure if the 2.2 major version is something you can skip while upgrading through a rolling restart. I believe you can, but it is not what is recommended." Why do you think that skipping 2.2 is not recommended when NEWS.txt suggests otherwise? Ca

Public Interface Failure in Multiple DC setup

2016-08-11 Thread Anuj Wadehra
Hi, Setup: Cassandra 2.0.14 with PropertyFileSnitch. 2 Data Centers. Every node has broadcast address= Public IP (bond0) & listen address=Private IP (bond1). As per DataStax docs, (https://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configMultiNetworks.html), "For intra-network

Re: Public Interface Failure in Multiple DC setup

2016-08-11 Thread Anuj Wadehra
Hi, Can someone take these questions? Thanks Anuj On Thu, 11 Aug, 2016 at 8:30 PM, Anuj Wadehra wrote: Hi, Setup: Cassandra 2.0.14 with PropertyFileSnitch. 2 Data Centers. Every node has broadcast address= Public IP (bond0) & listen address=Private IP (bond1). As per DataStax

Preferred IP is NULL

2016-08-20 Thread Anuj Wadehra
Hi, We use multiple interfaces in a multi DC setup. Broadcast address is public IP while listen address is private IP. I don't understand why preferred IP in peers table is null for all rows. There is very little documentation on the role of preferred IP and when it is set. As per code TCP connectio

Re: Preferred IP is NULL

2016-08-21 Thread Anuj Wadehra
preferred IPs. Thanks Anuj On Sun, 21 Aug, 2016 at 7:10 PM, Paulo Motta wrote: See CASSANDRA-9748, I think it might be related. 2016-08-20 15:20 GMT-03:00 Anuj Wadehra : Hi, We use multiple interfaces in a multi DC setup. Broadcast address is public IP while listen address is private IP. I

Re: Preferred IP is NULL

2016-08-22 Thread Anuj Wadehra
We are using PropertyFileSnitch. Thanks Anuj Sent from Yahoo Mail on Android On Mon, 22 Aug, 2016 at 7:09 PM, Paulo Motta wrote: What snitch are you using? If GPFS you need to enable the prefer_local=true flag (this is automatic on EC2MRS). 2016-08-21 22:24 GMT-03:00 Anuj Wadehra : Hi

Re: Preferred IP is NULL

2016-08-24 Thread Anuj Wadehra
interface to private interface. NAT rule is needed due to CASSANDRA-9748 (No process listens on broadcast address). Thanks Anuj Sent from Yahoo Mail on Android On Mon, 22 Aug, 2016 at 11:55 PM, Anuj Wadehra wrote: We are using PropertyFileSnitch. Thanks Anuj Sent from Yahoo Mail on

Open File Handles for Deleted sstables

2016-09-28 Thread Anuj Wadehra
Hi, We are facing an issue where Cassandra has open file handles for deleted sstable files. These open file handles keep on increasing with time and eventually lead to disk crisis. This is visible via the lsof command. There are no exceptions in the logs. We suspect a race condition where compact

Re: Open File Handles for Deleted sstables

2016-09-28 Thread Anuj Wadehra
of those files in our situation. thanks Sai On Wed, Sep 28, 2016 at 3:15 PM, Anuj Wadehra wrote: Hi, We are facing an issue where Cassandra has open file handles for deleted sstable files. These open file handles keep on increasing with time and eventually lead to disk crisis. This is visible via

Re: Multiple Network Interfaces in non-EC2

2016-10-12 Thread Anuj Wadehra
Hi Amir, I would like to understand your requirement first. Why do you need the multiple interface configuration mentioned at http://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/configMultiNetworks.html with a single DC setup? As per my understanding, you could simply set listen addre

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Anuj Wadehra
Hi Leena, The first thing you should be concerned about is: why doesn't the repair -pr operation complete? Second comes the question: which repair option is best? One probable cause of stuck repairs is that the firewall between DCs is closing TCP connections and Cassandra is trying to use such c

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-15 Thread Anuj Wadehra
Hi Leena, Do you have a firewall between the two DCs? If yes, "connection reset" can be caused by Cassandra trying to use a TCP connection which is already closed by the firewall. Please make sure that you set high connection timeout at firewall. Also, make sure your servers are not overloaded.

Re: Cassandra installation best practices

2016-10-17 Thread Anuj Wadehra
Hi Mehdi, You can refer to https://docs.datastax.com/en/landing_page/doc/landing_page/recommendedSettings.html . Thanks Anuj On Mon, 17 Oct, 2016 at 10:20 PM, Mehdi Bada wrote: Hi all, Are there any best practices for installing Cassandra in a production environment? Some standard to follow?

Handle Leap Seconds with Cassandra

2016-10-20 Thread Anuj Wadehra
Hi, I would like to know how you guys handle leap seconds with Cassandra.  I am not bothered about the livelock issue as we are using appropriate versions of Linux and Java. I am more interested in finding an optimum answer for the following question: How do you handle wrong ordering of multiple

Re: Handle Leap Seconds with Cassandra

2016-10-27 Thread Anuj Wadehra
seconds (clock drift, NTP inaccuracy etc).   On Thu, 20 Oct 2016 at 10:30 Anuj Wadehra wrote: Hi, I would like to know how you guys handle leap seconds with Cassandra.  I am not bothered about the livelock issue as we are using appropriate versions of Linux and Java. I am more interested in finding

Re: Handle Leap Seconds with Cassandra

2016-11-02 Thread Anuj Wadehra
-cassandra-synchronization/. Ben On Thu, 27 Oct 2016 at 10:18 Anuj Wadehra wrote: Hi Ben, Thanks for your reply. We don't use timestamps in primary key. We rely on server side timestamps generated by coordinator. So, no functions at client side would help. Yes, drifts can create problems too. B

Re: Some questions to updating and tombstone

2016-11-14 Thread Anuj Wadehra
Hi Boying, I agree with Vladimir. If compaction does not compact the two sstables with updates soon, disk space will be wasted. For example, if the updates are not close in time, the first update might be in a big table by the time the second update is being written to a new small table. STCS wo

Configure NTP for Cassandra

2016-11-26 Thread Anuj Wadehra
Hi, One popular NTP setup recommended for Cassandra users is described at https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-2-solutions/ . Summary of the article: the setup recommends a dedicated pool of internal NTP servers which are associated as peers to provi

Re: Single cluster node restore

2016-12-02 Thread Anuj Wadehra
Hi Petr, If data corruption means accidental data deletions via Cassandra commands, you have to restore entire cluster with latest snapshots. This may lead to data loss as there may be valid updates after the snapshot was taken but before the data deletion. Restoring single node with snapshot wo

Join_ring=false Use Cases

2016-12-13 Thread Anuj Wadehra
Hi, I need to understand the use case of join_ring=false in case of node outages. As per https://issues.apache.org/jira/browse/CASSANDRA-6961, you would want join_ring=false when you have to repair a node before bringing a node back after some considerable outage. The problem I see with join_ri

Re: Configure NTP for Cassandra

2016-12-13 Thread Anuj Wadehra
Any NTP experts willing to take up these questions? Thanks Anuj On Sun, 27 Nov, 2016 at 12:52 AM, Anuj Wadehra wrote: Hi, One popular NTP setup recommended for Cassandra users is described at https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-2

Re: Configure NTP for Cassandra

2016-12-13 Thread Anuj Wadehra
the NTP questions mailing list: http://lists.ntp.org/listinfo/questions On Tue, Dec 13, 2016 at 1:25 PM, Anuj Wadehra wrote: > Any NTP experts willing to take up these questions? > > Thanks > Anuj > > On Sun, 27 Nov, 2016 at 12:52 AM, Anuj Wadehra > wrote: > Hi, &

Re: Configure NTP for Cassandra

2016-12-14 Thread Anuj Wadehra
synchronization using reliable external servers. There is no mandate to set up your own pool of internal NTP servers for BETTER time synchronization. Thanks for your inputs. Anuj On Wed, 14 Dec, 2016 at 3:22 AM, Martin Schröder wrote: 2016-11-26 20:20 GMT+01:00 Anuj Wadehra : > 1. If my

Re: Join_ring=false Use Cases

2016-12-14 Thread Anuj Wadehra
Can anyone help me with join_ring and address my concerns? Thanks Anuj On Tue, 13 Dec, 2016 at 11:31 PM, Anuj Wadehra wrote: Hi, I need to understand the use case of join_ring=false in case of node outages. As per https://issues.apache.org/jira/browse/CASSANDRA-6961, you would want

Re: Join_ring=false Use Cases

2016-12-20 Thread Anuj Wadehra
No responses yet :) Any C* expert who could help on join_ring use case and the concern raised? Thanks Anuj On Tue, 13 Dec, 2016 at 11:31 PM, Anuj Wadehra wrote: Hi, I need to understand the use case of join_ring=false in case of node outages. As per https://issues.apache.org/jira/browse

Re: Join_ring=false Use Cases

2016-12-21 Thread Anuj Wadehra
to make it actually useful for repairs/outages we'd need to have another option to turn on writes so that it behaved similarly to write survey mode (but on already bootstrapped nodes). Is there a reason we don't have this already? Or does it exist somewhere I'm not aware of? On 2

Re: Disc size for cluster

2017-01-26 Thread Anuj Wadehra
Adding to what Benjamin said.. It is hard to estimate disk space if you are using STCS for a table where rows are updated frequently leading to lot of fragmentation. STCS may also lead to scenarios where tombstones are not evicted for long times. You may go live and everything goes well for mont

Re: Cluster scaling

2017-02-08 Thread Anuj Wadehra
Hi Branislav, I quickly went through the code and noticed that you are updating RF from code and expecting that Cassandra would automatically distribute replicas as per the new RF. I think this is not how it works. After updating the RF, you need to run repair on all the nodes to make sure that

Re: Read after Write inconsistent at times

2017-02-25 Thread Anuj Wadehra
Hi Charulata, Please share details on how data is being inserted and read.  Is the client which is reading the data same as the one which inserted it? Is the read happening only when insertion is successful? Are you using client timestamps? How did you verify that NTP is working properly? How NTP

Re: Is it possible to recover a deleted-in-future record?

2017-03-08 Thread Anuj Wadehra
DISCLAIMER: This is only my personal opinion. Evaluate the situation carefully and if you find below suggestions useful, follow them at your own risk. If I have understood the problem correctly, malicious deletes would actually lead to deletion of data.  I am not sure how everything is normal aft

Incremental Repair

2017-03-12 Thread Anuj Wadehra
Hi, Our setup is as follows: 2 DCs with N nodes, RF=DC1:3,DC2:3, Hinted Handoff=3 hours, Incremental Repair scheduled once on every node (ALL DCs) within the gc grace period. I have the following queries regarding incremental repairs: 1. When a node is down for X hours (where X > hinted handoff hou

Read Consistency

2015-06-23 Thread Anuj Wadehra
Hi, Need to validate my understanding.. RF=3 , Read CL = Quorum What would be returned to the client in following scenarios: Scenario 1: Read query is fired for a key, data is found on one node and not found on other two nodes who are responsible for the token corresponding to key. Optio

Re: Read Consistency

2015-06-23 Thread Anuj Wadehra
data with different timestamps. Read query will return the data with most recent timestamp and trigger a read repair in the backend . On Tue, Jun 23, 2015 at 10:57 AM, Anuj Wadehra wrote: Hi, Need to validate my understanding.. RF=3 , Read CL = Quorum What would be returned to the
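The rule discussed in this thread (a QUORUM read at RF=3 consults two replicas and returns the value with the most recent write timestamp) can be sketched as a toy model. This is only an illustration under stated assumptions, not Cassandra's actual read path: `quorum` and `resolve` are hypothetical helper names, replica responses are modelled as `(value, timestamp)` tuples or `None`, and timestamp ties and read repair are not modelled.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical model: (value, write timestamp), or None if the replica has no data.
Response = Optional[Tuple[str, int]]


def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer a QUORUM operation."""
    return replication_factor // 2 + 1


def resolve(responses: Sequence[Response]) -> Response:
    """Pick the response with the highest timestamp, ignoring empty replicas."""
    present = [r for r in responses if r is not None]
    return max(present, key=lambda r: r[1]) if present else None


# Scenario 1 from the thread: one contacted replica has the data, the other does not.
print(quorum(3))                      # replicas needed at RF=3
print(resolve([("row", 100), None]))  # the data is still returned
```

In this model the read only comes back empty when every contacted replica is empty, which matches the conclusion the thread converges on for scenario 1.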

Re: Read Consistency

2015-06-23 Thread Anuj Wadehra
p Thompson wrote: Anuj, In the first scenario, the data from the single node holding data is returned. The query will not fail if the consistency level is met, even if the read was inconsistent. On Tue, Jun 23, 2015 at 2:16 PM, Anuj Wadehra wrote: Why would it fail and with what Thrift erro

Re: Read Consistency

2015-06-23 Thread Anuj Wadehra
Thanks..So all of us agree that in scenario 1, data would be returned and that was my initial understanding.. Anuj Sent from Yahoo Mail on Android From:"Anuj Wadehra" Date:Wed, 24 Jun, 2015 at 12:15 am Subject:Re: Read Consistency M more confused...Different responses. .Anyo

Adding Nodes With Inconsistent Data

2015-06-24 Thread Anuj Wadehra
Hi, We faced a scenario where we lost little data after adding 2 nodes in the cluster. There were intermittent dropped mutations in the cluster. Need to verify my understanding how this may have happened to do Root Cause Analysis: Scenario: 3 nodes, RF=3, Read / Write CL= Quorum 1. Due to o

Re: Read Consistency

2015-06-28 Thread Anuj Wadehra
that data may not be queried at all (the other two may). Keep in mind, these scenarios seem to generally assume you are not writing data consistently at QUORUM CL, so your reads may be inconsistent. On Tuesday, June 23, 2015, Anuj Wadehra wrote: | Thanks..So all of us agree th

Re: Adding Nodes With Inconsistent Data

2015-06-28 Thread Anuj Wadehra
following best practices or understanding the internals is the key imho. I would say it is a good question though. Alain. 2015-06-24 19:43 GMT+02:00 Anuj Wadehra : | Hi, We faced a scenario where we lost little data after adding 2 nodes in the cluster. There were intermittent dropped mutati

Re: Read Consistency

2015-06-28 Thread Anuj Wadehra
Sorry for typo in your name Owen !! Anuj Sent from Yahoo Mail on Android From:"Anuj Wadehra" Date:Sun, 28 Jun, 2015 at 11:11 pm Subject:Re: Read Consistency Agree Owem !! Response in both scenarios would depend on the 2 replicas chosen for meeting QUORUM. But, the intent is

Re: Read Consistency

2015-06-30 Thread Anuj Wadehra
do cleanup or rollback on one node so you need to do it yourself to make sure that integrity of data is maintained in case strong consistency is a requirement. Right? We use Hector by the way and are planning to switch to the CQL driver. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From

Re: Cassandra OOM on joining existing ring

2015-07-13 Thread Anuj Wadehra
We faced similar issue where we had 60k sstables due to coldness bug in 2.0.3. We solved it by following Datastax recommendation for Production at http://docs.datastax.com/en/cassandra/1.2/cassandra/install/installRecommendSettings.html : Step 1 : Add the following line to /etc/sysctl.conf :

Re: Unbalanced disk load

2015-07-19 Thread Anuj Wadehra
Moreover, if you are using SSDs, keeping data directories and commitlog on separate disks won't provide much benefit. As Nate said, relying on RAID with RF=1 is not good design. Cassandra replicas provide greater fault tolerance and HA as they are on different nodes. Thanks Anuj Sent from

Manual Indexing With Buckets

2015-07-23 Thread Anuj Wadehra
We have a primary table and we need search capability by batchid column. So we are creating a manual index for search by batch id. We are using buckets to restrict a row size in batch id index table to 50mb. As batch size may vary drastically ( ie one batch id may be associated to 100k row keys

Best Practise for Updating Index and Reporting Tables

2015-07-23 Thread Anuj Wadehra
We have a transaction table, 3 manually created index tables and a few tables for reporting. One option is to go for atomic batch mutations so that for each transaction every index table and other reporting tables are updated synchronously. The other option is to update other tables async, there m

Re: Manual Indexing With Buckets

2015-07-24 Thread Anuj Wadehra
Can anyone take this one? Thanks Anuj Sent from Yahoo Mail on Android From:"Anuj Wadehra" Date:Thu, 23 Jul, 2015 at 10:57 pm Subject:Manual Indexing With Buckets We have a primary table and we need search capability by batchid column. So we are creating a manual index for searc

Re: Best Practise for Updating Index and Reporting Tables

2015-07-25 Thread Anuj Wadehra
ate consistency. The only thing an atomic batch guarantees is that all of the statements in the batch will eventually be executed. Both approaches are eventually consistent, so you have to deal with inconsistency either way. On Jul 23, 2015, at 11:46 AM, Anuj Wadehra wrote: We have a

Re: RE: Manual Indexing With Buckets

2015-07-25 Thread Anuj Wadehra
he primary only associated with one batch?     Sean Durity – Cassandra Admin, Big Data Team To engage the team, create a request   From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in] Sent: Friday, July 24, 2015 3:57 AM To: user@cassandra.apache.org Subject: Re: Manual Indexing With Buckets  

Re: if seed is diff on diff nodes, any problem ?

2015-07-26 Thread Anuj Wadehra
As per my understanding, using the same 2 seed nodes per DC is the way to go. If you are not creating two isolated sets of nodes in your cluster, there may be nodes referring to each other in a way that everyone is able to know everyone else. Anuj Sent from Yahoo Mail on Android From:"Chris Mawata" Date:S

Re: RE: Manual Indexing With Buckets

2015-07-28 Thread Anuj Wadehra
Any more thoughts ? Anyone? Thanks Anuj Sent from Yahoo Mail on Android From:"Anuj Wadehra" Date:Sat, 25 Jul, 2015 at 5:14 pm Subject:Re: RE: Manual Indexing With Buckets We are in product development and batch size depends on the customer base of customer buying our pro

Re: memtable and sstables

2015-09-05 Thread Anuj Wadehra
Memtables are for storing writes in memory till they are flushed to disk as sstables, and once flushed, space gets released from commit logs too. If you are updating only some columns, only that data would be in memtables, not the entire row. Don't think of memtables as a row cache. This is my under

Re: How to prevent queries being routed to new DC?

2015-09-07 Thread Anuj Wadehra
Hi Tom, While reading data (even at CL LOCAL_QUORUM), if data in different nodes required to meet CL in your local cluster doesn't match, data will be read from remote DC for read repair if read_repair_chance is not 0. Important points: 1. If you are reading and writing at local_quorum you can set r

Throttling Cassandra Load

2015-09-22 Thread Anuj Wadehra
Hi, We are using Cassandra 2.0.14 with Hector 1.1.4. Each node in cluster has an application using Hector and a Cassandra instance. I want suggestions on the approach we are taking for throttling Cassandra load. Problem Statement: Misbehaved clients can bring down Cassandra clusters by puttin

Re: Throttling Cassandra Load

2015-09-22 Thread Anuj Wadehra
:"Robert Coli" Date:Wed, 23 Sep, 2015 at 2:43 am Subject:Re: Throttling Cassandra Load On Tue, Sep 22, 2015 at 1:06 PM, Anuj Wadehra wrote: We are using Cassandra 2.0.14 with Hector 1.1.4. Each node in cluster has an application using Hector and a Cassandra instance. Why are you us

Re: Throttling Cassandra Load

2015-09-24 Thread Anuj Wadehra
we can think of applying similar throttling with native protocol. Yes CQL driver may provide us some advanced properties for tuning connection pooling and timeout idle connections. Thanks Anuj Sent from Yahoo Mail on Android From:"Anuj Wadehra" Date:Wed, 23 Sep, 2015 at 9:03 am

Re: Throttling Cassandra Load

2015-09-27 Thread Anuj Wadehra
Hi, Any suggestions/comments on the approach? What are you guys doing to keep a check on misbehaved clients and restrict Cassandra load? Note: We will be moving to CQL driver but that will take months. Anuj Sent from Yahoo Mail on Android From:"Anuj Wadehra" Date:Wed, 23 Sep, 2015

Repair Hangs while requesting Merkle Trees

2015-11-11 Thread Anuj Wadehra
method always returns false for a non-droppable verb such as Merkle Tree Request (verb=REPAIR_MESSAGE), why did increasing the request timeout solve the problem on one occasion? Thanks Anuj Wadehra

Re: Repair Hangs while requesting Merkle Trees

2015-11-11 Thread Anuj Wadehra
OutboundTcpConnection.java, when isTimeOut method always returns false for a non-droppable verb such as Merkle Tree Request (verb=REPAIR_MESSAGE), why did increasing the request timeout solve the problem on one occasion? Thanks Anuj Wadehra On Thursday, 12 November 2015 2:35 AM, Anuj Wadehra wrote: Hi, We have 2 DCs

Re: Repair Hangs while requesting Merkle Trees

2015-11-14 Thread Anuj Wadehra
n Wed, Nov 11, 2015 at 1:06 PM, Anuj Wadehra wrote: Hi, we are using 2.0.14. We have 2 DCs at remote locations with 10GBps connectivity. We are able to complete repair (-par -pr) on 5 nodes. On only one node in DC2, we are unable to complete repair as it always hangs. Node sends Merkle Tree req

Re: Repair Hangs while requesting Merkle Trees

2015-11-14 Thread Anuj Wadehra
roid From:"Anuj Wadehra" Date:Sat, 14 Nov, 2015 at 11:59 pm Subject:Re: Repair Hangs while requesting Merkle Trees Thanks Daemeon !! I will capture the output of netstats and share in next few days. We were thinking of taking tcp dumps also. If it's a network issue and increasing request

Re: Repair time comparison for Cassandra 2.1.11

2015-11-15 Thread Anuj Wadehra
Repair can take a long time if you have a lot of inconsistent data. If you haven't restarted nodes yet, you can run the nodetool tpstats command on all nodes to make sure that there are no mutation drops. Thanks Anuj Sent from Yahoo Mail on Android From:"badr...@tuta.io" Date:Sun, 15 Nov, 2015 at 4:20

Re: Repair time comparison for Cassandra 2.1.11

2015-11-15 Thread Anuj Wadehra
Ok. I don't have much experience with 2.1 as we are on 2.0.x. Are you using sequential repair? If yes, parallel repair can be faster but you need to make sure that your application has sufficient room to run when cluster is running repair. Are you observing any WARN or ERROR messages in logs wh

Re: Repair time comparison for Cassandra 2.1.11

2015-11-15 Thread Anuj Wadehra
For the error, you can see http://www.scriptscoop.net/t/3bac9a3307ac/cassandra-lost-notification-from-nodetool-repair.html Lost notification should not be a problem. Please see https://issues.apache.org/jira/browse/CASSANDRA-7909 In fact, we are also currently facing an issue where merkle tr

Re: handling down node cassandra 2.0.15

2015-11-16 Thread Anuj Wadehra
Hi Abhishek, In my opinion, you already have data and bootstrapping is not needed here. You can set auto_bootstrap to false in cassandra.yaml and once Cassandra is restarted, you should run repair to fix the inconsistent data. Thanks Anuj On Monday, 16 November 2015 10:34 PM, Josh Smi

Re: handling down node cassandra 2.0.15

2015-11-16 Thread Anuj Wadehra
Did you set the JVM_OPTS to replace address? That is usually the error I get when I forget to set the replace_address on Cassandra-env. JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node" From: Anishek Agarwal [mailto:anis...@gmail.com] Sent: Monday, November 16, 2015

Re: Repair Hangs while requesting Merkle Trees

2015-11-16 Thread Anuj Wadehra
ack tuning for cross-DC, notably your buffer sizes, window params, cassandra-specific stuff like otc_coalescing_strategy, inter_dc_tcp_nodelay, etc. On Sat, Nov 14, 2015 at 10:35 AM, Anuj Wadehra wrote: One more observation.We observed that there are few TCP connections which node shows as Establ

Re: handling down node cassandra 2.0.15

2015-11-17 Thread Anuj Wadehra
_bootstrap = false and then restart the node, followed by repair on this machine right ? thanks anishek On Mon, Nov 16, 2015 at 11:40 PM, Anuj Wadehra wrote: Hi Abhishek, In my opinion, you already have data and bootstrapping is not needed here. You can set auto_bootstrap to false in Cass

Re: Repair Hangs while requesting Merkle Trees

2015-11-17 Thread Anuj Wadehra
how much data can be in-flight between acknowledgements, and the default size is pitiful for any decent   network size. Google around for TCP tuning/buffer tuning and you should find some good resources. On Mon, Nov 16, 2015 at 5:23 PM, Anuj Wadehra wrote: Hi Bryan, Thanks for the reply !! I didn

Re: handling down node cassandra 2.0.15

2015-11-18 Thread Anuj Wadehra
wrote: On Tue, Nov 17, 2015 at 4:33 AM, Anuj Wadehra wrote: Only if gc_grace_seconds hasn't passed since the failure. If your machine is down for more than gc_grace_seconds you need to delete the data directory and go with auto bootstrap = true. Since CASSANDRA-6961 you can: 1) bring up

Re: Replication of data over 2 Datacentre's, when one node fails we get replica issues

2015-11-18 Thread Anuj Wadehra
Hi Walsh, My comments: 1. Keeping RF at 2 and CL at LOCAL_QUORUM would not give you any additional fault tolerance. You won't be able to afford a single node failure with RF=2. I would suggest keeping it at 3 so that you can tolerate a single node failure. Your query failed because RF=2 and
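The arithmetic behind this advice can be sketched in a few lines. This is a minimal illustration, assuming the usual quorum definition of floor(RF / 2) + 1 replicas within the datacenter; `quorum` and `tolerable_failures` are hypothetical helper names, not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a (LOCAL_)QUORUM operation."""
    return replication_factor // 2 + 1


def tolerable_failures(replication_factor: int) -> int:
    """Replicas that can be down while quorum operations still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (2, 3):
    print(rf, quorum(rf), tolerable_failures(rf))
```

With RF=2 the quorum is both replicas, so no replica may be down; with RF=3 the quorum is two replicas, so one replica may be down.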

Re: Range scans

2015-11-19 Thread Anuj Wadehra
Hi Chandra, I will comment on some points. Someone else can take the remaining ones: 1. Secondary indexes are only useful when the data returned by the index query is in the hundreds. Fetching large data using a secondary index would be very slow. Secondary indexes don't scale well. 2. token query should be

Re: Range scans

2015-11-19 Thread Anuj Wadehra
http://intellidzine.blogspot.in/2014/01/cassandra-data-modelling-primary-keys.html?m=1 Thanks Anuj Sent from Yahoo Mail on Android From:"Anuj Wadehra" Date:Thu, 19 Nov, 2015 at 5:31 pm Subject:Re: Range scans Hi Chandra, I will comment on some points. Someone else can take rema

Re: Repair Hangs while requesting Merkle Trees

2015-11-23 Thread Anuj Wadehra
triggered for cross DC nodes..and hints replay being timed-out..Is that an indication of a network issue? I am getting in touch with network team to capture netstats and tcpdump too.. Thanks Anuj On Wed, 18/11/15, Anuj Wadehra wrote: Subject: Re

Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
via its public IP. Thanks Anuj On Tue, 24/11/15, Paulo Motta wrote: Subject: Re: Repair Hangs while requesting Merkle Trees To: "user@cassandra.apache.org" , "Anuj Wadehra" Date: Tuesday, 24 November, 2015, 12:38 AM The is

Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
rred) 3. Hinted handoff started for 3rd node (10.X.14.115 ) but hint replay timed-out. If it's a network issue then why is the issue only in DC2 and mostly observed on one node? Thanks Anuj On Sunday, 29 November 2015 10:44 PM, Anuj Wadehra wrote: Yes. I think you are correct, pr

Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
Please find attached netstat -t -as output for the node on which repair hung and the node which never got Merkle Tree Request. Thanks Anuj On Sunday, 29 November 2015 11:13 PM, Anuj Wadehra wrote: Hi All, I am summarizing the setup, problem & key observations till now: S

Handle Node Failure with Repair -pr

2015-12-04 Thread Anuj Wadehra
Hi Guys !! I need comments on my understanding of repair -pr. If you are using repair -pr in your cluster then the following statements hold true: 1. If a node goes down for a long time and you're not sure when it will return, you must ensure that subrange repair for the failed node's range is done

Re: Want to run repair on a node without it taking traffic

2015-12-04 Thread Anuj Wadehra
Hi Robert, Did you say "longer than gc_grace_seconds"? Won't deletes pop back during repair? Thanks Anuj Sent from Yahoo Mail on Android From:"Robert Coli" Date:Thu, 3 Dec, 2015 at 12:21 am Subject:Re: Want to run repair on a node without it taking traffic On Wed, Dec 2, 2015 at 8:54 AM, K

Re: Handle Node Failure with Repair -pr

2015-12-07 Thread Anuj Wadehra
Hi All !!! Any comments on the repair -pr scenarios..please share how you deal with such scenarios.. Thanks Anuj Sent from Yahoo Mail on Android From:"Anuj Wadehra" Date:Sat, 5 Dec, 2015 at 12:57 am Subject:Handle Node Failure with Repair -pr Hi Guys !! I need comm

Re: Re: Re: Cassandra Tuning Issue

2015-12-08 Thread Anuj Wadehra
Hi Jerry, It's great that you got a performance improvement. Moreover, I agree with what Graham said. I think that you are using extremely large heaps with CMS, and in a very odd ratio. Having 40G for new gen and leaving only 20G for old gen seems unreasonable. It's hard to believe that you are

Re: Re:Re: Re: Re: Cassandra Tuning Issue

2015-12-08 Thread Anuj Wadehra
By the way, how to " inform the C* email list as well so that others know" as Jack said? I am sorry I have not done that yet. Thanks jerry At 2015-12-09 01:09:07, "Anuj Wadehra" wrote: Hi Jerry, It's great that you got a performance improvement. Moreover, I agree with what

Re: Thousands of pending compactions using STCS

2015-12-11 Thread Anuj Wadehra
There was a JIRA that cold sstables are not compacted leading to thousands of sstables. Issue got fixed in 2.0.4. Which version of Cassandra are you using? Anuj Sent from Yahoo Mail on Android From:"Jeff Jirsa" Date:Fri, 11 Dec, 2015 at 10:42 pm Subject:Re: Thousands of pending compactions us

Re: Thousands of pending compactions using STCS

2015-12-11 Thread Anuj Wadehra
Sorry I missed the version in your mail. You are on 2.0.16, so it can't be the coldness issue. Anuj Sent from Yahoo Mail on Android From:"Anuj Wadehra" Date:Fri, 11 Dec, 2015 at 10:48 pm Subject:Re: Thousands of pending compactions using STCS There was a JIRA that cold sstabl

Revisit Cassandra EOL Policy

2016-01-05 Thread Anuj Wadehra
Hi, As per my understanding, a Cassandra version n is implicitly declared EOL when two major versions are released after version n, i.e. when version n + 2 is released. I think the EOL policy must be revisited in the interest of the expanding Cassandra user base. Concerns with current EOL Policy:
