Re: very slow repair

2019-06-12 Thread Laxmikant Upadhyay
A few questions: 1. What is the Cassandra version? 2. Is the table size 4 TB per node? 3. What are the values of compaction_throughput_mb_per_sec and stream_throughput_outbound_megabits_per_sec? On Thu, Jun 13, 2019 at 5:06 AM R. T. wrote: > Hi, > > I am trying to run a repair for first time a sp
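
The third question matters because the two settings use different units: compaction throughput is in megabytes per second, while stream throughput is in megabits per second. For a table of around 4 TB, either cap can dominate how long the work takes. Below is a rough back-of-envelope sketch that converts each cap into hours for a given data volume. It assumes that the cap is the only bottleneck, that the full volume passes through it, and that units are decimal (1 TB = 10^12 bytes, 1 megabit = 10^6 bits). Cassandra's own rate limiter may use binary multiples, and a real repair usually streams far less than the whole table. The 16 MB/s and 200 Mbit/s values are illustrative placeholders, not values from the thread.

```python
# Back-of-envelope: hours needed to push a data volume through a throughput cap.
# Assumptions (illustrative only): decimal units, the cap is the sole bottleneck,
# and the whole volume passes through it. Real repairs usually move less data.

TB = 10**12  # bytes, decimal terabyte


def hours_at_mb_per_sec(num_bytes: float, mb_per_sec: float) -> float:
    """Hours to process num_bytes at a cap given in megabytes/s
    (the unit of compaction_throughput_mb_per_sec)."""
    bytes_per_sec = mb_per_sec * 10**6
    return num_bytes / bytes_per_sec / 3600


def hours_at_megabits_per_sec(num_bytes: float, megabits_per_sec: float) -> float:
    """Hours to stream num_bytes at a cap given in megabits/s
    (the unit of stream_throughput_outbound_megabits_per_sec)."""
    bytes_per_sec = megabits_per_sec * 10**6 / 8  # 8 bits per byte
    return num_bytes / bytes_per_sec / 3600


if __name__ == "__main__":
    table_size = 4 * TB  # size mentioned in the thread
    # Placeholder caps; substitute the values from your own cassandra.yaml.
    print(f"compaction @ 16 MB/s:    {hours_at_mb_per_sec(table_size, 16):.1f} h")
    print(f"streaming  @ 200 Mbit/s: {hours_at_megabits_per_sec(table_size, 200):.1f} h")
```

With these placeholder values, the sketch prints about 69.4 h and 44.4 h. The factor of 8 is easy to overlook: a 200 megabit/s streaming cap moves only 25 MB/s.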

very slow repair

2019-06-12 Thread R. T.
Hi, I am trying to run a repair for the first time on a specific column family in a specific keyspace, and it seems to be going super slow. I have a 6-node cluster with 2 datacenters (RF 2), and the repair is a non-incremental, DC-parallel one. This column family is around 4 TB and it is written heavily

Decommissioned nodes are in UNREACHABLE state

2019-06-12 Thread Jai Bheemsen Rao Dhanwada
Hello, I have a Cassandra cluster running version 2.1.16, from which I have decommissioned a few nodes using "nodetool decommission", but I see the node IPs in UNREACHABLE state in the "nodetool describecluster" output. I believe they appear only for 72 hours, but in my ca
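
To check which decommissioned IPs are still lingering, one option is to extract the `UNREACHABLE:` list from the `nodetool describecluster` output and intersect it with the IPs you decommissioned. This is a minimal sketch. It assumes that the output contains a line of the form `UNREACHABLE: [ip1, ip2]` under `Schema versions:`. The function names, sample output, and IP addresses are hypothetical.

```python
from __future__ import annotations

import re

# Assumed describecluster format: "UNREACHABLE: [10.0.0.4, 10.0.0.5]" under "Schema versions:".
_UNREACHABLE = re.compile(r"UNREACHABLE:\s*\[([^\]]*)\]")


def unreachable_ips(describecluster_output: str) -> set[str]:
    """Collect every IP listed after "UNREACHABLE:" in `nodetool describecluster` output."""
    ips: set[str] = set()
    for match in _UNREACHABLE.finditer(describecluster_output):
        for item in match.group(1).split(","):
            ip = item.strip().lstrip("/")  # tolerate an optional leading "/"
            if ip:
                ips.add(ip)
    return ips


def lingering_decommissioned(describecluster_output: str, decommissioned: set[str]) -> set[str]:
    """Decommissioned IPs that the cluster still reports as UNREACHABLE."""
    return unreachable_ips(describecluster_output) & decommissioned


if __name__ == "__main__":
    # Made-up output for illustration; feed in real `nodetool describecluster` output instead.
    sample = (
        "Cluster Information:\n"
        "\tName: Test Cluster\n"
        "\tSchema versions:\n"
        "\t\t65e78f0e-e81e-30d8-a631-a65dff93bf82: [10.0.0.1, 10.0.0.2]\n"
        "\t\tUNREACHABLE: [10.0.0.7, 10.0.0.9]\n"
    )
    print(sorted(lingering_decommissioned(sample, {"10.0.0.9", "10.0.0.11"})))  # ['10.0.0.9']
```

If you run this periodically, you can see whether the lingering entries ever drop out after the expected window.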

Re: postmortem on 2.2.13 scale out difficulties

2019-06-12 Thread Carl Mueller
I filed a bug, CASSANDRA-15155: https://issues.apache.org/jira/browse/CASSANDRA-15155?jql=project%20%3D%20CASSANDRA It seems VERY similar to https://issues.apache.org/jira/browse/CASSANDRA-6648 On Wed, Jun 12, 2019 at 12:14 PM Carl Mueller wrote: > And once the cluster token map formation is

Re: Recover lost node from backup or evict/re-add?

2019-06-12 Thread Jon Haddad
100% agree with Sean. I would only use Cassandra backups in a case where you need to restore from full cluster loss. Example: An entire DC burns down, tornado, flooding. Your routine node replacement after a failure should be replace_address_first_boot. To ensure this goes smoothly, run regular
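
In the replacement path, the new node is started with the `cassandra.replace_address_first_boot` system property set to the dead node's IP. This property is typically appended to `JVM_OPTS` in `cassandra-env.sh`, the same file that holds the `JVM_OPTS` lines mentioned elsewhere in these threads. The small sketch below builds that line and rejects malformed addresses. The helper name and example IPs are hypothetical; only the system property itself is Cassandra's.

```python
import ipaddress


def replace_address_jvm_opt(dead_node_ip: str) -> str:
    """Return a cassandra-env.sh line that makes a new node replace dead_node_ip.

    Hypothetical helper: it only validates the address (IPv4 or IPv6) and formats
    the -Dcassandra.replace_address_first_boot system property.
    """
    addr = ipaddress.ip_address(dead_node_ip)  # raises ValueError for malformed input
    return f'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot={addr}"'


if __name__ == "__main__":
    print(replace_address_jvm_opt("10.0.0.12"))
    # JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.12"
```

As we understand it, the `_first_boot` variant only takes effect on a node that has not yet bootstrapped. For that reason, it is generally preferred over the older `replace_address` property for routine replacements.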

RE: Recover lost node from backup or evict/re-add?

2019-06-12 Thread Durity, Sean R
I’m not sure it is correct to say, “you cannot.” However, that is a more complicated restore and more likely to lead to inconsistent data and take longer to do. You are basically trying to start from a backup point and roll everything forward and catch up to current. Replacing/re-streaming is t

RE: Recover lost node from backup or evict/re-add?

2019-06-12 Thread Alan Gano
Is it correct to say that a lost node cannot be restored from backup? You must either replace the node or evict/re-add (i.e., rebuild from other nodes). Also, are snapshot, incremental, and commitlog backups relegated to application keyspace recovery only? How about recovery of the entire c

Re: postmortem on 2.2.13 scale out difficulties

2019-06-12 Thread Carl Mueller
And once the cluster token map formation is done, it starts bootstrap and we get a ton of these: WARN [MessagingService-Incoming-/2406:da14:95b:4503:910e:23fd:dafa:9983] 2019-06-12 15:22:04,760 IncomingTcpConnection.java:100 - UnknownColumnFamilyException reading from socket; closing org.apache.c
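
When a joining node logs many of these warnings, it can help to see which peers they come from. The minimal sketch below tallies `UnknownColumnFamilyException` warnings per peer address. It assumes that each log line follows the shape quoted above, `WARN [MessagingService-Incoming-/<peer>] <date> <time> ... UnknownColumnFamilyException ...`, and takes the peer from the thread name. The function name and sample lines are hypothetical, and the sample uses documentation-range IPv6 addresses.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# Peer address is taken from the thread name "MessagingService-Incoming-/<addr>".
_INCOMING = re.compile(
    r"WARN\s+\[MessagingService-Incoming-/([^\]]+)\].*UnknownColumnFamilyException"
)


def unknown_cf_by_peer(lines: Iterable[str]) -> Counter[str]:
    """Count UnknownColumnFamilyException warnings per sending peer."""
    counts: Counter[str] = Counter()
    for line in lines:
        m = _INCOMING.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; in practice iterate over an open system.log.
    sample = [
        "WARN  [MessagingService-Incoming-/2001:db8::1] 2019-06-12 15:22:04,760 "
        "IncomingTcpConnection.java:100 - UnknownColumnFamilyException reading from socket; closing",
        "WARN  [MessagingService-Incoming-/2001:db8::1] 2019-06-12 15:22:05,012 "
        "IncomingTcpConnection.java:100 - UnknownColumnFamilyException reading from socket; closing",
        "DEBUG [GossipStage:1] 2019-06-12 15:20:07,797 MigrationManager.java:96 - Not pulling schema",
    ]
    for peer, n in unknown_cf_by_peer(sample).most_common():
        print(f"{n:6d}  {peer}")
```

Because the function accepts any iterable of lines, you can pass it an open file handle directly.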

Re: postmortem on 2.2.13 scale out difficulties

2019-06-12 Thread Carl Mueller
One node at a time: yes, that is what we are doing. We have not tried changing streaming_socket_timeout_in_ms. It is currently 24 hours (```streaming_socket_timeout_in_ms=86400000```), which would cover the bootstrap timeframe we have seen before (1-2 hours per node). Since it joins with no data, it is s
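
Converting the 24-hour value to milliseconds gives 24 h × 3,600 s/h × 1,000 ms/s = 86,400,000 ms. The tiny sketch below performs that conversion and, following the poster's reasoning, checks whether a configured timeout exceeds an observed per-node bootstrap time of 1 to 2 hours. This is only a sanity check on the numbers. The setting is a socket-level timeout, so covering the total bootstrap duration is a conservative comparison rather than an exact requirement. The helper names are hypothetical.

```python
MS_PER_HOUR = 60 * 60 * 1000  # 3,600,000 ms


def hours_to_ms(hours: float) -> int:
    """Convert hours to the integer milliseconds used by *_in_ms settings in cassandra.yaml."""
    return round(hours * MS_PER_HOUR)


def timeout_covers(timeout_ms: int, observed_hours: float) -> bool:
    """True if a streaming_socket_timeout_in_ms value exceeds an observed duration."""
    return timeout_ms > hours_to_ms(observed_hours)


if __name__ == "__main__":
    print(hours_to_ms(24))                     # 86400000
    print(timeout_covers(hours_to_ms(24), 2))  # True: 24 h exceeds 2 h per node
```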

RE: postmortem on 2.2.13 scale out difficulties

2019-06-12 Thread ZAIDI, ASAD A
Adding one node at a time: is that successful? Check the value of the streaming_socket_timeout_in_ms parameter in cassandra.yaml and increase it if needed. Have you tried nodetool bootstrap resume and the JVM option, i.e. JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"? From: Carl Mueller [m

Re: postmortem on 2.2.13 scale out difficulties

2019-06-12 Thread Carl Mueller
We're getting DEBUG [GossipStage:1] 2019-06-12 15:20:07,797 MigrationManager.java:96 - Not pulling schema because versions match or shouldPullSchemaFrom returned false multiple times, as it contacts the nodes. On Wed, Jun 12, 2019 at 11:35 AM Carl Mueller wrote: > We only were able to scale ou

Re: postmortem on 2.2.13 scale out difficulties

2019-06-12 Thread Carl Mueller
We were only able to scale out four nodes before failures started occurring, including multiple instances of nodes joining the cluster without streaming. Sigh. On Tue, Jun 11, 2019 at 3:11 PM Carl Mueller wrote: > We had a three-DC (asia-tokyo/europe/us) cassandra 2.2.13 cluster, AWS, > IPV6 >

ApacheCon North America 2019 Schedule Now Live!

2019-06-12 Thread Rich Bowen
Dear Apache Enthusiast, (You’re receiving this message because you’re subscribed to one or more Apache Software Foundation project user mailing lists.) We’re thrilled to announce the schedule for our upcoming conference, ApacheCon North America 2019, in Las Vegas, Nevada. See it now at https

Re: Recover lost node from backup or evict/re-add?

2019-06-12 Thread Jeff Jirsa
A host can replace itself using the method I described. > On Jun 12, 2019, at 7:10 AM, Alan Gano wrote: > > I guess I’m considering this scenario: > · host and configuration have survived > · /data is gone > · /backups have survived > > I have tested recovering from thi

RE: Recover lost node from backup or evict/re-add?

2019-06-12 Thread Alan Gano
I guess I’m considering this scenario: · host and configuration have survived · /data is gone · /backups have survived I have tested recovering from this scenario with an evict/re-add, which worked fine. If I restore from backup, the node will be behind the cluster – e

Re: Recover lost node from backup or evict/re-add?

2019-06-12 Thread Jeff Jirsa
To avoid violating consistency guarantees, you have to repair the replicas while the lost node is down. Once you do that, it’s typically easiest to bootstrap a replacement (there’s a property named “replace address first boot” you can google or someone can link) that tells a new joining host to t

Recover lost node from backup or evict/re-add?

2019-06-12 Thread Alan Gano
If I lose a node, does it make sense to even restore from snapshot/incrementals/commitlogs? Or is the best way to do an evict/re-add? Thanks, Alan.