Re: Repair errors

2023-08-11 Thread Surbhi Gupta
Try sstablescrub on the node where it is showing corrupted data. On Fri, Aug 11, 2023 at 8:38 AM Joe Obernberger <joseph.obernber...@gmail.com> wrote: > Finally found a message on another node that seems relevant: > > INFO [CompactionExecutor:7413] 2023-08-11 11:36:22,397 > CompactionTask.java:1
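A minimal sketch (Python, purely for illustration) of assembling the offline scrub invocation the advice above refers to. The keyspace/table names (`doc`/`extractedmetadata`) are taken from the compaction log path quoted in this thread; note that sstablescrub is an offline tool, so the node must be stopped before running it.

```python
# Sketch: build the offline sstablescrub invocation for the suspect table.
# Keyspace/table names come from the log path in this thread; the tool path
# is an assumption (it ships in the Cassandra tarball's tools/bin).
def scrub_command(keyspace: str, table: str, tool: str = "sstablescrub") -> list[str]:
    """Return the argv for an offline scrub of one table (node must be stopped)."""
    return [tool, keyspace, table]

cmd = scrub_command("doc", "extractedmetadata")
print(" ".join(cmd))  # sstablescrub doc extractedmetadata
```

For a corruption confined to one sstable, moving that file aside and running repair is sometimes preferred; scrub rewrites every sstable of the table it is pointed at.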

Re: Repair errors

2023-08-11 Thread Joe Obernberger
Finally found a message on another node that seems relevant: INFO  [CompactionExecutor:7413] 2023-08-11 11:36:22,397 CompactionTask.java:164 - Compacting (d30b64ba-385c-11ee-8e74-edf5512ad115) [/data/3/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-97958-big-Data.db:le

Re: Repair errors

2023-08-07 Thread manish khandelwal
What do the logs of /172.16.20.16:7000 say when the repair failed? It indicates "validation failed". Can you check system.log on /172.16.20.16:7000 and see what it says? Looks like you have some issue with doc/origdoc, probably a corrupt sstable. Try running repair for each individual table and see for wh

Re: Repair errors

2023-08-07 Thread Joe Obernberger
Thank you.  I've tried: nodetool repair --full nodetool repair -pr They all get to 57% on any of the nodes, and then fail. Interestingly the debug log only has INFO - there are no errors. [2023-08-07 14:02:09,828] Repair command #6 failed with error Incremental repair session 83dc17d0-354c-11e

Re: Repair errors

2023-08-06 Thread Josh McKenzie
Quick drive-by observation: > Did not get replies from all endpoints.. Check the > logs on the repair participants for further details > dropping message of type HINT_REQ due to error > org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The > channel this output stream was writing to

Re: Repair errors

2023-08-04 Thread Surbhi Gupta
Can you please try running nodetool describecluster from every node of the cluster? One time I noticed an issue where nodetool status showed all nodes UN but describecluster did not agree. Thanks Surbhi On Fri, Aug 4, 2023 at 8:59 AM Joe Obernberger wrote: > Hi All - been using reaper to do repairs, but it
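The check suggested above boils down to collecting the schema version each node reports and confirming they all match. A small sketch of that comparison, with made-up node addresses and version UUIDs (in practice you would gather them by running `nodetool describecluster` against every node):

```python
# Sketch: detect schema disagreement from per-node schema versions, as
# "nodetool describecluster" reports them. Node IPs and version strings
# below are illustrative only.
def schema_groups(versions_by_node: dict[str, str]) -> dict[str, list[str]]:
    """Group nodes by schema version; more than one group means disagreement."""
    groups: dict[str, list[str]] = {}
    for node, version in versions_by_node.items():
        groups.setdefault(version, []).append(node)
    return groups

groups = schema_groups({
    "10.0.0.1": "2207c2a9-f598-3971-986b-2926e09e239d",
    "10.0.0.2": "2207c2a9-f598-3971-986b-2926e09e239d",
    "10.0.0.3": "86afa796-d883-3932-aa73-6b017cef0d19",  # lagging node
})
if len(groups) > 1:
    print("schema disagreement:", groups)
```

A node can show UN in `nodetool status` while still carrying a stale schema version, which is exactly the situation described above.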

Re: Repair on a slow node (or is it?)

2021-03-31 Thread Lapo Luchini
Thanks for all your suggestions! I'm looking into it and so far it seems to be mainly a problem of disk I/O, as the host is running on spindle disks and being a DR of an entire cluster gives it many changes to follow. First (easy) try will be to add an SSD as ZFS cache (ZIL + L2ARC). Should m

Re: Repair on a slow node (or is it?)

2021-03-29 Thread Kane Wilson
Check what your compactionthroughput is set to, as it will impact the validation compactions. Also, what kind of disks does the DR node have? The validation compaction sizes are likely fine; I'm not sure of the exact details, but it's normal to expect very large validations. Rebuilding would not be

Re: Repair and NodeSync

2020-04-02 Thread J.B. Langston
NodeSync can place higher load on your cluster than traditional repair. I have seen some clusters that were OK with OpsCenter repair service or manual repairs get overloaded with NodeSync. So I would recommend testing NodeSync out with a realistic workload to make sure your cluster can handle it,

Re: repair failed

2020-01-02 Thread Ben Mills
Hi Oliver, I don't have a quick answer (or any answer yet), though we ran into a similar issue and I'm wondering about your environment and some configs. - Operating system? - Cloud or on-premise? - Version of Cassandra? - Version of Java? - Compaction strategy? - Primarily read or primarily writ

Re: Repair Strategies

2019-12-11 Thread Adarsh Kumar
Just a reminder, kindly provide your comments and suggestions. On Thu, Dec 5, 2019 at 3:46 PM Adarsh Kumar wrote: > Hi, > > We are in the process of designing a new solution around cassandra. As > repairs are very critical tasks for cassandra clusters want some pointers > on the following: > >

Re: Repair Issues

2019-10-26 Thread Ben Mills
Thanks Ghiyasi. On Sat, Oct 26, 2019 at 9:17 AM Hossein Ghiyasi Mehr wrote: > If the problem still exists, and all nodes are up, reboot them one by one. > Then try to repair one node. After that, repair the other nodes one by one. > > On Fri, Oct 25, 2019 at 12:56 AM Ben Mills wrote: > >> >> Thanks J

Re: Repair Issues

2019-10-26 Thread Hossein Ghiyasi Mehr
If the problem still exists, and all nodes are up, reboot them one by one. Then try to repair one node. After that, repair the other nodes one by one. On Fri, Oct 25, 2019 at 12:56 AM Ben Mills wrote: > > Thanks Jon! > > This is very helpful - allow me to follow-up and ask a question. > > (1) Yes, inc

Re: Repair Issues

2019-10-24 Thread Ben Mills
Thanks Jon! This is very helpful - allow me to follow-up and ask a question. (1) Yes, incremental repairs will never be used (unless it becomes viable in Cassandra 4.x someday). (2) I hear you on the JVM - will look into that. (3) Been looking at Cassandra version 3.11.x though was unaware that 3

Re: Repair Issues

2019-10-24 Thread Ben Mills
Hi Reid, Many thanks - I have seen that article though will definitely give it another read. Note that nodetool scrub has been tried (no effect) and sstablescrub cannot currently be run with the Cassandra image in use (though certainly a new image that allows the server to be stopped but keeps th

Re: Repair Issues

2019-10-24 Thread Jon Haddad
There are some major warning signs for me with your environment. A 4GB heap is too low, and Cassandra 3.7 isn't something I would put into production. Your surface area for problems is massive right now. Things I'd do: 1. Never use incremental repair. Seems like you've already stopped doing them,

Re: Repair Issues

2019-10-24 Thread Ben Mills
Hi Sergio, No, not at this time. It was in use with this cluster previously, and while there were no reaper-specific issues, it was removed to help simplify investigation of the underlying repair issues I've described. Thanks. On Thu, Oct 24, 2019 at 4:21 PM Sergio wrote: > Are you using Cass

Re: Repair Issues

2019-10-24 Thread Reid Pinchback
Ben, you may find this helpful: https://blog.pythian.com/so-you-have-a-broken-cassandra-sstable-file/ From: Ben Mills Reply-To: "user@cassandra.apache.org" Date: Thursday, October 24, 2019 at 3:31 PM To: "user@cassandra.apache.org" Subject: Repair Issues Message from External Sender Greeting

Re: Repair Issues

2019-10-24 Thread Sergio
Are you using Cassandra reaper? On Thu, Oct 24, 2019, 12:31 PM Ben Mills wrote: > Greetings, > > Inherited a small Cassandra cluster with some repair issues and need some > advice on recommended next steps. Apologies in advance for a long email. > > Issue: > > Intermittent repair failures on two

Re: Repair failed and crash the node, how to bring it back?

2019-08-01 Thread Martin Xue
ested with lots of > tombstones which in turn also tax on heap consumption. My $.002 cents for > the moment. > > > > > > > > *From:* Martin Xue [mailto:martin...@gmail.com] > *Sent:* Wednesday, July 31, 2019 5:05 PM > *To:* user@cassandra.apache.org > *Subject:*

Re: Repair failed and crash the node, how to bring it back?

2019-08-01 Thread Martin Xue
Hi Alex, Thanks, much appreciated. Regards Martin On Thu, Aug 1, 2019 at 3:34 PM Alexander Dejanovski wrote: > Hi Martin, > > apparently this is the bug you've been hit by on hints : > https://issues.apache.org/jira/browse/CASSANDRA-14080 > It was fixed in 3.0.17. > > You didn't provide the l

RE: Repair failed and crash the node, how to bring it back?

2019-08-01 Thread ZAIDI, ASAD A
:05 PM To: user@cassandra.apache.org Subject: Re: Repair failed and crash the node, how to bring it back? Hi Alex, Thanks for your reply. The disk space was around 80%. The crash happened during repair, primary range full repair on 1TB keyspace. Would that crash again? Thanks Regards Martin On T

Re: Repair failed and crash the node, how to bring it back?

2019-07-31 Thread Alexander Dejanovski
Hi Martin, apparently this is the bug you've been hit by on hints : https://issues.apache.org/jira/browse/CASSANDRA-14080 It was fixed in 3.0.17. You didn't provide the logs from Cassandra at the time of the crash, only the output of nodetool, so it's hard to say what caused it. You may be hit by

Re: Repair failed and crash the node, how to bring it back?

2019-07-31 Thread Martin Xue
Hi Alex, Thanks for your reply. The disk space was around 80%. The crash happened during repair, primary range full repair on 1TB keyspace. Would that crash again? Thanks Regards Martin On Thu., 1 Aug. 2019, 12:04 am Alexander Dejanovski, wrote: > It looks like you have a corrupted hint file.

Re: Repair failed and crash the node, how to bring it back?

2019-07-31 Thread Alexander Dejanovski
It looks like you have a corrupted hint file. Did the node run out of disk space while repair was running? You might want to move the hint files off their current directory and try to restart the node again. Since you'll have lost mutations then, you'll need... to run repair ¯\_(ツ)_/¯ ---
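The "move the hint files off their current directory" step above can be sketched as follows. The paths here use throwaway temporary directories purely for illustration; on a real node the files live under the configured hints_directory in cassandra.yaml, and the node must be stopped first. As the advice notes, quarantining hints loses mutations, so a repair is required afterwards.

```python
# Sketch: quarantine hint files so a node with a corrupted hint can start.
# Temp dirs and fake .hints files stand in for the real hints_directory.
import pathlib
import shutil
import tempfile

hints_dir = pathlib.Path(tempfile.mkdtemp())   # stand-in for hints_directory
quarantine = pathlib.Path(tempfile.mkdtemp())  # where suspect files are parked

# Fake a couple of hint files for the demo.
for name in ("1-1.hints", "2-1.hints"):
    (hints_dir / name).touch()

moved = []
for hint in sorted(hints_dir.glob("*.hints")):
    shutil.move(str(hint), quarantine / hint.name)
    moved.append(hint.name)

print("quarantined:", moved)
```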

Re: Repair / compaction for 6 nodes, 2 DC cluster

2019-07-31 Thread Alexander Dejanovski
Hi Martin, you can stop the anticompaction by roll restarting the nodes (not sure if "nodetool stop COMPACTION" will actually stop anticompaction, I never tried). Note that this will leave your cluster with SSTables marked as repaired and others that are not. These two types of SSTables will neve

Re: Repair / compaction for 6 nodes, 2 DC cluster

2019-07-31 Thread Martin Xue
Sorry ASAD, haven't had a chance, still bogged down with the production issue... On Wed, Jul 31, 2019 at 10:56 PM ZAIDI, ASAD A wrote: > Did you get a chance to look at the tlp reaper tool, i.e. > http://cassandra-reaper.io/ > > It is pretty awesome – Thanks to TLP team. > > > > > > > > *From:* Martin Xue

Re: Repair / compaction for 6 nodes, 2 DC cluster

2019-07-31 Thread Martin Xue
Thanks Alex, In this case, as I have already run the repair, anticompactions have started (including on other nodes). I don't know how long they will take to finish. Is there a way to check? nodetool compactionstats shows one process finished, then there is another one coming up. Sha

RE: Repair / compaction for 6 nodes, 2 DC cluster

2019-07-31 Thread ZAIDI, ASAD A
Did you get a chance to look at the tlp reaper tool, i.e. http://cassandra-reaper.io/ It is pretty awesome – thanks to the TLP team. From: Martin Xue [mailto:martin...@gmail.com] Sent: Wednesday, July 31, 2019 12:09 AM To: user@cassandra.apache.org Subject: Repair / compaction for 6 nodes, 2 DC cluster He

Re: Repair / compaction for 6 nodes, 2 DC cluster

2019-07-31 Thread Oleksandr Shulgin
On Wed, Jul 31, 2019 at 7:10 AM Martin Xue wrote: > Hello, > > Good day. This is Martin. > > Can someone help me with the following query regarding Cassandra repair > and compaction? > Martin, This blog post from The Last Pickle provides an in-depth explanation as well as some practical advice:

RE: Repair daily refreshed table

2018-08-20 Thread Per Otterström
batches: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableGc_grace_seconds /pelle From: Maxim Parkachov Sent: den 20 augusti 2018 08:29 To: user@cassandra.apache.org Subject: Re: Repair daily refreshed table Hi Raul, I cannot afford delete and then load

Re: Repair daily refreshed table

2018-08-19 Thread Maxim Parkachov
Hi Raul, I cannot afford delete-and-then-load as this will create downtime for the records; that's why I'm upserting with TTL today()+7days as I mentioned in my original question. And at the moment I don't have an issue either with loading or with access times. My question is should I repair such

Re: Repair daily refreshed table

2018-08-18 Thread Rahul Singh
If you wanted to be certain that all replicas were acknowledging receipt of the data, then you could use ALL or EACH_QUORUM ( if you have multiple DCs) but you must really want high consistency if you do that. You should avoid consciously creating tombstones if possible — it ends up making read

Re: Repair daily refreshed table

2018-08-18 Thread Maxim Parkachov
Hi Rahul, I'm already using LOCAL_QUORUM in the batch process and it runs every day. As far as I understand, because I'm overwriting the whole table with a new TTL, the process creates tons of tombstones, and I'm more concerned with them. Regards, Maxim. On Sun, Aug 19, 2018 at 3:02 AM Rahul Singh wrote: >
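The arithmetic behind the concern above: a cell rewritten daily with a 7-day TTL expires a week after it is written, and the resulting tombstone then lingers until gc_grace_seconds elapses before compaction can drop it. A quick sketch using the default gc_grace value (the 7-day TTL comes from this thread; everything else is illustrative):

```python
# Sketch: lifetime of a superseded TTL'd cell in the daily re-upsert pattern.
TTL_DAYS = 7
ttl_seconds = TTL_DAYS * 24 * 3600
gc_grace_seconds = 864_000  # default gc_grace_seconds: 10 days

# A cell written on day D expires on day D+7, then stays on disk as an
# expired cell/tombstone for gc_grace before it is eligible for purging.
lifetime_seconds = ttl_seconds + gc_grace_seconds
print(ttl_seconds, lifetime_seconds)  # 604800 1468800
```

Lowering gc_grace_seconds on such a table (as suggested elsewhere in this thread) shrinks that window, at the cost of a tighter repair deadline.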

Re: Repair daily refreshed table

2018-08-18 Thread Rahul Singh
Are you loading using a batch process? What’s the frequency of the data Ingest and does it have to very fast. If not too frequent and can be a little slower, you may consider a higher consistency to ensure data is on replicas. Rahul On Aug 18, 2018, 2:29 AM -0700, Maxim Parkachov , wrote: > Hi c

Re: Repair slow, "Percent repaired" never updated

2018-06-06 Thread Martin Mačura
P.S.: Here's a corresponding log from the second node: INFO [AntiEntropyStage:1] 2018-06-04 13:37:16,409 Validator.java:281 - [repair #afc2ef90-67c0-11e8-b07c-c365701888e8] Sending completed merkle tree to /14.0.53.234 for asm_log.event INFO [StreamReceiveTask:30] 2018-06-04 14:14:28,989 StreamR

Re: repair in C* 3.11.2 and anticompactions

2018-05-24 Thread Nitan Kainth
Jean, Have you considered nodetool repair -pr (primary range) or Reaper? With Reaper you can throttle the repair load on the system. These two use ranges anyway, so you may not run into anti-compaction. Regards, Nitan K. Cassandra and Oracle Architect/SME Datastax Certified Cassandra expert Oracle 10g

Re: repair in C* 3.11.2 and anticompactions

2018-05-24 Thread Jean Carlo
Thanks Alain and Lerh, it is clear now. In order to avoid problems and load on the cluster from anticompactions, I am going to use repair by sub ranges. I studied the tool called cassandra-list-subranges; it seems it still works for

Re: repair in C* 3.11.2 and anticompactions

2018-05-24 Thread Alain RODRIGUEZ
Hi Jean, Here is what Alexander wrote about it, a few months ago, in the comments of the article mentioned above: "A full repair is an incremental one that doesn't skip repaired data. > Performing anticompaction in that case too (regardless it is a valid > approach or not) allows to mark as repai

Re: repair in C* 3.11.2 and anticompactions

2018-05-23 Thread Lerh Chuan Low
Hey Jean, I think it still does anticompaction by default regardless, it will not do so only if you do subrange repair. TLP wrote a pretty good article on that: http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html On 24 May 2018 at 00:42, Jean Carlo wrote: > Hello > >
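Subrange repair, as mentioned above, works by slicing the token ring into explicit (start, end] intervals and repairing them one at a time, which is what lets it skip anticompaction. A sketch of the slicing arithmetic over the full Murmur3 range (real tools such as cassandra-list-subranges split each node's *owned* ranges rather than the whole ring; this only shows the mechanics):

```python
# Sketch: split the full Murmur3 token range into near-equal subranges,
# the way subrange repair tooling slices work to avoid anticompaction.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def split_ring(parts: int) -> list[tuple[int, int]]:
    """Return `parts` contiguous (start, end] token subranges covering the ring."""
    width = (MAX_TOKEN - MIN_TOKEN) // parts
    bounds = [MIN_TOKEN + i * width for i in range(parts)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))

subranges = split_ring(4)
print(len(subranges))
```

Each (start, end] pair would then be fed to a ranged repair (nodetool's -st/-et options), one slice at a time.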

Re: Repair of 5GB data vs. disk throughput does not make sense

2018-04-26 Thread horschi
Hi Thomas, I don't think I have ever seen compaction be faster. For me, tables with small values usually run around 5 MB/s with a single compaction. With larger blobs (a few KB per blob) I have seen 16 MB/s, both with "nodetool setcompactionthroughput 0". I don't think it's disk-related either. I
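Plugging the throughputs quoted above into the 5 GB figure from the thread title shows why the elapsed times feel out of proportion to raw disk speed. A back-of-the-envelope sketch:

```python
# Sketch: time to compact/validate a 5 GB table at the single-compaction
# throughputs observed in this thread (5 MB/s small values, 16 MB/s blobs).
def compaction_seconds(data_gb: float, mb_per_sec: float) -> float:
    return data_gb * 1024 / mb_per_sec

slow = compaction_seconds(5, 5)    # 1024 s, roughly 17 minutes
fast = compaction_seconds(5, 16)   # 320 s, roughly 5 minutes
print(round(slow), round(fast))  # 1024 320
```

Either figure is far below what the underlying disks can stream, supporting the observation that the bottleneck is per-compaction CPU/serialization rather than raw disk throughput.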

Re: Repair of 5GB data vs. disk throughput does not make sense

2018-04-26 Thread Jonathan Haddad
I can't say for sure, because I haven't measured it, but I've seen a combination of readahead + large chunk size with compression cause serious issues with read amplification, although I'm not sure if or how it would apply here. Likely depends on the size of your partitions and the fragmentation o

Re: Repair with –pr stuck in between on Cassandra 3.11.1

2018-01-25 Thread shini gupta
Hi Paulo, No, we are not using JBOD. Just a bunch of disks. Thanks On Thu, 25 Jan 2018 at 5:44 PM, Paulo Motta wrote: > Are you using JBOD? A thread dump (jstack ) on the affected nodes > would probably help troubleshoot this. > > 2018-01-25 6:45 GMT-02:00 shini gupta : > > Hi, > > > > > > We have

Re: Repair with –pr stuck in between on Cassandra 3.11.1

2018-01-25 Thread Paulo Motta
Are you using JBOD? A thread dump (jstack ) on the affected nodes would probably help troubleshoot this. 2018-01-25 6:45 GMT-02:00 shini gupta : > Hi, > > > We have upgraded the system from Cassandra 2.1.16 to 3.11.1. After about > 335M of data loading, repair with –pr and –full option was trigge

Re: Repair giving error

2018-01-22 Thread Alain RODRIGUEZ
Hello, Some other thoughts: - Are you using internode secured communications (and then use the port 7001 instead) ? - A rolling restart might help, have you tried restarting a few / all the nodes? This issue is very weird and I am only making poor guesses here. This is not an issue I have seen i

Re: Repair giving error

2018-01-18 Thread Akshit Jain
Hi Alain, Thanks for the response. I'm using Cassandra 3.10. nodetool status shows all the nodes up. No schema disagreement. Port 7000 is open. Regards Akshit Jain 9891724697 On Thu, Jan 18, 2018 at 4:53 PM, Alain RODRIGUEZ wrote: > Hello, > > It looks like a communication issue. > > What Cassandra

Re: Repair giving error

2018-01-18 Thread Alain RODRIGUEZ
Hello, It looks like a communication issue. What Cassandra version are you using? What's the result of 'nodetool status'? Any schema disagreement in 'nodetool describecluster'? Is port 7000 open and are the nodes communicating with each other? (Ping is not proving the connection is up, even though it i

Re: Repair fails for unknown reason

2018-01-09 Thread kurt greaves
The parent repair session will be on the node that you kicked off the repair on. Are the logs above from that node? Can you make it a bit clearer how many nodes are involved and the corresponding logs from each node? On 9 January 2018 at 09:49, Hannu Kröger wrote: > We have run restarts on the c

Re: Repair fails for unknown reason

2018-01-09 Thread Hannu Kröger
We have run restarts on the cluster and that doesn’t seem to help at all. We ran repair separately for each table, which usually goes through, but running a repair on the whole keyspace doesn’t. Anything, anyone? Hannu > On 3 Jan 2018, at 23:24, Hannu Kröger wrote: > > I can certainly try that.

Re: Repair fails for unknown reason

2018-01-03 Thread Hannu Kröger
I can certainly try that. No problem there. However wouldn’t we then get this kind of errors if that was the case: java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables ? Hannu > On 3 Jan 2018, at 20:50, Nandakishore Tokala > wrote: > > hi Hannu, > > I thi

Re: Repair fails for unknown reason

2018-01-03 Thread Nandakishore Tokala
Hi Hannu, I think some of the repairs are hanging there. Please restart all the nodes in the cluster and start the repair. Thanks Nanda On Wed, Jan 3, 2018 at 9:35 AM, Hannu Kröger wrote: > Additional notes: > > 1) If I run the repair just on those tables, it works fine > 2) Those tables are

Re: Repair fails for unknown reason

2018-01-03 Thread Hannu Kröger
Additional notes: 1) If I run the repair just on those tables, it works fine 2) Those tables are empty Hannu > On 3 Jan 2018, at 18:23, Hannu Kröger wrote: > > Hello, > > Situation is as follows: > > Repair was started on node X on this keyspace with —full —pr. Repair fails on > node Y. >

Re: Repair failing after it was interrupted once

2017-11-15 Thread Erick Ramirez
Check that there are no running repair threads on the nodes with nodetool netstats. For those that do have running repairs, restart C* on them to kill the repair threads and you should be able to repair the nodes again. Cheers! On Wed, Nov 15, 2017 at 8:08 PM, Dipan Shah wrote: > Hello, > > > I

RE: Repair on system_auth

2017-07-07 Thread Mark Furlong
I’m currently on 2.1.12. Are you saying this bug exists in the current latest version, 3.0.14? Thank you Mark 801-705-7115 office From: ­Fay Hou [Storage Service] [mailto:fay...@coupang.com] Sent: Thursday, July 6, 2017 2:24 PM To: User Subject: Re: Repair on system_auth There is a bug on

Re: Repair on system_auth

2017-07-06 Thread Hannu Kröger
You can also stop repair using JMX without restarting. There are scripts to do that. Hannu > On 6 Jul 2017, at 23.24, ­Fay Hou [Storage Service] > wrote: > > There is a bug on repair system_auth keyspace. We just skip the repair on > system_auth. Yes. it is ok to kill the running repair job

Re: Repair on system_auth

2017-07-06 Thread ­Fay Hou [Storage Service]
There is a bug on repair system_auth keyspace. We just skip the repair on system_auth. Yes. it is ok to kill the running repair job On Thu, Jul 6, 2017 at 1:14 PM, Subroto Barua wrote: > you can check the status via nodetool netstats > to kill the repair job, restart the instance > > > On Thursd

Re: Repair on system_auth

2017-07-06 Thread Subroto Barua
You can check the status via nodetool netstats. To kill the repair job, restart the instance. On Thursday, July 6, 2017, 1:09:42 PM PDT, Mark Furlong wrote: I have started a repair on my system_auth keyspace. The repair has started and the process shows as running with ps but am not seeing any

Re: repair question (-dc option)

2017-05-12 Thread Gopal, Dhruva
: "Gopal, Dhruva" Cc: "user@cassandra.apache.org" Subject: Re: repair question (-dc option) If there was no node down during that period, and you are using LOCAL_QUORUM read/write, then yes above command works. On Thu, May 11, 2017 at 11:59 AM, Gopal, Dhruva mailto:dhr

Re: repair question (-dc option)

2017-05-11 Thread Varun Gupta
If there was no node down during that period, and you are using LOCAL_QUORUM read/write, then yes above command works. On Thu, May 11, 2017 at 11:59 AM, Gopal, Dhruva wrote: > Hi – > > I have a question on running a repair after bringing up a node that was > down (brought down gracefully) for

Re: repair performance

2017-03-20 Thread daemeon reiydelle
I would zero in on network throughput, especially inter-rack trunks. sent from my mobile Daemeon Reiydelle skype daemeon.c.m.reiydelle USA 415.501.0198 On Mar 17, 2017 2:07 PM, "Roland Otta" wrote: > hello, > > we are quite inexperienced with cassandra at the moment and are playing > around with

Re: repair performance

2017-03-20 Thread Thakrar, Jayesh
. Enabling gc logging helps as well to see the impact. From: Roland Otta Date: Monday, March 20, 2017 at 1:53 AM To: Conversant , "user@cassandra.apache.org" Subject: Re: repair performance good point! i did not (so far) i will do that - especially because i often see all compacti

Re: repair performance

2017-03-19 Thread Roland Otta
ummit-2016 From: Roland Otta Date: Friday, March 17, 2017 at 5:47 PM To: "user@cassandra.apache.org" Subject: Re: repair performance did not recognize that so far. thank you for the hint. i will definitely give it a try On Fri, 2017-03-17 at 22:32 +0100, benjamin roth wrote: The fork f

Re: repair performance

2017-03-18 Thread Thakrar, Jayesh
dra-summit-2016 From: Roland Otta Date: Friday, March 17, 2017 at 5:47 PM To: "user@cassandra.apache.org" Subject: Re: repair performance did not recognize that so far. thank you for the hint. i will definitely give it a try On Fri, 2017-03-17 at 22:32 +0100, benjamin roth

Re: repair performance

2017-03-17 Thread Roland Otta
did not recognize that so far. thank you for the hint. i will definitely give it a try On Fri, 2017-03-17 at 22:32 +0100, benjamin roth wrote: The fork from thelastpickle is. I'd recommend to give it a try over pure nodetool. 2017-03-17 22:30 GMT+01:00 Roland Otta mailto:roland.o...@willhaben.

Re: repair performance

2017-03-17 Thread benjamin roth
The fork from thelastpickle is. I'd recommend to give it a try over pure nodetool. 2017-03-17 22:30 GMT+01:00 Roland Otta : > forgot to mention the version we are using: > > we are using 3.0.7 - so i guess we should have incremental repairs by > default. > it also prints out incremental:true when

Re: repair performance

2017-03-17 Thread Roland Otta
... maybe I should just try increasing the job threads with --job-threads. Shame on me. On Fri, 2017-03-17 at 21:30 +, Roland Otta wrote: forgot to mention the version we are using: we are using 3.0.7 - so i guess we should have incremental repairs by default. it also prints out incremental:tr

Re: repair performance

2017-03-17 Thread Roland Otta
forgot to mention the version we are using: we are using 3.0.7 - so i guess we should have incremental repairs by default. it also prints out incremental:true when starting a repair INFO [Thread-7281] 2017-03-17 09:40:32,059 RepairRunnable.java:125 - Starting repair command #7, repairing keyspac

Re: repair performance

2017-03-17 Thread benjamin roth
It depends a lot ... - Repairs can be very slow, yes! (And unreliable, due to timeouts, outages, whatever) - You can use incremental repairs to speed things up for regular repairs - You can use "reaper" to schedule repairs and run them sliced, automated, failsafe The time repairs actually may var

Re: repair -pr in crontab

2016-11-25 Thread Benjamin Roth
It is absolutely ok to run parallel repair -pr, if 1. the ranges do not overlap 2. if your cluster can handle the pressure - do not underestimate that. In reaper you can tweak some settings like repair intensity to give your cluster some time to breath between repair slices. 2016-11-25 11:34 GMT+

Re: repair -pr in crontab

2016-11-25 Thread Artur Siekielski
Hi, yes, I read about how repairing works, but the docs/blog posts lack practical recommendations and "best practices". For example, I found people having issues with running "repair -pr" simultaneously on all nodes, but it isn't clear that this shouldn't be allowed. In the end I implemented rol
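A rolling "repair -pr" schedule of the kind described above just needs each node assigned its own slot, with the full cycle completing inside gc_grace_seconds (10 days by default). A sketch of the assignment logic, with made-up node names (a scheduler like Reaper additionally handles failures and back-pressure, which a plain cron cannot):

```python
# Sketch: stagger "nodetool repair -pr" so one node repairs per day and the
# whole ring is covered well inside the gc_grace window (default: 10 days).
GC_GRACE_DAYS = 10
nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical ring

assert len(nodes) <= GC_GRACE_DAYS, "ring too large for a one-node-per-day cycle"

# Day-of-cycle assignment: node i repairs on day i, then the cycle restarts.
schedule = {node: day for day, node in enumerate(nodes)}
print(schedule)
```

Because -pr repairs only each node's primary ranges, every node must take its turn for the ring to be fully repaired; skipping a node leaves its primary ranges unrepaired.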

Re: repair -pr in crontab

2016-11-24 Thread Alexander Dejanovski
Hi, we maintain a hard fork of Reaper that works with all versions of Cassandra up to 3.0.x : https://github.com/thelastpickle/cassandra-reaper Just to save you some time digging into all the forks that could exist. Cheers, On Fri, Nov 25, 2016 at 7:37 AM Benjamin Roth wrote: > I recommend usi

Re: repair -pr in crontab

2016-11-24 Thread Benjamin Roth
I recommend using cassandra-reaper. Using crons without proper monitoring will most likely not work as expected. There are some Reaper forks on GitHub; you have to check which one works with your Cassandra version. The original one from Spotify only works on 2.x, not on 3.x. Am 25.11.2016 07:31 schr

Re: repair -pr in crontab

2016-11-24 Thread wxn...@zjqunshuo.com
Hi Artur, When I asked similar questions, someone addressed me to the below links and they are helpful. See http://www.datastax.com/dev/blog/repair-in-cassandra https://lostechies.com/ryansvihla/2015/09/25/cassandras-repair-should-be-called-required-maintenance/ https://cassandra-zone.com/underst

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-15 Thread Anuj Wadehra
Hi Leena, Do you have a firewall between the two DCs? If yes, "connection reset" can be caused by Cassandra trying to use a TCP connection which has already been closed by the firewall. Please make sure that you set a high connection timeout on the firewall. Also, make sure your servers are not overloaded.

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-14 Thread Leena Ghatpande
ngs looks good on all nodes. From: Anuj Wadehra Sent: Wednesday, October 12, 2016 2:41 PM To: user Subject: Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr Hi Leena, First thing you should be concerned about is : Why

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-13 Thread kurt Greaves
Don't do pr repairs when using incremental repair, you'll just end up with loads of anti-compactions. On 12 October 2016 at 19:11, Harikrishnan Pillai wrote: > In my experience dc local repair node by node with > Pr and par options is best .full repair increased sstables > A lot and take days to

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Harikrishnan Pillai
In my experience, DC-local repair node by node with -pr and -par options is best. Full repair increased sstables a lot and took days to compact them back. Another easy option for repair is to use a Spark job: read all data with consistency ALL and increase read repair chance to 100%, or use the Netflix tickl

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Anuj Wadehra
Hi Leena, First thing you should be concerned about is : Why the repair -pr operation doesnt complete ? Second comes the question : Which repair option is best? One probable cause of stuck repairs is : if the firewall between DCs is closing TCP connections and Cassandra is trying to use such c

RE: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Anubhav Kale
marked either ways). From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com] Sent: Wednesday, October 12, 2016 9:25 AM To: user@cassandra.apache.org Subject: Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr Note that the tickle approach doesn’t mark sstables as

Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Jeff Jirsa
tombstone removal. From: Anubhav Kale Reply-To: "user@cassandra.apache.org" Date: Wednesday, October 12, 2016 at 9:17 AM To: "user@cassandra.apache.org" Subject: RE: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr The default re

RE: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Anubhav Kale
The default repair process doesn't usually work at scale, unfortunately. Depending on your data size, you have the following options. Netflix Tickler: https://github.com/ckalantzis/cassTickler (Read at CL.ALL via CQL continuously :: Python) Spotify Reaper: https://github.com/spotify/cassandra-

Re: Repair and LCS tables

2016-07-21 Thread Cyril Scetbon
> What Cassandra version are you using? 2.1.14 > > There a lot of recent and ongoing work around LCS. Maybe one of these tickets > will be of interest to you: > > https://issues.apache.org/jira/browse/CASSANDRA-10979 > > https://issues.a

Re: Repair and LCS tables

2016-07-20 Thread Alain RODRIGUEZ
Hi Cyril, What Cassandra version are you using? There a lot of recent and ongoing work around LCS. Maybe one of these tickets will be of interest to you: https://issues.apache.org/jira/browse/CASSANDRA-10979 https://issues.apache.org/jira/browse/CASSANDRA-10862 C*heers, ---

Re: Repair schedules for new clusters

2016-05-17 Thread Ben Slater
We’ve found with incremental repairs that more frequent repairs are generally better. Our current standard for incremental repairs is once per day. I imagine that the exact optimum frequency is dependent on the ratio of reads to writes in your cluster. Turning on incremental repairs from the get-go

Re: Repair with "-pr" and vnodes

2016-01-12 Thread Robert Coli
On Tue, Jan 12, 2016 at 3:46 PM, Roman Tkachenko wrote: > The documentation for the "-pr" repair option says it repairs only the > first range returned by the partitioner. However, with vnodes a node owns a > lot of small ranges. > > Does that mean that if I run rolling "nodetool repair -pr" on t

Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
ress=listen address=PUBLIC IP address. In seeds, we put PUBLIC IP of other nodes but private IP for the local node. There were some issues if we tried to access local node via its public IP. Thanks Anuj On Tue, 24/11/15, Paulo Motta wrote: Su

Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
its public IP. Thanks Anuj On Tue, 24/11/15, Paulo Motta wrote: Subject: Re: Repair Hangs while requesting Merkle Trees To: "user@cassandra.apache.org" , "Anuj Wadehra" Date: Tuesday, 24 November, 2015, 12:38 AM The issue might be related to the ESTABLISH

Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
via its public IP. Thanks Anuj On Tue, 24/11/15, Paulo Motta wrote: Subject: Re: Repair Hangs while requesting Merkle Trees To: "user@cassandra.apache.org" , "Anuj Wadehra" Date: Tuesday, 24 November, 2015, 12:38 AM The is

Re: Repair Hangs while requesting Merkle Trees

2015-11-23 Thread Paulo Motta
...network team to capture netstat and tcpdump too.. Thanks Anuj ---- On Wed, 18/11/15, Anuj Wadehra wrote: Subject: Re: Repair Hangs while requesting Merkle Trees To: "user@cassandra.apache.org"

Re: Repair Hangs while requesting Merkle Trees

2015-11-23 Thread Anuj Wadehra
triggered for cross-DC nodes, and hints replay being timed out. Is that an indication of a network issue? I am getting in touch with the network team to capture netstat and tcpdump too.. Thanks Anuj On Wed, 18/11/15, Anuj Wadehra wrote: Subject: Re

Re: Repair Hangs while requesting Merkle Trees

2015-11-17 Thread Anuj Wadehra
Thanks Bryan !! The connection is in the ESTABLISHED state on one end and completely missing at the other end (in another DC). Yes, we can revisit TCP tuning. But the problem is node specific, so I am not sure whether tuning is the culprit. Thanks Anuj From: "Bryan Cheng" Date

Re: Repair Hangs while requesting Merkle Trees

2015-11-17 Thread Bryan Cheng
From: "Bryan Cheng" Date: Tue, 17 Nov, 2015 at 5:54 am Subject: Re: Repair Hangs while requesting Merkle Trees

Re: Repair Hangs while requesting Merkle Trees

2015-11-16 Thread Anuj Wadehra
Hi Bryan, Thanks for the reply !! I didn't mean streaming_socket_timeout_in_ms. I meant that when you run netstat (the Linux command) on node A in DC1, you will notice a connection in the ESTABLISHED state with node B in DC2. But when you run netstat on node B, you won't find any connection with
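The asymmetry described here can be checked mechanically; a minimal sketch, with the two variables standing in for `netstat -tn` captures taken on each node (addresses and port are hypothetical):

```shell
# A half-open ("phantom") connection: node A reports ESTABLISHED,
# while node B in the other DC has no matching entry at all.
# The variables below stand in for `netstat -tn | grep 7000` output
# captured on each end.
on_node_a='tcp 0 0 10.0.1.5:7000 10.0.2.9:41234 ESTABLISHED'
on_node_b=''

if [ -n "$on_node_a" ] && [ -z "$on_node_b" ]; then
    echo "phantom: ESTABLISHED on node A, missing on node B"
fi
```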

Re: Repair Hangs while requesting Merkle Trees

2015-11-16 Thread Bryan Cheng
From: "Anuj Wadehra" Date: Sat, 14 Nov, 2015 at 11:59 pm Subject: Re: Repair Hangs while requesting Merkle Trees

Re: Repair time comparison for Cassandra 2.1.11

2015-11-15 Thread Anuj Wadehra
For the error, you can see http://www.scriptscoop.net/t/3bac9a3307ac/cassandra-lost-notification-from-nodetool-repair.html Lost notification should not be a problem; please see https://issues.apache.org/jira/browse/CASSANDRA-7909 In fact, we are also currently facing an issue where merkle tr

Re: Repair time comparison for Cassandra 2.1.11

2015-11-15 Thread Badrjan
...messages in logs while repair is running? 50 hours seems too much considering your cluster is stable and you don't have any dropped mutations on any of the nodes. Thanks Anuj From: "Badrjan"

Re: Repair time comparison for Cassandra 2.1.11

2015-11-15 Thread Anuj Wadehra
Ok. I don't have much experience with 2.1 as we are on 2.0.x. Are you using sequential repair? If yes, parallel repair can be faster, but you need to make sure that your application has sufficient headroom when the cluster is running repair. Are you observing any WARN or ERROR messages in logs wh

Re: Repair time comparison for Cassandra 2.1.11

2015-11-15 Thread Badrjan
Nothing is being dropped, plus the processor is busy around 60%.  B. 15. Nov 2015 15:58 by anujw_2...@yahoo.co.in: > Repair can take a long time if you have lots of inconsistent data. If you > haven't restarted nodes yet, you can run the nodetool tpstats command on all > nodes to make sure that there

Re: Repair time comparison for Cassandra 2.1.11

2015-11-15 Thread Anuj Wadehra
Repair can take a long time if you have lots of inconsistent data. If you haven't restarted the nodes yet, you can run the nodetool tpstats command on all nodes to make sure that there are no mutation drops. Thanks Anuj From: "badr...@tuta.io" Date: Sun, 15 Nov, 2015 at 4:20
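The tpstats check suggested here can be scripted; a sketch with a captured sample embedded (the counts are hypothetical) so it runs without a cluster; in practice, pipe `nodetool tpstats` output in directly:

```shell
# Flag nonzero counts in the "Message type / Dropped" section of
# `nodetool tpstats` output. The sample text below stands in for the
# real command output; any line with a nonzero second column is a drop.
sample='Message type           Dropped
READ                         0
MUTATION                    12
HINT                         0'

echo "$sample" | awk 'NR > 1 && $2 > 0 { print $1 " dropped: " $2 }'
```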

Re: Repair Hangs while requesting Merkle Trees

2015-11-14 Thread Anuj Wadehra
One more observation. We observed that there are a few TCP connections which a node shows as ESTABLISHED but which are completely missing on the node at the other end. They are called "phantom" connections, I guess. Can this be a possible cause? Thanks Anuj From: "An
