Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
ress=listen address=PUBLIC IP address. In seeds, we put PUBLIC IP of other nodes but private IP for the local node. There were some issues if we tried to access local node via its public IP. Thanks Anuj On Tue, 24/11/15, Paulo Motta wrote: Su

Re: Repair Hangs while requesting Merkle Trees

2015-11-29 Thread Anuj Wadehra
its public IP. Thanks Anuj On Tue, 24/11/15, Paulo Motta wrote: Subject: Re: Repair Hangs while requesting Merkle Trees To: "user@cassandra.apache.org" , "Anuj Wadehra" Date: Tuesday, 24 November, 2015, 12:38 AM The issue might be related to the ESTABLISH

Re: Repair Hangs while requesting Merkle Trees

2015-11-23 Thread Paulo Motta
k team to capture netstats and tcpdump > too.. > > Thanks > Anuj > > > ---- > On Wed, 18/11/15, Anuj Wadehra wrote: > > Subject: Re: Repair Hangs while requesting Merkle Trees > To: "user@cassandra.apache.org"

Re: Repair Hangs while requesting Merkle Trees

2015-11-23 Thread Anuj Wadehra
: Repair Hangs while requesting Merkle Trees To: "user@cassandra.apache.org" Date: Wednesday, 18 November, 2015, 7:57 AM Thanks Bryan !! Connection is in ESTABLISHED state on one end and completely missing at the other end (in another DC). Yes, we can revisit TCP tuning. But the probl

Re: Repair Hangs while requesting Merkle Trees

2015-11-17 Thread Anuj Wadehra
Thanks Bryan !! Connection is in ESTABLISHED state on one end and completely missing at the other end (in another DC). Yes, we can revisit TCP tuning. But the problem is node specific. So not sure whether tuning is the culprit. Thanks Anuj Sent from Yahoo Mail on Android From:"Bryan Cheng" Date

Re: Repair Hangs while requesting Merkle Trees

2015-11-17 Thread Bryan Cheng
s > Anuj > > > > > > > > > Sent from Yahoo Mail on Android > <https://overview.mail.yahoo.com/mobile/?.src=Android> > -- > *From*:"Bryan Cheng" > *Date*:Tue, 17 Nov, 2015 at 5:54 am > > *Subject*:Re: Repair Hangs while requesting Merkle Tr

Re: Repair Hangs while requesting Merkle Trees

2015-11-16 Thread Anuj Wadehra
Hi Bryan, Thanks for the reply !! I didn't mean streaming_socket_timeout_in_ms. I meant that when you run netstat (the Linux command) on node A in DC1, you will notice that there is a connection in the ESTABLISHED state with node B in DC2. But when you run netstat on node B, you won't find any connection with
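
The comparison described in this message can be sketched as a small shell helper. This is only an illustration, not something posted in the thread: half_open is a hypothetical name, the two input files are assumed to hold saved `netstat -tn` output from each node, addresses are assumed to be plain IPv4, and no NAT is assumed between the nodes (with separate public and private addresses the two views may not mirror each other).

```shell
#!/bin/sh
# half_open A_FILE B_FILE B_IP   (hypothetical helper, not part of any tool)
# A_FILE and B_FILE hold saved `netstat -tn` output from node A and node B.
# Prints connections that node A reports as ESTABLISHED towards B_IP
# but that do not appear, mirrored, in node B's own list.
half_open() {
    awk -v peer="$3" '
        # Ignore header lines and anything that is not an established TCP socket.
        $1 !~ /^tcp/ || $6 != "ESTABLISHED" { next }
        # Node B file: remember each connection as node A would see it.
        FILENAME == ARGV[1] { seen[$5 " " $4] = 1; next }
        # Node A file: report connections to the peer that node B does not list.
        index($5, peer ":") == 1 && !(($4 " " $5) in seen) { print $4, "->", $5 }
    ' "$2" "$1"
}

# Example, with made-up file names and address:
# half_open nodeA-netstat.txt nodeB-netstat.txt 10.0.1.2
```

Any line it prints is a candidate for the one-sided connection described above. The two captures should be taken as close together in time as possible, since connections that open or close between them will also show up.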

Re: Repair Hangs while requesting Merkle Trees

2015-11-16 Thread Bryan Cheng
> Anuj > > Sent from Yahoo Mail on Android > <https://overview.mail.yahoo.com/mobile/?.src=Android> > -- > *From*:"Anuj Wadehra" > *Date*:Sat, 14 Nov, 2015 at 11:59 pm > > *Subject*:Re: Repair Hangs while requesting Merkle Trees

Re: Repair Hangs while requesting Merkle Trees

2015-11-14 Thread Anuj Wadehra
One more observation. We observed that there are a few TCP connections which one node shows as ESTABLISHED, but when we go to the node at the other end, the connection is not there. They are called "phantom" connections I guess. Can this be a possible cause? Thanks Anuj Sent from Yahoo Mail on Android From:"An

Re: Repair Hangs while requesting Merkle Trees

2015-11-14 Thread Anuj Wadehra
Thanks Daemeon !! I will capture the output of netstat and share it in the next few days. We were thinking of taking tcp dumps also. If it's a network issue and increasing the request timeout worked, not sure how Cassandra is dropping messages based on timeout. Repair messages are non-droppable and not sup

Re: Repair Hangs while requesting Merkle Trees

2015-11-11 Thread daemeon reiydelle
Have you checked the network statistics on that machine? (netstat -tas) while attempting to repair ... if netstat shows ANY issues you have a problem. If you can put the command in a loop running every 60 seconds for maybe 15 minutes and post back? Out of curiosity, how many remote DC nodes are
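
A minimal sketch of the sampling loop suggested in this message, assuming a POSIX shell. collect_samples is a hypothetical helper name and the log file name is made up; only the command, the 60-second interval and the roughly 15-minute duration come from the message.

```shell
#!/bin/sh
# collect_samples COUNT INTERVAL COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND COUNT times, sleeping INTERVAL seconds between runs,
# and writes a UTC timestamp header before each sample.
collect_samples() {
    count=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i at $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
        "$@"
        # No need to sleep after the final sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# Every 60 seconds for about 15 minutes, while the repair is running:
# collect_samples 15 60 netstat -tas > netstat-during-repair.log
```

The timestamp headers make it easier to line the samples up with the repair messages in the Cassandra log afterwards.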

Re: Repair Hangs while requesting Merkle Trees

2015-11-11 Thread Anuj Wadehra
Hi, we are using 2.0.14. We have 2 DCs at remote locations with 10GBps connectivity. We are able to complete repair (-par -pr) on 5 nodes. On only one node in DC2, we are unable to complete repair as it always hangs. The node sends Merkle Tree requests, but one or more nodes in DC1 (remote) never sho

Re: Repair hangs, seems to be stuck somehow

2014-10-21 Thread Alain RODRIGUEZ
I finally had to decommission this annoying node that was breaking repairs again. So far so good. It seems I solved the issue doing so. Hope this will help some people out there. Alain 2014-10-20 22:59 GMT+02:00 Alain RODRIGUEZ : > Hi guys. > > It seems that there were 2 streams hanging to one

Re: Repair hangs, seems to be stuck somehow

2014-10-20 Thread Alain RODRIGUEZ
Hi guys. It seems that there were 2 streams hanging to one node, restarting this targeted node seems to have solved my issue, repairs are now running. Waiting to see if it completes. "Try repairing only one CF at a time, starting with the smallest ones and/or the ones whose data you care about th

Re: Repair hangs, seems to be stuck somehow

2014-10-20 Thread Robert Coli
On Mon, Oct 20, 2014 at 5:45 AM, Alain RODRIGUEZ wrote: > I know that 2.1 fixes this all. We are going to migrate to C* 2.0 soon > (asap) and then to 2.1, but we first need to run some tests, which will > take us some time. Is repair officially broken on 1.2.18 ? Is there any > known workaround or

Re: Repair hangs, seems to be stuck somehow

2014-10-20 Thread Robert Coli
On Mon, Oct 20, 2014 at 5:45 AM, Alain RODRIGUEZ wrote: > Using Cassandra 1.2.18, we are experiencing an issue in our 2 DC > (EC2MultiRegionSnitch) C*1.2.18 cluster. > > We have 2 DC and I saw some weird* inconsistencies between our 2 DC. I > tried to run repair on all the nodes of all 2 DC (We

Re: Repair hangs - Cassandra 1.2.10

2013-12-09 Thread Aaron Morton
> I changed logging to debug level, but still nothing is logged. > Again - any help will be appreciated. Is there nothing at the ERROR level on any machine? Check nodetool compactionstats to see if a validation compaction is running; the repair may be waiting on this. Check nodetool netstats
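
The first check in this reply can be scripted roughly as follows. This is a sketch under assumptions: count_validations is a hypothetical helper name, and it only counts lines containing the word "validation" (case-insensitive) rather than parsing the compactionstats columns, whose exact layout differs between Cassandra versions.

```shell
#!/bin/sh
# count_validations   (hypothetical helper)
# Reads `nodetool compactionstats` output on stdin and prints the number
# of lines that mention a validation compaction.
count_validations() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -ci 'validation' || true
}

# On the node where the repair appears stuck:
# nodetool compactionstats | count_validations
```

A non-zero count suggests the repair is still waiting on Merkle tree calculation rather than on the network.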

Re: Repair hangs - Cassandra 1.2.10

2013-12-04 Thread Tamar Rosen
Update - I am still experiencing the above issues, but not all the time. I was able to run repair (on this keyspace) from node 2 and from node 4, but now a different keyspace hangs on these nodes, and I am still not able to run repair on node 1. It seems random. I changed logging to debug level, bu

Re: Repair hangs when merkle tree request is not acknowledged

2013-04-06 Thread aaron morton
> If I wait 24 hours, the repair command will return an error saying that the > node died… but the node really didn't die, I watch it the whole time. Can you include the error, it makes it easier to know what's going on. You should see INFO messages on the node you are running repair on that say

Re: Repair hangs when merkle tree request is not acknowledged

2013-04-05 Thread Paul Sudol
> How does it fail? If I wait 24 hours, the repair command will return an error saying that the node died… but the node really didn't die, I watch it the whole time. I have the DEBUG messages on in the log files, when the node I'm repairing sends out a merkle tree request, I will normally see, {C

Re: Repair hangs when merkle tree request is not acknowledged

2013-04-05 Thread aaron morton
> A repair on a certain CF will fail, and I run it again and again, eventually > it will succeed. How does it fail? Can you see the repair start on the other node ? If you are getting errors in the log about streaming failing because a node died, and the FailureDetector is in the call stack, ch

Re: Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Ryan Lowe
Upgrading to 1.2.3 fixed the -pr Repair.. I'll just use that from now on (which is what I prefer!) Thanks, Ryan On Wed, Mar 27, 2013 at 9:11 AM, Ryan Lowe wrote: > Marco, > > No there are no errors... the last line I see in my logs related to repair > is : > > [repair #...] Sending completed m

Re: Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Ryan Lowe
Marco, No there are no errors... the last line I see in my logs related to repair is : [repair #...] Sending completed merkle tree to /[node] for (keyspace1,columnfamily1) Ryan On Wed, Mar 27, 2013 at 8:49 AM, Marco Matarazzo < marco.matara...@hexkeep.com> wrote: > > If I run `nodetool -h lo

Re: Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Marco Matarazzo
> If I run `nodetool -h localhost repair`, then it will repair only the first > Keyspace and then hang... I let it go for a week and nothing. Do the node logs show any errors? > If I run `nodetool -h localhost repair -pr`, then it appears to only repair > the first VNode range, but does do all ke

Re: repair hangs

2013-03-18 Thread aaron morton
> /raid0/cassandra/data/OpsCenter/events_timeline/OpsCenter-events_timeline-hf-1-Data.db > is not compatible with current version ib > -- This can be fixed with a nodetool upgradesstables Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.the

Re: repair hangs

2013-03-14 Thread Dane Miller
On Thu, Mar 14, 2013 at 6:34 AM, aaron morton wrote: >> 1. is this a nodetool bug? is there any way to propagate the >> java.io.IOException back to nodetool? > The repair continues to work even if nodetool fails, it's a server side thing. > >> 2. network problems on EC2, I'm shocked! are there r

Re: repair hangs

2013-03-14 Thread aaron morton
> 1. is this a nodetool bug? is there any way to propagate the > java.io.IOException back to nodetool? The repair continues to work even if nodetool fails, it's a server side thing. > 2. network problems on EC2, I'm shocked! are there recommended > network settings for EC2? Streaming does not p

Re: repair hangs

2013-03-13 Thread Dane Miller
On Wed, Mar 13, 2013 at 12:39 PM, Wei Zhu wrote: > My guess would be there is some exception during the repair and your session > is aborted. > Here is the code of doing repair: > >https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/AntiEntropyService.java > > loo

Re: repair hangs

2013-03-13 Thread Wei Zhu
should give you a rough idea at which stage the repair died. -Wei - Original Message - From: "Dane Miller" To: user@cassandra.apache.org, "Wei Zhu" Sent: Wednesday, March 13, 2013 12:32:20 PM Subject: Re: repair hangs On Wed, Mar 13, 2013 at 11:44 AM, Wei Zhu wrote:

Re: repair hangs

2013-03-13 Thread Dane Miller
On Wed, Mar 13, 2013 at 11:44 AM, Wei Zhu wrote: >Do you see anything related to "merkle" tree in your log? > >Also do a nodetool compactionstats, during merkle tree calculation, you will >see >validation there. The last mention of "merkle" is 2 days old. compactionstats are: $ nodetool compac

Re: repair hangs

2013-03-13 Thread Wei Zhu
Do you see anything related to "merkle" tree in your log? Also do a nodetool compactionstats, during merkle tree calculation, you will see validation there. -Wei - Original Message - From: "Dane Miller" To: user@cassandra.apache.org Sent: Wednesday, March 13, 2013 10:54:50 AM Subject: