Re: Never ending manual repair after adding second DC

2012-07-16 Thread Bill Au
I ran into the same problem before: http://comments.gmane.org/gmane.comp.db.cassandra.user/25334 I have not found any solutions yet. Bill On Mon, Jul 16, 2012 at 11:10 AM, Bart Swedrowski wrote: > > > On 16 July 2012 11:25, aaron morton wrote: > >> In the before time someone had problems

removing second data center from live cluster

2012-05-10 Thread Bill Au
My cluster is currently running with 2 data centers, dc1 and dc2. I would like to remove dc2 and all its nodes completely. I am using local quorum for read and write. I figure that I need to change the replication factor to {dc1:3, dc2:0} before running nodetool decommission on each node in dc2.
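As a rough illustration of what that replication change means for local quorum, here is a minimal sketch. It assumes the usual quorum rule of floor(RF / 2) + 1 replicas per data center; the function name and the choice to return None for a data center with no replicas are made up for this example and are not part of Cassandra.

```python
def local_quorum(rf):
    """Replicas that must respond for a local quorum, assuming floor(rf/2) + 1.

    Returns None when the data center holds no replicas (rf of 0); that
    convention is an assumption for this sketch, not Cassandra behaviour.
    """
    if rf <= 0:
        return None
    return rf // 2 + 1


# Replication settings as described in the message: dc2 is dropped to 0.
replication = {"dc1": 3, "dc2": 0}
quorums = {dc: local_quorum(rf) for dc, rf in replication.items()}

for dc, needed in sorted(quorums.items()):
    print(dc, "replicas needed for local quorum:", needed)
```

Under this rule dc1 with three replicas can still serve local quorum reads and writes with one replica down, while dc2 no longer owns any replicas at all.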

Re: how to re-distribute replicas after changing rack assignment

2012-05-10 Thread Bill Au
licas if they have not been seen before. (Does not > check rack again.) > > You should be able to move one node at a time and run repair. Also ensure > reads are at QUORUM. > > hope that helps. > > - > Aaron Morton > Freelance Developer > @aaronmorton

Re: getting status of long running repair

2012-05-09 Thread Bill Au
you seeing a > lot of dropped Mutations/Messages? Are the nodes going up and down all the > time while the repair is running? > > Regards, > > > > > > On Tue, May 8, 2012 at 2:05 PM, Bill Au wrote: > >> There are no error messages in my log. >> >

Re: getting status of long running repair

2012-05-08 Thread Bill Au
ucts/opscenter > > Cheers > > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 8/05/2012, at 2:15 PM, Ben Coverston wrote: > > Check the log files for warnings or errors. They may indicate why your >

Re: getting status of long running repair

2012-05-07 Thread Bill Au
I restarted the nodes and then restarted the repair. It is still hanging like before. Do I keep repeating until the repair actually finishes? Bill On Fri, May 4, 2012 at 2:18 PM, Rob Coli wrote: > On Fri, May 4, 2012 at 10:30 AM, Bill Au wrote: > > I know repair may take a long ti

getting status of long running repair

2012-05-04 Thread Bill Au
I know repair may take a long time to run. I am running repair on a node with about 15 GB of data and it is taking more than 24 hours. Is that normal? Is there any way to get status of the repair? tpstats does show 2 active and 2 pending AntiEntropySessions. But netstats and compactionstats sh

Re: nodetool repair hanging

2012-04-26 Thread Bill Au
s the repair. When we need to stop a repair we have > bounced all of the participating nodes. I've been told that there is no > harm in stopping repairs. > > On Apr 24, 2012, at 2:55 PM, Bill Au wrote: > > > I am running 1.0.8. I am adding a new data center to an exist

Re: Adding a second datacenter

2012-04-24 Thread Bill Au
I just followed the steps outlined in this email thread to add a second data center to my existing cluster. I am running 1.0.8. Each data center has a replication factor of 2. I am using local quorum for read and write. Everything went smoothly until I ran the last step, which is to run nodetool

nodetool repair hanging

2012-04-24 Thread Bill Au
I am running 1.0.8. I am adding a new data center to an existing cluster. Following steps outlined in another thread on the mailing list, things went fine except for the last step, which is to run repair on all the nodes in the new data center. Repair seems to be hanging indefinitely. There is n

Re: default required in cassandra-topology.properties?

2012-04-19 Thread Bill Au
ful use of replication factor and NetworkTopologyStrategy can help > with this, but you should make sure that a node really doesn’t need to > contact the unknown nodes before marking them as such. > > Richard >

default required in cassandra-topology.properties?

2012-04-19 Thread Bill Au
All the examples of cassandra-topology.properties that I have seen have a default entry assigning unknown nodes to a specific data center and rack. Is it possible to have Cassandra ignore unknown nodes for the purpose of replication? Bill
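For readers who have not seen the file, here is a minimal sketch of what the default entry does, assuming the usual `ip=DC:RACK` line format plus an optional `default=DC:RACK` line. The addresses, data center names, and helper functions are hypothetical, and this is not Cassandra's own snitch code. It only shows the fallback behaviour being asked about.

```python
def parse_topology(text):
    """Parse 'ip=DC:RACK' lines and an optional 'default=DC:RACK' line."""
    nodes, default = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        dc, _, rack = value.strip().partition(":")
        if key.strip() == "default":
            default = (dc, rack)
        else:
            nodes[key.strip()] = (dc, rack)
    return nodes, default


def locate(ip, nodes, default):
    """Return (dc, rack) for a node, falling back to the default entry."""
    return nodes.get(ip, default)


# Hypothetical addresses and names, for illustration only.
SAMPLE = """
10.0.0.1=dc1:rack1
10.0.1.1=dc2:rack1
default=dc1:rack1
"""

nodes, default = parse_topology(SAMPLE)
print(locate("10.0.0.1", nodes, default))  # listed node
print(locate("10.9.9.9", nodes, default))  # unknown node, takes the default
```

In this sketch, dropping the default line makes `locate` return None for an unknown node, which is the "ignore unknown nodes" case the question is about. Whether Cassandra itself tolerates a missing default is not settled by this example.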

Re: 1.0.2 - nodetool ring and info reports wrong load after compact

2012-02-23 Thread Bill Au
Thanks for the info. Upgrade within the 1.0.x branch is simply a rolling restart, right? Bill On Thu, Feb 16, 2012 at 9:20 PM, Jonathan Ellis wrote: > CASSANDRA-3496, fixed in 1.0.4+ > > On Thu, Feb 16, 2012 at 8:27 AM, Bill Au wrote: > > I am running 1.0.2 with the default ti

Re: 1.0.2 - nodetool ring and info reports wrong load after compact

2012-02-16 Thread Bill Au
orton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 17/02/2012, at 3:27 AM, Bill Au wrote: > > I am running 1.0.2 with the default tiered compaction. After running a > "nodetool compact", I noticed that on about half of the machines in my > c

1.0.2 - nodetool ring and info reports wrong load after compact

2012-02-16 Thread Bill Au
I am running 1.0.2 with the default tiered compaction. After running a "nodetool compact", I noticed that on about half of the machines in my cluster, both "nodetool ring" and "nodetool info" report that the load is actually higher than before when I expect it to be lower. It is almost twice as m

Re: Cassandra crashed - possible JMX threads leak

2010-10-26 Thread Bill Au
22, 2010 at 4:33 PM, Jonathan Ellis wrote: > Is the fix as simple as calling close() then? Can you submit a patch for > that? > > On Fri, Oct 22, 2010 at 2:49 PM, Bill Au wrote: > > Not with the nodeprobe or nodetool command because the JVM these two > > commands spawn has

Re: Cassandra crashed - possible JMX threads leak

2010-10-22 Thread Bill Au
d and write requests. I am guessing the hinted hand off might have something to do with it. I am still trying to understand what is happening there. Bill On Wed, Oct 20, 2010 at 5:16 PM, Jonathan Ellis wrote: > can you reproduce this by, say, running nodeprobe ring in a bash while > loop

Cassandra crashed - possible JMX threads leak

2010-10-20 Thread Bill Au
One of my Cassandra servers crashed with the following: ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419 CassandraDaemon.java (line 82) Uncaught exception in thread Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main] java.lang.OutOfMemoryError: unable to create new native thread

Re: Question on load balancing in a cluster

2010-08-06 Thread Bill Au
If nodetool loadbalance does not do what its name implies, should it be renamed or maybe even removed altogether since the recommendation is to _never_ use it in production? Bill On Thu, Aug 5, 2010 at 6:41 AM, aaron morton wrote: > This comment from Ben Black may help... > > "I recommend you _n

Re: question about deleting from cassandra

2010-03-18 Thread Bill Au
e their data > being deleted without a very good reason. "We didn't have enough > room" is not a very good reason. :) > > On Wed, Mar 17, 2010 at 9:03 PM, Bill Au wrote: > > I would assume that Facebook and Twitter are not keeping all the data that > they > >

Re: question about deleting from cassandra

2010-03-17 Thread Bill Au
:27 AM, Sylvain Lebresne >> > wrote: >> >> >> >> I guess you can also vote for this ticket : >> >> https://issues.apache.org/jira/browse/CASSANDRA-699 :) >> >> >> >> >> >> >> >> -- >> >> Sylvain