Re: replace dead node? " token -1 "

2012-08-14 Thread Jim Cistaro
replace_token, but be aware of these possible inconveniences. Jim Cistaro Netflix Cassandra Operations From: Yang <tedd...@gmail.com> Reply-To: <user@cassandra.apache.org> Date: Tue, 14 Aug 2012 21:58:30 -0700 To: <user@cassandra.apache.org> Subject: Re: re

Re: replace dead node? " token -1 "

2012-08-15 Thread Jim Cistaro
token, while the old host is dead, the other nodes on the ring say something like "this token xx is already owned by old_node_ip_here,.. ". I don't remember the exact behavior now, which is why I'm cautious about using T instead of T-1. I'm doing more tests to

Re: nodetool repair uses insane amount of disk space

2012-08-17 Thread Jim Cistaro
We see similar issues with some of the repairs at Netflix. Regarding the growth in payload… we see similar symptoms where nodes can double or triple size. Part of this may be because the repair may deal in large chunks for comparisons. This means that even if there is one byte of entropy, you

Re: JMX(RMI) dynamic port allocation problem still exists?

2012-08-28 Thread Jim Cistaro
You may already be aware, but another possible solution is to use MX4J to do your JMX over REST (I have not tried this myself yet). http://wiki.apache.org/cassandra/Operations#Monitoring_with_MX4J From: Yang <tedd...@gmail.com> Reply-To: <user@cassandra.apache.org> Date: Tue, 2

Re: Repair has now effect

2012-09-02 Thread Jim Cistaro
What does "nt ring" show (on this node and on the other two)? That may provide some clues. From: Patricio Echagüe <patric...@gmail.com> Reply-To: <user@cassandra.apache.org> Date: Sun, 2 Sep 2012 15:50:31 -0700 To: <cassandra-u...@incubator.apache.org> Subject: Repair has no

Re: replace_token code?

2012-09-10 Thread Jim Cistaro
We have seen various issues from these replaced nodes hanging around. For clusters where a lot of nodes have been replaced, we see these replaced nodes having an impact on heap/GC and a lot of tcp timeouts/retransmits (because the old nodes no longer exist). As a result, we have begun cleaning

Re: Repair Failing due to bad network

2012-10-11 Thread Jim Cistaro
I am not aware of any built-in mechanism for retrying repairs. I believe you will have to build that into your process. As for reducing the time of each repair command to fit in your windows: If you have multiple reasonably sized column families, and are not already doing this, one approach might
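Since retries have to be built into your own process, a wrapper along these lines is one way to do it. This is only a minimal sketch: the function name run_with_retries, the attempt count, and the delay are assumptions for illustration, and it assumes the command's exit status reliably signals failure, which may not hold for a repair that hangs rather than exits.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 60.0,
    runner: Optional[Callable[[Sequence[str]], int]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run cmd until it exits with status 0 or the attempts are used up.

    runner is injectable so the retry logic can be exercised without a
    real cluster; by default the command is run as a subprocess.
    """
    if runner is None:
        runner = lambda c: subprocess.run(list(c)).returncode
    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return True
        # Wait before the next try, but not after the final failure.
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

A call might look like run_with_retries(["nodetool", "repair", "my_keyspace", "my_cf"]), where my_keyspace and my_cf are placeholder names.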

Re: leveled compaction and tombstoned data

2012-11-10 Thread Jim Cistaro
against 3 like sized files. 2) If you rely heavily on file cache (rather than large row caches), each major compaction effectively invalidates the entire file cache because everything is written to one new large file. -- Jim Cistaro On 11/9/12 11:27 AM, "Rob Coli" wrote: >On Thu, Nov

Re: help turning compaction..hours of run to get 0% compaction....

2013-01-08 Thread Jim Cistaro
One metric to watch is pending compactions (via nodetool compactionstats). This count will give you some idea of whether you are falling behind with compactions. The other measure is how long you are compacting after your inserts have stopped. If I understand correctly, since you never update
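To watch the pending compactions count over time, the number can be pulled out of the nodetool compactionstats output. The sketch below assumes the output contains a line of the form "pending tasks: N"; the exact layout can differ between Cassandra versions, and the function name pending_compactions is made up for illustration.

```python
import re
from typing import Optional


def pending_compactions(compactionstats_output: str) -> Optional[int]:
    """Return the pending-compactions count from compactionstats text.

    Assumes a line like "pending tasks: 12"; returns None if no such
    line is found.
    """
    match = re.search(
        r"^pending tasks:\s*(\d+)", compactionstats_output, re.MULTILINE
    )
    return int(match.group(1)) if match else None
```

Sampling this value periodically and checking whether it keeps growing gives a rough sense of whether compaction is falling behind.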

Re: Cassandra pending compaction tasks keeps increasing

2013-01-19 Thread Jim Cistaro
1) In addition to iostat, dstat is a good tool to see what kind of disk throughput you are getting. That would be one thing to monitor. 2) For LCS, we also see pending compactions skyrocket. During load, LCS will create a lot of small sstables which will queue up for compaction. 3) For us the bi

Re: Cassandra pending compaction tasks keeps increasing

2013-01-22 Thread Jim Cistaro
ssandra-1-2 Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 20/01/2013, at 7:49 AM, Jim Cistaro <jcist...@netflix.com> wrote: 1) In addition to iostat, dstat is a good tool to see what kind of disk throughput

Re: -pr vs. no -pr

2013-03-02 Thread Jim Cistaro
One other slight advantage of -pr… We sometimes have repairs that hang and need to be killed and restarted. -pr means you have to "redo" a fraction of the work. jc -Original Message- From: , Dean Reply-To: "user@cassandra.apache.org" Date: Friday, March 1, 2013 5:46 AM To: "user@cassan

Re: removing old nodes

2013-03-21 Thread Jim Cistaro
I do not recall what the "50" means, but IIRC, the 1364152145790 is the unix timestamp (in millisecs rather than secs) of the expire time when they _should_ go away completely. perl -e 'print scalar(gmtime(1364152145))' Sun Mar 24 19:09:05 2013 From: Ben Chobot <be...@instructure.com> Re
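The same conversion can be done without dropping the milliseconds by hand. A minimal sketch, with a helper name (expire_time_utc) invented for illustration:

```python
from datetime import datetime, timedelta, timezone


def expire_time_utc(millis: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to an aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Adding a timedelta keeps the arithmetic in integers, so the
    # millisecond part is not subject to float rounding.
    return epoch + timedelta(milliseconds=millis)


if __name__ == "__main__":
    # Same value as in the message above; this is 2013-03-24 19:09:05 UTC.
    print(expire_time_utc(1364152145790).isoformat())
```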