> => Adding a new node between other nodes would avoid running move, but the
> ring would be unbalanced, right? Would this imply having one overloaded node
> (with a bigger range: 1/2 of the ring, while the other 2 nodes have 1/4 each,
> supposing 3 nodes)? I'm referring to
> http://wiki.apache.org/cassandra/Operations#Load_balancing

Yes, if you're using a single vnode (one token) per server, or are running an older version of Cassandra. For the lowest impact, doubling the size of your cluster is recommended, because each new node can then take a token halfway between two existing ones and no moves are needed. Or, if you're on Cassandra 1.2+, you can use vnodes, and you should not typically need to rebalance after bringing a new server online.

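To make the imbalance concrete, here is a rough Python sketch (not something from the thread). It assumes RandomPartitioner, whose token space is 0 to 2**127 - 1, which matches the two seed tokens quoted below; it also assumes one token per node and ignores replication. The names `ownership` and `balanced_tokens` are made up for illustration and are not Cassandra APIs.

```python
from fractions import Fraction

# Assumption: RandomPartitioner, token space 0 .. 2**127 - 1.
RING = 2 ** 127


def ownership(tokens):
    """Return {token: share of the ring}; a node owns (previous token, own token]."""
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: Fraction(1)}
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # index -1 wraps around to the highest token
        shares[tok] = Fraction((tok - prev) % RING, RING)
    return shares


def balanced_tokens(n):
    """Evenly spaced tokens for an n-node ring (hypothetical helper)."""
    return [i * RING // n for i in range(n)]


seeds = [0, 2 ** 126]  # 2**126 == 85070591730234615865843651857942052864

# Third node placed halfway between the two seeds: no move, but unbalanced.
three_nodes = ownership(seeds + [2 ** 125])

# Doubling to four nodes: both seed tokens are already balanced positions.
four_nodes = ownership(balanced_tokens(4))
```

Under these assumptions, `three_nodes` gives the node at token 0 half of the ring and the other two a quarter each, while `four_nodes` gives every node a quarter and leaves both seeds where they are. `balanced_tokens(3)` does not contain 2**126, which is why a balanced three-node ring requires moving that seed.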
On Tue, Jul 9, 2013 at 9:31 PM, Rodrigo Felix <rodrigofelixdealme...@gmail.com> wrote:

> Thank you very much for your response. My comments on your email follow.
>
> Att.
>
> *Rodrigo Felix de Almeida*
> LSBD - Universidade Federal do Ceará
> Project Manager
> MBA, CSM, CSPO, SCJP
>
>
> On Mon, Jul 8, 2013 at 6:05 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix <
>> rodrigofelixdealme...@gmail.com> wrote:
>>
>>>
>>>    - Is it normal to take about 9 minutes to add a new node? The log
>>>    generated by a script that adds a new node follows.
>>>
>>> Sure. => OK
>>
>>>
>>>    - Is there a way to reduce the time to start Cassandra?
>>>
>>> Not usually. => OK
>>
>>>
>>>    - Sometimes the cleanup operation takes many minutes (about 10). Is
>>>    this normal, given that the amount of data is small (1.7 GB at most
>>>    per seed)?
>>>
>>> Compaction is throttled, and cleanup is a type of compaction. Bootstrap
>> is also throttled via the streaming throttle. => OK
>>
>>>
>>>    - Considering that I have two seeds in the beginning, their tokens
>>>    are 0 and 85070591730234615865843651857942052864. When I add a new
>>>    machine, do I need to execute move and cleanup on both seeds?
>>>    Currently, I'm running cleanup on seed 0, move + cleanup on the other
>>>    seed, and neither move nor cleanup on the just-added node. Is this OK?
>>>
>>> Only nodes which have "lost" ranges need to run cleanup. In general you
>> should add new nodes "between" other nodes such that "move" is not required
>> at all.
>>
>
> => Adding a new node between other nodes would avoid running move, but the
> ring would be unbalanced, right? Would this imply having one overloaded node
> (with a bigger range: 1/2 of the ring, while the other 2 nodes have 1/4 each,
> supposing 3 nodes)? I'm referring to
> http://wiki.apache.org/cassandra/Operations#Load_balancing
>
>>
>>>    - What if I do not run cleanup on any existing node when adding or
>>>    removing a node? Is the data that was not "cleaned up" still available
>>>    if I send a scan, for instance, and the scan range is still on the
>>>    node but wouldn't be there if I had run cleanup? Would the data be
>>>    gathered from another node, i.e. the one that properly owns the range
>>>    specified in the scan query?
>>>
>>> If data for range [x] is on node [a] but node [a] is no longer
>> considered an endpoint for range [x], it will never receive a request to
>> serve range [x]. => OK
>>
>>>
>>>    - After decommissioning a node, is it advisable to run cleanup on
>>>    the remaining nodes? Are the consequences of not running it the same
>>>    as when adding a node?
>>>
>>> Cleanup is only for the node which lost a range. In the decommission case,
>> no live nodes lost a range, only some nodes gained one. => OK
>>
>> =Rob
>>
>
>
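
The cleanup rule quoted above ("only nodes which have lost ranges need to run cleanup") can be sketched in Python as a before/after comparison of each node's primary range. This is only an illustration under loud assumptions: RandomPartitioner (ring size 2**127), one token per node, and replication ignored, so with a replication factor above 1 more nodes may be affected. The node names and the helpers `primary_ranges` and `nodes_needing_cleanup` are hypothetical, not Cassandra or nodetool APIs.

```python
# Assumption: RandomPartitioner, token space 0 .. 2**127 - 1.
RING = 2 ** 127


def primary_ranges(ring):
    """Map node name -> (start, end], given ring = {node name: token}."""
    ordered = sorted(ring.items(), key=lambda item: item[1])
    # ordered[i - 1] wraps around to the last node when i == 0.
    return {name: (ordered[i - 1][1], tok) for i, (name, tok) in enumerate(ordered)}


def arc_len(start, end):
    """Clockwise length of (start, end]; start == end means the whole ring."""
    return (end - start) % RING or RING


def nodes_needing_cleanup(before, after):
    """Nodes present in both rings whose old range is not inside their new one."""
    old, new = primary_ranges(before), primary_ranges(after)
    needs = []
    for name in sorted(old.keys() & new.keys()):
        (a, b), (c, d) = old[name], new[name]
        # The old arc fits in the new arc if, measured clockwise from the
        # new start, it begins and ends within the new arc's length.
        still_covered = (a - c) % RING + arc_len(a, b) <= arc_len(c, d)
        if not still_covered:
            needs.append(name)
    return needs


two_seeds = {"seed0": 0, "seed1": 2 ** 126}
with_new_between = {"seed0": 0, "new": 2 ** 125, "seed1": 2 ** 126}
rebalanced = {"seed0": 0, "new": RING // 3, "seed1": 2 * RING // 3}

print(nodes_needing_cleanup(two_seeds, with_new_between))  # ['seed1']
print(nodes_needing_cleanup(two_seeds, rebalanced))        # ['seed0', 'seed1']
print(nodes_needing_cleanup(with_new_between, two_seeds))  # []
```

In this model, adding a node "between" makes only its successor lose a range, adding a node and moving a seed to a balanced token makes both seeds lose part of their range, and a decommission leaves no surviving node with a lost range, in line with the answers above.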