Currently, I'm using Cassandra 1.1.5, but I'm considering upgrading to 1.2.x in order to make use of vnodes. Doubling the size of the cluster is not possible for me because I want to measure the response while adding (or removing) single nodes. Thank you, guys. It helped me a lot to better understand how Cassandra works.

Att.

*Rodrigo Felix de Almeida*
LSBD - Universidade Federal do Ceará
Project Manager
MBA, CSM, CSPO, SCJP

On Wed, Jul 10, 2013 at 11:11 AM, Eric Stevens <migh...@gmail.com> wrote:

>> => Adding a new node between other nodes would avoid running move, but
>> the ring would be unbalanced, right? Would this imply having a node
>> (with a bigger range, 1/2 of the range while the other 2 nodes have 1/4
>> each, supposing 3 nodes) overloaded? I'm referring to
>> http://wiki.apache.org/cassandra/Operations#Load_balancing
>
> Yes, if you're using a single vnode per server, or are running an older
> version of Cassandra. For lowest impact, doubling the size of your cluster
> is recommended so that you can avoid doing moves. Or if you're on
> Cassandra 1.2+, you can use vnodes, and you should not typically need to
> rebalance after bringing a new server online.
>
> On Tue, Jul 9, 2013 at 9:31 PM, Rodrigo Felix <rodrigofelixdealme...@gmail.com> wrote:
>
>> Thank you very much for your response. My comments on your email follow.
>>
>> Att.
>>
>> *Rodrigo Felix de Almeida*
>> LSBD - Universidade Federal do Ceará
>> Project Manager
>> MBA, CSM, CSPO, SCJP
>>
>> On Mon, Jul 8, 2013 at 6:05 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>
>>> On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix <rodrigofelixdealme...@gmail.com> wrote:
>>>
>>>> - Is it normal to take about 9 minutes to add a new node? The log
>>>>   generated by a script to add a new node follows.
>>>
>>> Sure.
>>
>> => OK
>>
>>>> - Is there a way to reduce the time to start Cassandra?
>>>
>>> Not usually.
>>
>> => OK
>>
>>>> - Sometimes the cleanup operation takes many minutes (about 10). Is
>>>>   this normal since the amount of data is small (1.7gb at maximum / seed)?
>>>
>>> Compaction is throttled, and cleanup is a type of compaction. Bootstrap
>>> is also throttled via the streaming throttle.
>>
>> => OK
>>
>>>> - Considering that I have two seeds in the beginning, their tokens
>>>>   are 0 and 85070591730234615865843651857942052864. When I add a new
>>>>   machine, do I need to execute move and cleanup on both seeds?
>>>>   Currently, I'm running cleanup on seed 0, move + cleanup on the other
>>>>   seed and neither move nor cleanup on the just added node. Is this OK?
>>>
>>> Only nodes which have "lost" ranges need to run cleanup. In general you
>>> should add new nodes "between" other nodes such that "move" is not required
>>> at all.
>>
>> => Adding a new node between other nodes would avoid running move, but
>> the ring would be unbalanced, right? Would this imply having a node
>> (with a bigger range, 1/2 of the range while the other 2 nodes have 1/4
>> each, supposing 3 nodes) overloaded? I'm referring to
>> http://wiki.apache.org/cassandra/Operations#Load_balancing
>>
>>>> - What if I do not run cleanup on any existing node when adding or
>>>>   removing a node? Is the data that was not "cleaned up" still available
>>>>   if I send a scan, for instance, and the scan range is still on the node
>>>>   but wouldn't be there if I had run cleanup? Would the data be gathered
>>>>   from another node, i.e. the one that properly owns the range specified
>>>>   in the scan query?
>>>
>>> If data for range [x] is on node [a] but node [a] is no longer
>>> considered an endpoint for range [x], it will never receive a request to
>>> serve range [x].
>>
>> => OK
>>
>>>> - After decommissioning a node, is it advisable to run cleanup on
>>>>   the remaining nodes? Are the consequences of not running it the same
>>>>   as those of not running it when adding a node?
>>>
>>> Cleanup is only for the node which lost a range. In decommission case,
>>> no live nodes lost a range, only some nodes gained one.
>>
>> => OK
>>
>>> =Rob
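
For anyone following the ring-balance question in this thread, here is a rough Python sketch of the arithmetic. It assumes RandomPartitioner with a token space of 0 to 2**127, which matches the two seed tokens quoted above, and one token per node (no vnodes). The names RING_SIZE, balanced_tokens and ownership are made up for this illustration and are not part of Cassandra or nodetool.

```python
from fractions import Fraction

# Assumption: RandomPartitioner, whose tokens range over 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced tokens for node_count single-token nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Fraction of the ring each token owns.

    A node owns the range from the previous token (exclusive) up to its
    own token (inclusive), wrapping around the end of the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for index, token in enumerate(ordered):
        previous = ordered[index - 1]  # index 0 wraps to the last token
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = Fraction(width, RING_SIZE)
    return shares


# The two seeds from the thread: tokens 0 and 2**126.
two_seeds = balanced_tokens(2)

# Add a third node "between" the seeds, with no move on existing nodes.
unbalanced = ownership(two_seeds + [2 ** 125])

# Tokens a balanced 3-node ring would need; the seed at 2**126 is not
# among them, so balancing a single-token ring requires a move.
three_balanced = balanced_tokens(3)

if __name__ == "__main__":
    for token, share in sorted(unbalanced.items()):
        print(token, share)
    print(three_balanced)
```

With these assumptions, the node at token 0 keeps half of the ring and the other two nodes own a quarter each, which is the imbalance discussed above. Doubling the cluster, or using vnodes on 1.2+, avoids that trade-off between running move and living with an uneven ring.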