Thanks, Jeff, for the detailed clarifications. We tried again in May to rebuild data on the Spark DC nodes one node at a time, but ran into issues. Prod has 3 DCs: DC1 (9 nodes) and DC2 (9 nodes) are C*-only, and DC3 runs Spark with 3 nodes and vnodes enabled with num_tokens=32.
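For context, the vnode setting on the DC3 nodes and the rebuild command we ran look roughly like this (a sketch using our DC names; exact values may differ from our real config):

    # cassandra.yaml on each DC3 (Spark) node -- vnodes enabled
    num_tokens: 32

    # run on each new DC3 node, streaming only from DC2
    nodetool rebuild DC2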
We also dropped a few unused indexes. Then we tried rebuilding data on Spark node 1 using DC2 as the source, since it takes less traffic, so that DC1 would not be impacted. When we tried this approach in May, we got a latency hit on DC1. We stopped the rebuild job and still saw the latency impact; it only went away once we removed replication to the Spark nodes at the keyspace level.

We still need to rebuild data in the new Spark-enabled DC. I am thinking of the following new approach to see if it avoids the latency impact:

1. Enable replication for the keyspace with replication factor 1 for the Analytics DC.
2. Keep vnodes at 32 tokens per node.
3. Run nodetool rebuild on the first Spark DC node using DC2 as the source.

OR

1. Disable vnodes and use manually assigned tokens for all 3 nodes in the Spark DC.
2. Clean up all old data/logs.
3. Enable replication factor 1 for the keyspace for the Analytics DC.
4. Rebuild data using DC2 as the source.

I want to ensure that the new approach does not send any traffic to DC1 at all. Let me know if there are other options or if we need to change anything in the above approach. (A rough sketch of the commands I have in mind is at the bottom of this mail, below the quoted thread.) Appreciate the feedback.

On Wed, Dec 9, 2015 at 2:37 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

> Streaming with vnodes is not always pleasant – rebuild uses streaming (as
> does bootstrap, repair, and decommission). The rebuild delay you see may or
> may not be related to that. It could also be that the streams timed out,
> and you don’t have a stream timeout set. Are you seeing data move? Are the
> new nodes busy compacting? Secondary indexes themselves may not cause
> problems, but there are cases where very large indexes (due to very large
> partitions or unusual cardinalities) may cause problems.
>
> 1. The other way is to back up your data, make a new vnode cluster, and
> load your data in with sstableloader
> 2. Known issues are that streaming with vnodes creates a lot of small
> tables and does a lot more work than streaming without vnodes
> 3. Not necessarily
> 4. See #2
>
>
> From: cass savy
> Reply-To: "user@cassandra.apache.org"
> Date: Wednesday, December 9, 2015 at 1:26 PM
> To: "user@cassandra.apache.org"
> Subject: Switching to Vnodes
>
> We want to move our clusters to use vnodes. I know the docs online say we
> have to create a new DC with vnodes, move to the new DC, and decommission
> the old one. We use DSE for our C* clusters. The C* version is 2.0.14.
>
> 1. Is there any other way to migrate existing nodes to vnodes?
> 2. What are the known issues with that approach?
> 3. We have a few secondary indexes in the keyspace; will that cause any
> issues with moving to vnodes?
> 4. What are the issues encountered after moving to vnodes in PROD?
> 5. Would anybody recommend vnodes for Spark nodes?
>
> *Approach: Moving to a new DC with vnodes enabled*
> When I tested it for a keyspace which has secondary indexes, rebuilds on the
> vnode-enabled datacenter take days, and I don't know when they will
> complete or even if they will complete. I tried with 256, 32, and 64 tokens
> per node but no luck.
>
> Please advise.
>
>
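As promised above, here is a rough sketch of the commands I have in mind for the first approach. The keyspace name "mykeyspace" and the replication factors shown for DC1/DC2 are placeholders, not our real settings; DC3 is the Analytics/Spark DC:

    -- add the Analytics DC with RF=1, keeping the existing DCs' RFs unchanged
    ALTER KEYSPACE mykeyspace WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'DC3': 1};

    # then on the first DC3 node, stream only from DC2
    nodetool rebuild DC2

    # per Jeff's note about stream timeouts, we would also set one in
    # cassandra.yaml before starting (default 0 = no timeout); value is illustrative
    streaming_socket_timeout_ms: 3600000

For the second approach (no vnodes), the per-node cassandra.yaml on DC3 would instead look roughly like this, with three evenly spaced tokens that we would still need to compute:

    num_tokens: 1
    initial_token: <computed token for this node>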