Giving this some more thought, I think it's fair to say that using LOCAL_ONE and LOCAL_QUORUM instead of ONE and QUORUM in this situation is actually a workaround rather than a solution to this problem.
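To make the workaround concrete, here is a minimal sketch of swapping ONE and QUORUM for their LOCAL_ counterparts in one central place on the client side. Everything in it is hypothetical and for illustration only: the `ConsistencyLevel` enum is a stand-in for whatever constants your driver provides, and `effective_cl` and the `rebuild_in_progress` flag are names I made up, not part of Cassandra or any driver.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    """Stand-in for a driver's consistency-level constants (hypothetical)."""
    ONE = "ONE"
    QUORUM = "QUORUM"
    LOCAL_ONE = "LOCAL_ONE"
    LOCAL_QUORUM = "LOCAL_QUORUM"


# Map each DC-agnostic level to its local-DC counterpart.
_LOCAL_EQUIVALENT = {
    ConsistencyLevel.ONE: ConsistencyLevel.LOCAL_ONE,
    ConsistencyLevel.QUORUM: ConsistencyLevel.LOCAL_QUORUM,
}


def effective_cl(requested, rebuild_in_progress):
    """Return the consistency level to actually use for a query.

    While a new DC is being rebuilt, ONE and QUORUM are swapped for their
    LOCAL_ variants; any other level is returned unchanged.
    """
    if rebuild_in_progress:
        return _LOCAL_EQUIVALENT.get(requested, requested)
    return requested
```

This only helps if every query in every client goes through such a helper, so the switch still depends on how your code is structured.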
LOCAL_ONE and LOCAL_QUORUM were introduced to ensure that only the local DC is used, which can be very useful. But not everybody needs this restriction. If you don't need it and you normally use ONE, you're in trouble when you set up a new DC and use rebuild to fill the node(s). To avoid this, you could (temporarily) change the CL to LOCAL_ONE. But changing the CL of all queries in all clients can be very costly, depending on your code.

Wouldn't it be far more efficient if a node that is rebuilding itself were responsible for not accepting reads until the rebuild is complete? E.g. by marking it as "Joining", similar to a node that is being bootstrapped?

Tom

On Thu, Sep 11, 2014 at 11:10 PM, Tom van den Berge <t...@drillster.com> wrote:

> Thanks, Rob.
> I actually tried using LOCAL_ONE instead of ONE, but I still saw this
> problem. Maybe I missed some queries when updating to LOCAL_ONE. Anyway,
> it's good to know that this is supposed to work.
>
> Tom
>
> On Thu, Sep 11, 2014 at 10:28 PM, Robert Coli <rc...@eventbrite.com>
> wrote:
>
>> On Thu, Sep 11, 2014 at 1:18 PM, Tom van den Berge <t...@drillster.com>
>> wrote:
>>
>>> When setting up a new (additional) data center, the documentation tells
>>> us to use "nodetool rebuild -- <old dc>" to fill up the node(s) in the new
>>> dc, and to disable auto_bootstrap.
>>>
>>> I'm wondering if it is possible to fill the node with
>>> "auto_bootstrap=true" instead of a nodetool rebuild command. If so, how
>>> will Cassandra decide from where to stream the data?
>>
>> Yes, if that node can hold 100% of the replicas for the new DC.
>>
>> Cassandra will decide from where to stream the data in the same way it
>> normally does, by picking one replica per range and streaming from it.
>>
>> But you probably don't generally want to do this, rebuild exists for this
>> use case.
>>
>>> The reason I'm asking is that when using rebuild, I've learned from
>>> experience that the node immediately joins the cluster, and starts
>>> accepting reads (from other DCs) for the range it owns. But since the data
>>> is not complete yet, it can't return anything. This seems to be a dangerous
>>> side effect of this procedure, and therefore can't be used.
>>
>> Yes, that's why LOCAL_ONE ConsistencyLevel was created. Use it, and
>> LOCAL_QUORUM, instead of ONE and QUORUM.
>>
>> =Rob
>
> --
> Drillster BV
> Middenburcht 136
> 3452MT Vleuten
> Netherlands
>
> +31 30 755 5330
>
> Open your free account at www.drillster.com