Peter, I want to join everyone else in thanking you for helping out so much with this thread, and especially for pointing out the problems with the DS docs on this topic. We posted some corrections today, and will keep looking for ways to improve the information.

On Thu, Mar 31, 2011 at 3:11 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> > Thanks a lot for elaborating on repairs. Still, it's a bit fuzzy to me
> > why it is so important to run a repair before the GCGraceSeconds kicks in.
> > Does this mean a delete does not get "replicated" ? In other words when I
> > delete something on a node, doesn't cassandra set tombstones on its replica
> > copies?
>
> Deletes are replicated, but deletes are special in that unlike actual
> data, you're wanting to *remove* something, but the information that
> says "stuff is gone" is information in and of itself. Clearly you
> don't want to forever and ever keep track of anything ever removed in
> the cluster, so this has to expire somehow. For that reason, there is
> a requirement that tombstones are replicated prior to their expiry.
> See:
>
> http://wiki.apache.org/cassandra/DistributedDeletes
>
> > And technically, isn't repair only needed for cases where things weren't
> > properly propogated in the cluster? If all writes are written to the right
> > replicas, and all deletes are written to all the replicas, and all nodes
> > were available at all times, then everything should work as designed -
> > without manual intervention, right?
>
> Yes, but you can assume that doesn't happen in real life for extended
> periods of time. It doesn't take a lot at all for a *few* writes not
> getting replicated (for example, just restarting a Cassandra node will
> cause some writes to be dropped - hinted handoff is not a guarantee,
> only an optimization).
>
> --
> / Peter Schuller
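
For anyone reading this thread later, here is a minimal sketch of the point Peter makes about tombstones needing to reach every replica before they expire. It is a toy model in Python, not Cassandra code: the Replica class, the repair() helper, the two-node setup, and the tiny GC_GRACE_SECONDS value are all assumptions made up for illustration. It only tries to show that a replica which missed a delete can bring the data back if the tombstone is collected before any repair has run.

```python
# Toy model only; none of these names come from Cassandra itself.
GC_GRACE_SECONDS = 10  # hypothetical stand-in for GCGraceSeconds

TOMBSTONE = object()  # marker meaning "this key was deleted"


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value or TOMBSTONE)

    def write(self, key, value, ts):
        # Last write wins: keep whichever entry has the newer timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def delete(self, key, ts):
        # A delete is itself a write: it stores a tombstone.
        self.write(key, TOMBSTONE, ts)

    def compact(self, now):
        # Forget tombstones once they are older than the grace period.
        for key, (ts, value) in list(self.data.items()):
            if value is TOMBSTONE and now - ts >= GC_GRACE_SECONDS:
                del self.data[key]

    def read(self, key):
        entry = self.data.get(key)
        if entry is None or entry[1] is TOMBSTONE:
            return None
        return entry[1]


def repair(replicas):
    # Crude anti-entropy: offer every entry to every replica.
    for src in replicas:
        for key, (ts, value) in list(src.data.items()):
            for dst in replicas:
                dst.write(key, value, ts)


def scenario(repair_before_expiry):
    a, b = Replica("a"), Replica("b")
    for r in (a, b):
        r.write("k", "v", ts=1)
    a.delete("k", ts=2)  # b misses the delete, e.g. it was restarting
    if repair_before_expiry:
        repair([a, b])  # the tombstone reaches b in time
    now = 2 + GC_GRACE_SECONDS
    a.compact(now)
    b.compact(now)
    repair([a, b])  # a later repair, after the tombstone has expired
    return a.read("k"), b.read("k")
```

In this model, scenario(False) ends with both replicas returning the deleted value again, because replica a has forgotten the tombstone while replica b still holds the old data. scenario(True) ends with the key gone on both.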