Coordination in a distributed system is difficult. I don't think we can fix HH's existing edge cases without introducing other, more complicated edge cases.
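To make the window concrete, here is a toy sketch of the behavior discussed in this thread: writes aimed at a replica that has died but is not yet marked down get no hint, and hints only start once the failure is detected. This is not Cassandra code; the Coordinator class, the tick-based clock, and the detection_delay value are all made up for illustration.

```python
# Toy model only: not Cassandra code. All names and numbers are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    detection_delay: int  # ticks between a crash and the cluster marking the node down
    hints: list = field(default_factory=list)   # writes stored for later handoff
    missed: list = field(default_factory=list)  # writes the replica never gets via HH

    def write(self, key, now, crashed_at):
        """Route one write aimed at a replica that crashed at tick `crashed_at`."""
        if now < crashed_at:
            return "delivered"
        if now < crashed_at + self.detection_delay:
            # Replica is dead but still believed up: no hint is recorded.
            self.missed.append(key)
            return "missed"
        # Replica is known to be down: hinted handoff stores the write.
        self.hints.append(key)
        return "hinted"


def simulate(crashed_at=3, detection_delay=4, ticks=10):
    c = Coordinator(detection_delay)
    outcomes = [c.write(f"k{t}", t, crashed_at) for t in range(ticks)]
    return c, outcomes


c, outcomes = simulate()
print(outcomes)
print("left for read repair or nodetool repair:", c.missed)
```

In this model, everything that lands in `missed` can only be fixed by read repair or a manual repair, which is the gap in question.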
So weekly-or-so repair will remain a common maintenance task for the foreseeable future.

On Wed, Jul 14, 2010 at 4:17 PM, B. Todd Burruss <bburr...@real.com> wrote:
> thx, but disappointing :)
>
> is this just something we have to live with and periodically "repair"
> the nodes? or is there future work to tighten up the window?
>
> thx
>
>
> On Wed, 2010-07-14 at 12:13 -0700, Jonathan Ellis wrote:
>> On Wed, Jul 14, 2010 at 1:43 PM, B. Todd Burruss <bburr...@real.com> wrote:
>> > there is a window of time from when a node goes down and when the rest
>> > of the cluster actually realizes that it is down.
>> >
>> > what happens to writes during this time frame? does hinted handoff
>> > record these writes and then "handoff" when the down node returns? or
>> > does hinted handoff not kick in until the cluster realizes the node is
>> > down?
>>
>> the latter.
>>
>> > ... is the only way these missed writes are repaired is through read
>> > repair and/or manually kicking off "nodetool repair"?
>>
>> yes.
>
>
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com