You're right, those could all cause what you are seeing. We used to have a "re-check hourly" scheduled task, but took it out because it was very expensive: at the time, hints were not stored by machine, so asking "does machine X have any hints" required scanning all hints. Now that hints are stored per machine, it should be fine to add that back.
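
To make the idea concrete, here is a rough sketch of a periodic re-check over hints keyed by target machine. It is illustrative only and is not Cassandra code: `HintStore`, `recheck`, `is_up`, and `send` are hypothetical names, and hints are plain strings held in memory.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class HintStore:
    """Toy hint store keyed by target host (hypothetical, not Cassandra's API)."""

    def __init__(self) -> None:
        self._hints: Dict[str, List[str]] = defaultdict(list)

    def add(self, target: str, mutation: str) -> None:
        self._hints[target].append(mutation)

    def targets_with_hints(self) -> List[str]:
        # Cost depends on the number of hosts with hints, not on the
        # total number of stored hints.
        return [host for host, pending in self._hints.items() if pending]

    def pending(self, target: str) -> List[str]:
        return list(self._hints.get(target, []))

    def deliver(self, target: str, send: Callable[[str, str], bool]) -> int:
        """Try to send every pending hint; keep the ones that fail."""
        remaining: List[str] = []
        delivered = 0
        for mutation in self._hints.get(target, []):
            if send(target, mutation):
                delivered += 1
            else:
                remaining.append(mutation)
        if remaining:
            self._hints[target] = remaining
        else:
            self._hints.pop(target, None)
        return delivered


def recheck(store: HintStore,
            is_up: Callable[[str], bool],
            send: Callable[[str, str], bool]) -> int:
    """One pass of the periodic re-check: deliver hints to hosts seen as up."""
    total = 0
    for host in store.targets_with_hints():
        if is_up(host):
            total += store.deliver(host, send)
    return total
```

Calling `recheck` from a timer (hourly, say) would retry delivery even when no down-to-up state change fires, and hints whose delivery fails stay in the store for the next pass.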

On Wed, Jun 15, 2011 at 7:48 PM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:
> I suspect a few possibilities:
> 1. I have not checked, but what happens (in terms of hint delivery) if a
> node tries to write something but the write times out even if the node is
> marked as up?
> 2. I would assume there can be ever so slight variations in how different
> nodes in the cluster think the rest of the cluster is up. These events will
> of course typically be short lived (unless some sort of long term split
> brain situation occurs), but if you are writing data while for instance a
> node is restarting, I would not be surprised if there are race conditions
> where A see B as down, sends a hint to C but C already think B is up
> 3. I have observed situations where it seems like a node comes in up state
> but for some reason takes a while to get really operational. Hint delivery
> fails, the hint sender gives up and nothing more happens.
>
> May be an idea to let a node check if it has hints on heartbeats maybe
> (potentially not all of them, but at a regular interval)?
>
> Terje
>
> On Thu, Jun 16, 2011 at 2:08 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> On Wed, Jun 15, 2011 at 10:53 AM, Terje Marthinussen
>> <tmarthinus...@gmail.com> wrote:
>> > I was looking quickly at source code tonight.
>> > As far as I could see from a quick code scan, hint delivery is only
>> > triggered as a state change from a node is down to when it enters up
>> > state?
>>
>> Right.
>>
>> > If this is indeed the case, it would potentially explain why we
>> > sometimes
>> > have hints on machines which does not seem to get played back
>>
>> Why is that? Hints don't get created in the first place unless a node
>> is in the down state.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com