You're right, those could all cause what you are seeing.

We used to have a "re-check hourly" scheduled task, but took it out
because it was very expensive: at the time, hints were not stored by
machine, so asking "does machine X have any hints" required scanning
all hints.  Now that they are, it should be fine to add that back.
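
To illustrate why storing hints by machine makes the periodic check
cheap, here is a rough sketch in Python. It is not Cassandra code:
HintStore, recheck, and the send callback are hypothetical names made
up for this example, and a dict in memory stands in for whatever the
real storage layout is.

```python
from collections import defaultdict


class HintStore:
    """Toy in-memory hint store keyed by target endpoint (hypothetical)."""

    def __init__(self):
        # target endpoint -> list of pending mutations for that endpoint
        self._hints = defaultdict(list)

    def add_hint(self, target, mutation):
        self._hints[target].append(mutation)

    def has_hints(self, target):
        # A single keyed lookup; no scan over every stored hint.
        return bool(self._hints.get(target))

    def deliver(self, target, send):
        """Try to send each pending hint; keep the ones that fail."""
        pending = self._hints.pop(target, [])
        remaining = [m for m in pending if not send(target, m)]
        if remaining:
            self._hints[target] = remaining
        return len(pending) - len(remaining)


def recheck(store, live_endpoints, send):
    """One pass of the periodic task: retry delivery to live endpoints."""
    delivered = {}
    for endpoint in live_endpoints:
        if store.has_hints(endpoint):
            delivered[endpoint] = store.deliver(endpoint, send)
    return delivered
```

Running recheck on a timer (hourly, or every N heartbeats) would pick
up hints that were missed because delivery failed or because no
down-to-up transition was observed. Hints that fail again stay in the
store for the next pass.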

On Wed, Jun 15, 2011 at 7:48 PM, Terje Marthinussen
<tmarthinus...@gmail.com> wrote:
> I suspect a few possibilities:
> 1. I have not checked, but what happens (in terms of hint delivery) if a
> node tries to write something but the write times out even if the node is
> marked as up?
> 2. I would assume there can be ever so slight variations in how different
> nodes in the cluster see the up/down state of the rest of the cluster. These
> events will of course typically be short lived (unless some sort of long term
> split brain situation occurs), but if you are writing data while for instance
> a node is restarting, I would not be surprised if there are race conditions
> where A sees B as down and sends a hint to C, but C already thinks B is up.
> 3. I have observed situations where it seems like a node enters the up state
> but for some reason takes a while to become really operational. Hint delivery
> fails, the hint sender gives up and nothing more happens.
>
> It may be an idea to let a node check whether it has hints on heartbeats
> (potentially not all of them, but at a regular interval)?
>
> Terje
>
> On Thu, Jun 16, 2011 at 2:08 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> On Wed, Jun 15, 2011 at 10:53 AM, Terje Marthinussen
>> <tmarthinus...@gmail.com> wrote:
>> > I was looking quickly at source code tonight.
>> > As far as I could see from a quick code scan, hint delivery is only
>> > triggered by the state change when a node goes from down to up?
>>
>> Right.
>>
>> > If this is indeed the case, it would potentially explain why we
>> > sometimes have hints on machines which do not seem to get played back
>>
>> Why is that?  Hints don't get created in the first place unless a node
>> is in the down state.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
