On Wed, Feb 9, 2011 at 4:31 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> Thanks Gary. I'll keep an eye on things and see if it happens again.
>
> From reading the code I'm wondering if there is a small chance of a race
> condition in HintedHandoffManager.waitForSchemaAgreement().
>
> Could the following happen? I'm a little unsure on exactly how the endpoint
> state is removed from the map in Gossiper.
>
> 1) node 1 starts
> 2) Gossiper calls StorageService.onAlive() when the endpoints are detected
> as alive.
> 3) HintedHandoffManager.deliverHints() adds a runnable to the HintedHandoff
> thread pool
> 4) This happens several times, and node 1 gets busy delivering hints but
> there is only 1 thread in the thread pool.
> 5) Node n is removed from the cluster and the endpoint state is deleted in
> the Gossiper on node 1
> 6) Node 1 gets around to processing the hints for node n and
> Gossiper.getEndpointStateForEndpoint() returns null for node n
>

Yes, this is currently possible, but you have to decommission the node
before the schema check/sleep portion of HH is over, which is unlikely in
practice.  It will be especially unlikely after
https://issues.apache.org/jira/browse/CASSANDRA-2115.
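
For anyone following along, here is a rough, self-contained sketch of the
interleaving in steps 1-6. It is not Cassandra code: the class
HintRaceSketch, the endpointState map, and this deliverHints() are
hypothetical stand-ins for the Gossiper's endpoint state map and the
single-threaded hint pool, and the latch only exists to force the ordering.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HintRaceSketch {
    // Hypothetical stand-in for the Gossiper's endpoint state map.
    static final Map<String, String> endpointState = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the queued hint delivery task. The state is
    // looked up when the task runs, not when it was queued, so it must
    // tolerate the endpoint having been removed in between.
    static String deliverHints(String endpoint) {
        String state = endpointState.get(endpoint);
        if (state == null) {
            return "skipped " + endpoint;
        }
        return "delivered to " + endpoint;
    }

    public static String run() throws Exception {
        // One worker thread, as in step 4.
        ExecutorService hintPool = Executors.newSingleThreadExecutor();
        try {
            endpointState.put("nodeN", "NORMAL");

            // Keep the single worker busy with earlier hint deliveries.
            CountDownLatch earlierHintsDone = new CountDownLatch(1);
            Callable<Void> earlierHints = () -> {
                earlierHintsDone.await();
                return null;
            };
            hintPool.submit(earlierHints);

            // Step 3: delivery for node n is queued behind the busy worker.
            Callable<String> hintsForNodeN = () -> deliverHints("nodeN");
            Future<String> pending = hintPool.submit(hintsForNodeN);

            // Step 5: node n is removed before its task gets to run.
            endpointState.remove("nodeN");

            // Step 6: the worker frees up and finds no state for node n.
            earlierHintsDone.countDown();
            return pending.get();
        } finally {
            hintPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints "skipped nodeN"
    }
}
```

The null check in deliverHints() is just one way to handle the missing
state; the sketch only shows that a queued task can outlive the endpoint
state it expects to find.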

-Brandon
