On Wed, Feb 9, 2011 at 4:31 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> Thanks Gary. I'll keep an eye on things and see if it happens again.
>
> From reading the code I'm wondering if there is a small chance of a race
> condition in HintedHandoffManager.waitForSchemaAgreement().
>
> Could the following happen? I'm a little unsure on exactly how the endpoint
> state is removed from the map in Gossiper.
>
> 1) node 1 starts
> 2) Gossiper calls StorageService.onAlive() when the endpoints are detected
> as alive.
> 3) HintedHandoffManager.deliverHints() adds a runnable to the HintedHandoff
> TP
> 4) This happens several times, and node 1 gets busy delivering hints but
> there is only 1 thread in the thread pool.
> 5) Node n is removed from the cluster and the endpoint state is deleted in
> the Gossiper on node 1
> 6) Node 1 gets around to processing the hints for node n and
> Gossiper.getEndpointStateForEndpoint() returns null for node n
>

Yes, this is currently possible, but you have to decommission the node
before the schema check/sleep portion of HH is over, which is unlikely in
practice. It will be especially unlikely after
https://issues.apache.org/jira/browse/CASSANDRA-2115.

-Brandon
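
For anyone following the thread, here is a rough standalone sketch of the sequence in steps 3-6. It is not Cassandra code: the class HintRaceSketch, the endpointState map, and the deliverHints() and simulate() methods are all made up for illustration, and the map only stands in for the endpoint state held by the Gossiper. It shows a single-threaded pool with a queued delivery task whose endpoint state is removed before the task runs, so the lookup returns null and has to be guarded.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HintRaceSketch {
    // Hypothetical stand-in for the Gossiper's endpoint state map.
    static final Map<String, String> endpointState = new ConcurrentHashMap<>();

    // Hypothetical delivery step: look the endpoint up and guard against null.
    static String deliverHints(String endpoint) {
        String state = endpointState.get(endpoint);
        if (state == null) {
            // The endpoint was removed while this task sat in the queue.
            return "skipped " + endpoint;
        }
        return "delivered " + endpoint;
    }

    static List<String> simulate() throws Exception {
        endpointState.clear();
        endpointState.put("node2", "NORMAL");
        endpointState.put("nodeN", "NORMAL");

        // One worker thread, as in step 4: tasks run strictly one at a time.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch busy = new CountDownLatch(1);
        try {
            // The first task keeps the only thread busy until the latch opens.
            Callable<String> firstTask = () -> {
                busy.await();
                return deliverHints("node2");
            };
            // The second task is queued behind it (steps 3 and 4).
            Callable<String> secondTask = () -> deliverHints("nodeN");

            Future<String> first = pool.submit(firstTask);
            Future<String> second = pool.submit(secondTask);

            // Step 5: node n leaves and its state is deleted before its task runs.
            endpointState.remove("nodeN");
            busy.countDown();

            // Step 6: the queued task now finds no state for node n.
            return Arrays.asList(first.get(), second.get());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(simulate());
    }
}
```

The latch is only there to make the ordering deterministic in the sketch. In a real cluster the window is whatever time the task spends queued or sleeping, which is why shortening that portion makes the race less likely.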