[ 
https://issues.apache.org/jira/browse/CASSANDRA-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874432#comment-13874432
 ] 

Jonathan Ellis commented on CASSANDRA-6557:
-------------------------------------------

Just to make sure we're talking about the same problem:

Say I have allocatingFrom=X, active = [X, Y], and available=[Z].

Someone calls advanceAllocatingFrom.  We iterate over available, and try to 
swap the first element into allocatingFrom.

So now we have:
old = X
allocatingFrom = Z
active = [X, Y]
available = [Z]

Our next step is to remove Z from available and add it to active, but if 
another thread calls advanceAllocatingFrom before that happens, both it and 
the initial thread will add Z to active, so we'll have [X, Y, Z, Z], which 
violates our design.
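
To make the interleaving concrete, here is a single-threaded replay of it. This is only a sketch, not the actual CommitLog code: the class RaceReplay and its locals are hypothetical, plain lists stand in for the real queues, and it assumes the second caller arrives with old = Z (that is, Z has already filled up) so that its CAS succeeds too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical replay of the race; names and structure are illustrative only.
final class RaceReplay {
    static List<String> replay() {
        String x = "X", y = "Y", z = "Z";
        AtomicReference<String> allocatingFrom = new AtomicReference<>(x);
        List<String> active = new ArrayList<>(Arrays.asList(x, y));
        List<String> available = new ArrayList<>(Arrays.asList(z));

        // Thread A: look at the head of available and CAS it in (X -> Z).
        String nextA = available.get(0);
        boolean aWon = allocatingFrom.compareAndSet(x, nextA);

        // Thread B runs before A's cleanup. Z is still in available, so B
        // sees the same element, and its CAS (Z -> Z) also succeeds.
        // Assumption: B was called with old = Z.
        String nextB = available.get(0);
        boolean bWon = allocatingFrom.compareAndSet(z, nextB);

        // Both threads now run the cleanup step for the element they "won".
        if (aWon) { available.remove(nextA); active.add(nextA); }
        if (bWon) { available.remove(nextB); active.add(nextB); }

        return active; // Z appears twice
    }
}
```

Calling RaceReplay.replay() returns an active list in which Z has been added by both threads.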

It seems to me that we can fix this much more easily by simply dequeuing the 
element from available before trying to CAS.  If we fail, we can add it back.  
This is a relatively rare operation, and a race even more rare, so it doesn't 
have to be super optimized.
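
A minimal sketch of what dequeue-before-CAS could look like, under stated assumptions: SegmentAllocator and its fields are hypothetical stand-ins rather than the real classes, java.util.concurrent queues are used for brevity, and the case where available is empty (where real code would presumably wait for a fresh segment) is simply a no-op here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the segment manager; not Cassandra's actual code.
final class SegmentAllocator<S> {
    private final AtomicReference<S> allocatingFrom = new AtomicReference<>();
    private final Queue<S> active = new ConcurrentLinkedQueue<>();
    private final Queue<S> available = new ConcurrentLinkedQueue<>();

    // The first element of initialActive is taken as the current segment.
    SegmentAllocator(List<S> initialActive, List<S> initialAvailable) {
        active.addAll(initialActive);
        available.addAll(initialAvailable);
        allocatingFrom.set(initialActive.get(0));
    }

    void advanceAllocatingFrom(S old) {
        // Dequeue first: poll() hands a given element to at most one thread,
        // so no two callers can try to install the same segment.
        S next = available.poll();
        if (next == null)
            return; // assumption: nothing to switch to, so do nothing

        if (allocatingFrom.compareAndSet(old, next)) {
            // We won the swap, so we alone add the segment to active.
            active.add(next);
        } else {
            // Someone else advanced first; return the segment unused.
            available.add(next);
        }
    }

    S current() { return allocatingFrom.get(); }
    List<S> activeSnapshot() { return new ArrayList<>(active); }
    List<S> availableSnapshot() { return new ArrayList<>(available); }
}
```

Note that a segment handed back after a failed CAS goes to the tail of available, so ordering can change slightly; given how rare the race is, that seems an acceptable trade for the simpler logic.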

> CommitLogSegment may be duplicated in unlikely race scenario
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-6557
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6557
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 2.1
>            Reporter: Benedict
>             Fix For: 2.1
>
>
> In the unlikely event that the thread that switched to a new CLS has not 
> finished executing the cleanup of its switch by the time the CLS has finished 
> being used, it is possible for the same segment to be 'switched' in again. 
> This would be benign except that it is added to the activeSegments queue a 
> second time also, which would permit it to be recycled twice, creating two 
> different CLS objects in memory pointing to the same CLS on disk, after which 
> all bets are off.
> The issue is highly unlikely to occur, but highly unlikely means it will 
> probably happen eventually. I've fixed this based on my patch for 
> CASSANDRA-5549, using the NonBlockingQueue I introduce there to simplify the 
> logic and make it more obviously correct.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
