> On May 28, 2015, 7:10 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/utils/timer/TimerTaskList.scala, lines 132-135
> > <https://reviews.apache.org/r/34734/diff/1/?file=973063#file973063line132>
> >
> >     So, I guess the race condition is the following. The expiration thread 
> > moves a TimerTaskEntry from one TimerTaskList to another during reinsert. 
> > At the same time, another thread can complete an operation and try to 
> > remove the entry from the list. Is that right?
> >     
> >     With the patch, it seems when TimerTask.cancel tries to re move an 
> > entry from the list, the following can happen (1) line 133 tests that list 
> > in entry is not null, (2) a reinsert process happens and the entry is 
> > removed from list which sets list in the entry to null, (3) list.remove in 
> > 134 is called and the entry is not removed since list is now null, (4) line 
> > 133 is tested again and since list is now null, we quit the loop, (5) the 
> > reinsert process adds the entry to a new list. 
> >     
> >     At this point, a completed entry still exists in the list.
> 
> Yasuhiro Matsuda wrote:
>     You are right. It should be rare, but a completed entry can remain in the 
> list until expiration. The completion flag in DelayedOperation prevents 
> excess executions. So, it is not too bad.
> 
> Jun Rao wrote:
>     Thanks for the clarification. Thinking about this a bit more, could we 
> hit a NullPointerException in step (3)? At that point, list could be null 
> when we call remove.
> 
> Onur Karaman wrote:
>     Yeah that check-then-act should probably be done atomically. Maybe all 
> changes/check-then-acts to TimerTaskEntry just need to be guarded by the 
> TimerTaskEntry's intrinsic lock?
> 
> Yasuhiro Matsuda wrote:
>     Oops. The value of TimerTaskEntry.list should be save to a local variable.
> 
> Jun Rao wrote:
>     Ok, thanks. Perhaps we can also add some comments explaining why we need 
> the while loop and the rare possibility of not removing a completed operation 
> from the Timer.

Could it happen that concurrent threads are calling TimerTaskList.add(entry) 
and TimerTaskList.remove(entry) on different lists for the same entry? Since we 
do not synchronize on the entry object this could cause race condition on 
next/prev/list reference manipulation.


- Guozhang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34734/#review85596
-----------------------------------------------------------


On May 27, 2015, 9 p.m., Yasuhiro Matsuda wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34734/
> -----------------------------------------------------------
> 
> (Updated May 27, 2015, 9 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2226
>     https://issues.apache.org/jira/browse/KAFKA-2226
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> fix a race condition in TimerTaskEntry.remove
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/utils/timer/TimerTaskList.scala 
> e7a96570ddc2367583d6d5590628db7e7f6ba39b 
> 
> Diff: https://reviews.apache.org/r/34734/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Yasuhiro Matsuda
> 
>

Reply via email to