> > > Maybe I don't see something obvious? :) > > I think you're missing the fact that your patch doesn't do what you assert > above > either :)
Issue is not in setting alarms but canceling it. If you look closer to my patch you see that it address this issue (look at added *do { lock(); ....; unlock(); } while( )* part). > > First, lets address rte_alarm_set. There is no notion of "re-arming" in this > alarm implementation, because theres no ability to refer to a specific alarm > from the callers perspective. When you call rte_eal_alarm_set you get a new > alarm every time. So I don't really see a race there. It might not be > exactly > the behavior you want, but its not a race, becuase you're not modifying an > alarm > in the middle of execution, you're just creating a new alarm, which is safe. OK, it is safe, but this is not the case. > > There is a race in what you describe above, insofar as its possible that you > might call rte_eal_alarm_cancel and return without having canceled all the > matching alarms. I don't see any clear documentation on what the behavior is > supposed to be, but if you want to ensure that all matching alarms are > cancelled > or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in > linux > API parlance, thats usually denoted as a cancel_sync operation). Again, look at the patch. I changed documentation to inform about this behavior. > > For that race condition, you're correct, my patch doesn't address it, I see > that > now. Though your patch doesn't either. If you call rte_eal_alarm_cancel from > within a callback function, then, by definition, you can't wait on the > completion of the active alarm, because thats a deadlock. Its a necessecary > evil, I grant you, but it means that you can't be guaranteed the cancelled and > complete (cancel_sync) behavior that you want, at least not with the current > api. If you want that behavior, you need to do one of two things: This patch does not break any API. It only removes undefined behavior. > > 1) Modify the api to allow callers to individually reference timer instances, > so > that when cancelling, we can return an appropriate return code to indicate to > the caller that this alarm is in-progress. That way you can guarantee the > caller that the specific alarm that you cancelled is either complete and > cancelled > or currently executing. Add an api to expicitly wait on a referenced alarm as > well. This allows developers to know that, when executing an alarm callback, > an > -ECURRENTLYEXECUTING return code is ok, because they are in the currently > executing context. This would brake API for sure.