On Sun, Sep 28, 2014 at 04:12:04PM +0000, Ananyev, Konstantin wrote: > > > > -----Original Message----- > > From: Neil Horman [mailto:nhorman at tuxdriver.com] > > Sent: Friday, September 26, 2014 8:39 PM > > To: Ananyev, Konstantin > > Cc: Wodkowski, PawelX; dev at dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to > > thread-safe: > > > > On Fri, Sep 26, 2014 at 06:07:14PM +0000, Ananyev, Konstantin wrote: > > > > > > > > > > > As I remember the purpose of the patch was to fix the race condition > > > > > inside rte_alarm library. > > > > > I believe that the patch provided by Michal & Pawel fixes the issues > > > > > you discovered. > > > > > If you think, that is not the case, could you please provide a list > > > > > of remaining issues? > > > > > Excluding ones that you just don't like it, and you are not happy > > > > > with rte_alarm API in total? > > > > > > > > > > Gladly. As Pawel explained the race, its possible that, after calling > > > > rte_eal_alarm_cancel, an in-flight execution of an alarm callback may > > > > still be > > > > running. The problem with that ostensibly is that data which is being > > > > accessed > > > > by the callback might be then accessed in parallel with another process > > > > leading > > > > to data corruption or some other problem. The issue I have with his > > > > patch is > > > > that it doesn't completely close the race. While it does close the > > > > race for the > > > > condition in whcih thread B is running the alarm callback while thread > > > > A is > > > > executing the cancel operation, it does not close the case for when a > > > > single > > > > thread B is running the cancel operation, as the in-flight execution > > > > itself is > > > > still active. > > > > > > A bit puzzled here: > > > Are you saying that calling alarm_cancel() for itself inside > > > eal_alarm_callback() might cause a problem? > > > I still don't see how. > > > > > Potentially yes, by the same race condition that exists when using a > > secondary > > thread to do the cancel call. As I understand it the race that Pawel > > described > > is as follows: > > > > Thread A Thread B > > alarm_cancel() eal_alarm_callback > > block on alarm spinlock drop spinlock > > run cancel operation execute callback > > function > > return from cancel > > rte_eal_alarm_set > > > > As Pawel described the problem, there is a desire to not set the new alarm > > while > > the old alarm is still executing. And his patch accomplishes that for the > > two > > thread case above just fine > > > > The problem with Pawels patch is that its non functional in the case where > > the > > cancel happens within Thread B. Lets change the scenario just a little bit: > > > > Thread B Thread C > > eal_alarm_callback > > callback_function > > some_other_common_func > > rte_eal_alarm_cancel(this) > > pthread_signal(Thread C) wake up > > operate on alarm data rte_eal_alarm_set > > > > As I can see, there is an incorrect behaviour in your callback_function > example. > It should first finish with " eal_alarm_callback" and only then send a signal > to other thread. > Otherwise we can't help it in any way. > But I think, I understand your concern: > after rte_eal_aralm_cancel() finishes, the caller can't clearly distinguish > what exactly happen: > 1) alarm was cancelled succesfully. > 2) alarm was not found (already cancelled or executed). > 3) alarm is executing by the same thread and can't be cancelled. > > Basically right now the caller can distinguish that either #1 or #2,3 > happened, but can't distinguish between 2 & 3. > Correct? > Yes, this is my concern exactly.
> If that's so, then I suppose we can do: make alarm_cancel() to return a > negative value for the case #3 (-EINPROGRESS or something). > Something like: > ... > if (ap->executing == 0) { > LIST_REMOVE(ap,next); > rte_free(ap); > count++; > ap = ap_prev; > } else if (pthread_equal(ap->executing_id, pthread_self()) == 0) { > executing++; > } else { > ret = -EINPROGRESS; > } > ... > return ((ret != 0) ? ret : count); > > So the return value will be > 0 for #1, 0 for #2, <0 for #3. > As I remember, you already suggested something similar in one of the previous > mails. Yes, I rolled the API changes I suggested in with this model, because I wanted to be able to do precise specification of a timer instance to cancel, but if we're not ready to make that change, I think what you propose above would be suffficient. Theres some question as to weather we would cancel timers that are still pending on a return of -EINPROGRESS, but I think if we document it accordingly, then it can be worked out just fine. Best Neil > Konstantin > > > > > > > > In this scenario the problem is not fixed because when called from within > > the > > alarm thread, the executing alarm is skipped (as it must be), but that fact > > is > > invisible to the caller, and because of that its still possible for the same > > origional problem to occur. > > > > Neil > > >