On Mon, Sep 29, 2014 at 10:11:38AM +0000, Wodkowski, PawelX wrote:
> > >
> > > Imagine how you will be damned by someone who did not even notice your
> > > change and is managing some kind of resource based on the returned number
> > > of set/canceled timers. If you suddenly start returning negative values,
> > > how will those applications behave? Silently changing the returned value
> > > domain is evil in its purest form.
> > 
> > As I can see the impact is very limited.
> 
> It is a small impact on DPDK but can be huge for a user application.
> Example:
> If someone uses this kind of expression in a callback (skipping the user app
> serialization part):
> callback () {
> ...
> some_simple_semaphore += rte_eal_alarm_cancel(...);

This code would be broken to begin with, as rte_eal_alarm_cancel is already
written to return negative return codes.  It's not documented as such, but it's
still the case.  Note that if you run an application built against a shared
library on BSD, the definition of rte_eal_alarm_cancel returns -ENOTSUP.  The
above code would be broken because it doesn't account for that.  You can argue
that the documentation should be updated, but the DPDK in the wild already
conforms to the model Konstantin and I are proposing.
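
To make the point concrete, here is a minimal sketch of the accounting problem.
It does not call DPDK at all: stub_alarm_cancel, credit_naive and
credit_guarded are hypothetical names made up for illustration, and the stub
only mimics the return-value domain described above (a count of cancelled
alarms, or a negative errno-style code such as -ENOTSUP).

```c
#include <errno.h>

/* Hypothetical stand-in for rte_eal_alarm_cancel(): returns the number of
 * alarms cancelled (>= 0) or a negative errno-style code. This is NOT the
 * DPDK implementation; it only imitates the return-value domain. */
static int stub_alarm_cancel(int simulate_unsupported, int pending)
{
    if (simulate_unsupported)
        return -ENOTSUP;    /* e.g. the BSD shared-library case */
    return pending;         /* count of alarms removed */
}

/* Naive accounting: adds the raw return value to the counter, so an
 * error code silently lowers it. */
static int credit_naive(int counter, int ret)
{
    return counter + ret;
}

/* Guarded accounting: only a positive count is credited; an error or
 * "nothing cancelled" leaves the counter untouched. */
static int credit_guarded(int counter, int ret)
{
    return ret > 0 ? counter + ret : counter;
}
```

With a counter of 1 and a cancel that returns -ENOTSUP, credit_naive yields a
value below 1, while credit_guarded leaves it at 1.  What the caller should do
with the error itself (log it, retry, give up) is application policy and is
left out of the sketch.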

> ...
> }
> 
> Anywhere in the code:
> ...
> if (some_simple_semaphore) {
>       some_simple_semaphore--;
>       if (rte_eal_alarm_set(...) != 0)
>               some_simple_semaphore++;
> }
> ...
> 
> 1. Do you notice the change in the cancel function?
The application crashes, or otherwise misbehaves.

> 2. How many hours do you spend finding this issue in a big app/system?
You don't.  Such a problem as you describe would very likely result in a
semaphore deadlock, as the count would be incorrectly lowered, so you put
watches on the variable, note that sometimes the count goes down on a cancel,
which is completely counter-intuitive, read the updated documentation that
indicates error codes are possible (which you should have been prepared for
anyway), and move on with your day.

> 
> > Only code that checks for (rte_eal_alarm_cancel(...) == 0 / != 0) inside an
> > alarm callback function might be affected.
> > On the other hand, there could indeed be situations where the caller needs
> > to know whether the alarm was successfully cancelled or not,
> > and if not, for what reason.
> > 
> 
> I can extend the rte alarm API to add alarm state checking in the next patch,
> but for now, since this is not urgent, I think the original patch v2 should be
> enough.
I re-assert my original argument here: without the above change, you haven't
really fixed the race.  If you can find another way to do it, that's fine with
me, but keep in mind once again that some implementations of rte_eal_alarm_set
already do what's being proposed.

Neil

> 
> Pawel
> 
