Hi Andriy,

On 08/31/15 18:46, Andriy Gapon wrote:
On 30/08/2015 22:09, Andriy Gapon wrote:
On 30/08/2015 19:16, Konstantin Belousov wrote:
This is strange, I do not think that could be a right explanation of this
issue.  The taskqueue callout is initialized with the mutex, which means
that the callout_stop() caller
- must own the mutex;
- is synchronous with the callout.
In other words, callout cannot be running when taskqueue_cancel_timeout()
calls callout_stop(), it only can be dequeued but the callout function
is not yet called.  If callout_stop() is performed meantime, between
dropping callout_cpu lock, and locking the mutex, it must be not run.

Thank you for the explanation.  I am not familiar with the code and I
misinterpreted the manual page and thought that callout_stop() might be unable
to stop the callout even if it was initialized with the mutex.  I see my mistake
now.

I had to look at the code, of course.
Could the following logic actually be buggy?

int
callout_reset_sbt_on(struct callout *c, sbintime_t sbt, sbintime_t precision,
     void (*ftn)(void *), void *arg, int cpu, int flags)
{
...
         if (cc_exec_curr(cc, direct) == c) {
                 /*
                  * We're being asked to reschedule a callout which is
                  * currently in progress.  If there is a lock then we
                  * can cancel the callout if it has not really started.
                  */
                 if (c->c_lock != NULL && cc_exec_cancel(cc, direct))
                         cancelled = cc_exec_cancel(cc, direct) = true;

I have a suspicion that the condition should check for !cc_exec_cancel:
- it seems that cc_exec_cancel is set to true when the callback execution starts
- logically, it wouldn't make sense to check if X is true and then set it to 
true

The code is quite hard to understand in a short time, so I could be
misinterpreting what's going on, but I'd like to check my suspicion.

While touching this topic, is the following code more readable or not (see the callout_restart_async function):

https://svnweb.freebsd.org/base/projects/hps_head/sys/kern/kern_timeout.c?revision=287248&view=markup


If this is a bug indeed, then it might make the following scenario possible:

- softclock() is being executed in thread T1 and the control flow is in
softclock_call_cc() after CC_UNLOCK() and before lc_lock(c_lock)
- thread T2 is executing callout_reset_sbt_on on the same callout as T1
- so, I think, T2 will see cc_exec_cancel == false and thus, because of the
suspected bug, cc_exec_cancel will remain false
- T2 will proceed to callout_cc_add() and will happily (re-)schedule the callout
- after T2 releases the callout lock, T1 will continue in softclock_call_cc()
and because cc_exec_cancel is false the callout's callback function will get 
called

Correct.


In effect, there will be an extra call of the callout.
If that's so that could explain the triggered assertion in 
taskqueue_timeout_func().

P.S. I am going to add an assertion in callout_cc_add that CALLOUT_ACTIVE is not
set and I will see if I can get anything interesting from it.

P.P.S. On a further look, it seems that there is a bug indeed and it's probably
just a typo that got introduced in r278469:
-               if (c->c_lock != NULL && !cc->cc_exec_entity[direct].cc_cancel)
-                       cancelled = cc->cc_exec_entity[direct].cc_cancel = true;
-               if (cc->cc_exec_entity[direct].cc_waiting) {
+               if (c->c_lock != NULL && cc_exec_cancel(cc, direct))
+                       cancelled = cc_exec_cancel(cc, direct) = true;
+               if (cc_exec_waiting(cc, direct)) {


Looks like a bug to me trying to make the code more readable removed a not in there!

--HPS

_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to