On Aug 27, 2013, at 11:02 PM, Ben Pfaff <b...@nicira.com> wrote:

> On Tue, Aug 27, 2013 at 03:50:06PM -0700, Jarno Rajahalme wrote:
>> 
>> On Aug 27, 2013, at 3:13 PM, Ben Pfaff <b...@nicira.com> wrote:
>> 
>>> On Tue, Aug 27, 2013 at 03:04:54PM -0700, Jarno Rajahalme wrote:
>>>> Batching reduces overheads and enables up to 4 times the upcall processing
>>>> performance in a specialized test case.
>>>> 
>>>> Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>
>>> 
>>> Nice!
>>> 
>>>> +    for (n = 0; n < udpif->n_handlers; ++n) {
>>>> +        handler = &udpif->handlers[n];
>>>> +        if (handler->n_new_upcalls) {
>>>> +            handler->n_new_upcalls = 0;
>>>> +            xpthread_cond_signal(&handler->wake_cond);
>>>> +        }
>>> 
>>> Usually there's a race if one signals a condition variable without
>>> holding the corresponding mutex.  Did you check that there is no race
>>> here?
>> 
>> I did not notice anything hanging or the like, if that's what you
>> mean. Documentation says:
> 
> One specific race that I was concerned about is:
> 
> 1. This code in udpif_miss_handler() checks n_upcalls and sees that it
>   is zero.
> 
>        if (!handler->n_upcalls) {
> 
> 2. This code in recv_upcalls() signals wake_cond:
> 
>    for (n = 0; n < udpif->n_handlers; ++n) {
>        handler = &udpif->handlers[n];
>        if (handler->n_new_upcalls) {
>            handler->n_new_upcalls = 0;
>            xpthread_cond_signal(&handler->wake_cond);
>        }
>    }
> 
> 3. This code in udpif_miss_handler() starts waiting on wake_cond:
> 
>            ovs_mutex_cond_wait(&handler->wake_cond, &handler->mutex);
> 
> Maybe this race cannot happen, because n_upcalls only changes with the
> mutex taken.  I guess that's the case.

It seems that the cond_signal can indeed happen before the corresponding 
cond_wait, but only if n_upcalls became zero because the handler was already 
running in parallel with the dispatcher and emptied the queue right before the 
dispatcher got to sending the wake signal. In that case waiting is the right 
thing to do, as the wake-up signal was unnecessary.

The mutex is taken before new upcalls are pushed, so the handler would have to 
enter the cond_wait before the dispatcher can push new upcalls (and increment 
n_upcalls). Hence no race.

From the above I see that when both the dispatcher and the handler are running 
in parallel, it usually is not necessary to signal the handler each time 
FLOW_MISS_MAX_BATCH upcalls have been pushed. This incremental will signal only 
if the actual number of upcalls in the list reaches the limit, or when no more 
upcalls are incoming and there is (was) something in the queue:

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 112c78d..0344dcb 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -566,13 +566,13 @@ recv_upcalls(struct udpif *udpif)
             ovs_mutex_lock(&handler->mutex);
             if (handler->n_upcalls < MAX_QUEUE_LENGTH) {
                 list_push_back(&handler->upcalls, &upcall->list_node);
-                handler->n_upcalls++;
+                handler->n_new_upcalls = ++handler->n_upcalls;
                 ovs_mutex_unlock(&handler->mutex);
 
-                if (++handler->n_new_upcalls >= FLOW_MISS_MAX_BATCH) {
-                    handler->n_new_upcalls = 0;
+                if (handler->n_new_upcalls >= FLOW_MISS_MAX_BATCH) {
                     xpthread_cond_signal(&handler->wake_cond);
                 }
+
                 if (!VLOG_DROP_DBG(&rl)) {
                     struct ds ds = DS_EMPTY_INITIALIZER;
 
@@ -590,12 +590,11 @@ recv_upcalls(struct udpif *udpif)
         } else {
             ovs_mutex_lock(&udpif->upcall_mutex);
             if (udpif->n_upcalls < MAX_QUEUE_LENGTH) {
-                udpif->n_upcalls++;
+                n_udpif_new_upcalls = ++udpif->n_upcalls;
                 list_push_back(&udpif->upcalls, &upcall->list_node);
                 ovs_mutex_unlock(&udpif->upcall_mutex);
 
-                if (++n_udpif_new_upcalls >= FLOW_MISS_MAX_BATCH) {
-                    n_udpif_new_upcalls = 0;
+                if (n_udpif_new_upcalls >= FLOW_MISS_MAX_BATCH) {
                     seq_change(udpif->wait_seq);
                 }
             } else {
@@ -613,7 +612,6 @@ recv_upcalls(struct udpif *udpif)
         }
     }
     if (n_udpif_new_upcalls) {
-        n_udpif_new_upcalls = 0;
         seq_change(udpif->wait_seq);
     }
 }

(The code you cited at 2. remains the same, right before the last hunk above.)

I will send a v2 if you are happy with the analysis and the incremental does 
not introduce new problems that I have missed.

  Jarno

> 
> Ethan, unless you see something, I'm happy with this.  (I'll be at
> VMworld tomorrow and probably not continuing the conversation.)
> 
> Acked-by: Ben Pfaff <b...@nicira.com>
