Hi Madars,

On 11/30/2015 04:28 PM, Madars Vitolins wrote:
> Hi Jason,
> 
> I searched the mail archive today and checked the patch you offered in
> February; it basically does the same thing (a flag for
> add_wait_queue_exclusive() + balance).
> 
> So I plan to run some tests with your patch, with the flag on and off, and
> will provide results. I guess if I bring up 250 or 500 processes (which could
> be realistic for a production environment) waiting on one Q, then there could
> be a notable difference in performance with EPOLLEXCLUSIVE set or not.
> 

Sounds good. Below is an updated patch if you want to try it - it only adds the 
'EPOLLEXCLUSIVE' flag.


diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e009ca..265fa7b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -92,7 +92,7 @@
  */
 
 /* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
 
 /* Maximum number of nesting allowed inside epoll sets */
 #define EP_MAX_NESTS 4
@@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
        unsigned long flags;
        struct epitem *epi = ep_item_from_wait(wait);
        struct eventpoll *ep = epi->ep;
+       int ewake = 0;
 
        if ((unsigned long)key & POLLFREE) {
                ep_pwq_from_wait(wait)->whead = NULL;
@@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
         * Wake up ( if active ) both the eventpoll wait list and the ->poll()
         * wait list.
         */
-       if (waitqueue_active(&ep->wq))
+       if (waitqueue_active(&ep->wq)) {
+               ewake = 1;
                wake_up_locked(&ep->wq);
+       }
        if (waitqueue_active(&ep->poll_wait))
                pwake++;
 
@@ -1078,6 +1081,9 @@ out_unlock:
        if (pwake)
                ep_poll_safewake(&ep->poll_wait);
 
+       if (epi->event.events & EPOLLEXCLUSIVE)
+               return ewake;
+
        return 1;
 }
 
@@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
                init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
                pwq->whead = whead;
                pwq->base = epi;
-               add_wait_queue(whead, &pwq->wait);
+               if (epi->event.events & EPOLLEXCLUSIVE)
+                       add_wait_queue_exclusive(whead, &pwq->wait);
+               else
+                       add_wait_queue(whead, &pwq->wait);
                list_add_tail(&pwq->llink, &epi->pwqlist);
                epi->nwait++;
        } else {
@@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
        if (f.file == tf.file || !is_file_epoll(f.file))
                goto error_tgt_fput;
 
+       if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
+               (op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
+               goto error_tgt_fput;
+
        /*
         * At this point it is safe to assume that the "private_data" contains
         * our own data structure.
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index bc81fb2..925bbfb 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -26,6 +26,9 @@
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
 
+/* Add exclusively */
+#define EPOLLEXCLUSIVE (1 << 28)
+
 /*
  * Request the handling of system wakeup events so as to prevent system suspends
  * from happening while those events are being processed.

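For anyone testing from user space, here is a minimal sketch of how the flag might be used. It is an illustration only and not part of the patch: add_reader_exclusive() and demo_pipe_wakeup() are hypothetical helper names, and the #define simply mirrors the value from the patch for headers that do not carry it yet. Per the epoll_ctl() hunk above, the flag is only accepted with EPOLL_CTL_ADD, and not when the target fd is itself an epoll fd.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Value taken from the patch; older userspace headers will not define it. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/*
 * Hypothetical helper: register fd for read events with exclusive wakeups.
 * Returns 0 on success, -1 on error (errno is set by epoll_ctl()).
 */
static int add_reader_exclusive(int epfd, int fd)
{
	struct epoll_event ev = { 0 };

	ev.events = EPOLLIN | EPOLLEXCLUSIVE;
	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Hypothetical self-check: one byte written to a pipe should show up as one
 * ready event on an epoll instance watching the read end.
 * Returns the epoll_wait() result, or -1 on setup failure.
 */
static int demo_pipe_wakeup(void)
{
	struct epoll_event out;
	int pfd[2], epfd, n = -1;

	if (pipe(pfd) < 0)
		return -1;

	epfd = epoll_create1(0);
	if (epfd >= 0) {
		if (add_reader_exclusive(epfd, pfd[0]) == 0 &&
		    write(pfd[1], "x", 1) == 1)
			n = epoll_wait(epfd, &out, 1, 1000);
		close(epfd);
	}
	close(pfd[0]);
	close(pfd[1]);
	return n;
}
```

Since the patch rejects EPOLL_CTL_MOD when EPOLLEXCLUSIVE is set, changing the event mask later would mean an EPOLL_CTL_DEL followed by a fresh EPOLL_CTL_ADD.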

> During kernel hacking with debug prints, with 10 processes waiting on one 
> event source, I saw a lot of unneeded processing inside eventpoll.c on the 
> original kernel: a single event led to 10 calls to ep_poll_callback() and 
> other work, which resulted in a few processes being woken up in user space 
> (the count probably varies randomly depending on concurrency).
> 
> 
> Meanwhile, we are not the only ones talking about this patch; see here: 
> http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel
> Others are asking too.
> 
> So what is the current situation with your patch? What is blocking it from 
> getting into mainline?
> 

If we can show some good test results here I will re-submit it.
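
As a starting point, a rough sketch of such a test might look like the following. It is only one assumption about how this could be measured: it uses a pipe as the shared event source, a private epoll fd per process, and the hypothetical helpers waiter() and count_wakeups(). The 100 ms settle delay and the 1 s timeout are arbitrary.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)	/* value from the patch */
#endif

/*
 * Child body: watch rfd on a private epoll instance, tell the parent that
 * registration is done, then report whether a wakeup arrived in time.
 * Exit status: 1 = woken, 0 = timed out, 2 = setup failure.
 */
static void waiter(int rfd, int ready_wfd, uint32_t extra_flags)
{
	struct epoll_event ev = { 0 }, out;
	int epfd = epoll_create1(0);
	int ok;

	ev.events = EPOLLIN | extra_flags;
	ev.data.fd = rfd;
	ok = epfd >= 0 && epoll_ctl(epfd, EPOLL_CTL_ADD, rfd, &ev) == 0;

	/* Always notify the parent so it never blocks on a failed child. */
	if (write(ready_wfd, ok ? "r" : "f", 1) != 1 || !ok)
		_exit(2);
	_exit(epoll_wait(epfd, &out, 1, 1000) == 1 ? 1 : 0);
}

/*
 * Fork nworkers waiters on one pipe, post a single event, and return how
 * many of them were woken (or -1 on setup failure).
 */
static int count_wakeups(int nworkers, uint32_t extra_flags)
{
	int ev_pipe[2], ready_pipe[2], i, status, woken = 0;
	char c;

	if (pipe(ev_pipe) < 0 || pipe(ready_pipe) < 0)
		return -1;

	for (i = 0; i < nworkers; i++) {
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0)
			waiter(ev_pipe[0], ready_pipe[1], extra_flags);
	}

	/* Wait until every child has registered, then give them time to block. */
	for (i = 0; i < nworkers; i++)
		if (read(ready_pipe[0], &c, 1) != 1)
			return -1;
	usleep(100 * 1000);

	/* Post exactly one event on the shared source. */
	if (write(ev_pipe[1], "x", 1) != 1)
		return -1;

	for (i = 0; i < nworkers; i++)
		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 1)
			woken++;

	close(ev_pipe[0]);
	close(ev_pipe[1]);
	close(ready_pipe[0]);
	close(ready_pipe[1]);
	return woken;
}
```

Comparing count_wakeups(N, 0) against count_wakeups(N, EPOLLEXCLUSIVE) for N = 10, 250 or 500 should show how many waiters a single event wakes in each mode. With the flag off every waiter should be woken; with it on I would expect far fewer. For large N the timeout would likely need to grow so that early children do not time out while later ones are still being forked.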

Thanks,

-Jason

