Hi Jason,
I did the testing and wrote a blog article about it:
https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
In summary:
Test case:
- One multi-threaded binary with 10 threads making a total of 1,000,000
calls to 250 single-threaded processes doing epoll() on POSIX queues
- Each 'call' sends a message to the shared queue (load-balanced across
those 250 processes), and the serving process sends the reply back to the
client thread's private queue
Tests were done on the following system:
- Host system: Linux Mint Mate 17.2 64-bit, kernel 3.13.0-24-generic
- CPU: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz (two cores)
- RAM: 16 GB
- Virtualization platform: Oracle VirtualBox 4.3.28
- Guest OS: Gentoo Linux 2015.03, kernel 4.3.0-gentoo, 64-bit
- CPU for guest: two cores
- RAM for guest: 5 GB (no swap usage, about 4 GB free)
- Enduro/X version: 2.3.2
Results with the original kernel (no EPOLLEXCLUSIVE):
$ time ./bankcl
...
real 14m20.561s
user 0m21.823s
sys 10m49.821s
Patched kernel with the EPOLLEXCLUSIVE flag in use:
$ time ./bankcl
...
real 0m24.953s
user 0m17.497s
sys 0m4.445s
Thus 14 minutes vs. 24 seconds (roughly 860 s / 25 s ≈ 35): the
EPOLLEXCLUSIVE flag makes the application run *35 times faster*!
Guys, this is a MUST-HAVE patch!
Thanks,
Madars
Jason Baron wrote on 2015-12-01 22:11:
Hi Madars,
On 11/30/2015 04:28 PM, Madars Vitolins wrote:
Hi Jason,
Today I searched the mail archive and looked at the patch you offered
in February; it basically does the same thing (a flag for
add_wait_queue_exclusive() + balancing).
So I plan to run some tests with your patch, with the flag on and off,
and will provide results. I guess that if I bring up 250 or 500
processes (which would be realistic for a production environment)
waiting on one queue, there could be a notable difference in
performance with EPOLLEXCLUSIVE set or not.
Sounds good. Below is an updated patch if you want to try it - it only
adds the 'EPOLLEXCLUSIVE' flag.
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e009ca..265fa7b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -92,7 +92,7 @@
  */

 /* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)

 /* Maximum number of nesting allowed inside epoll sets */
 #define EP_MAX_NESTS 4
@@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	unsigned long flags;
 	struct epitem *epi = ep_item_from_wait(wait);
 	struct eventpoll *ep = epi->ep;
+	int ewake = 0;

 	if ((unsigned long)key & POLLFREE) {
 		ep_pwq_from_wait(wait)->whead = NULL;
@@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	 * Wake up ( if active ) both the eventpoll wait list and the ->poll()
 	 * wait list.
 	 */
-	if (waitqueue_active(&ep->wq))
+	if (waitqueue_active(&ep->wq)) {
+		ewake = 1;
 		wake_up_locked(&ep->wq);
+	}
 	if (waitqueue_active(&ep->poll_wait))
 		pwake++;

@@ -1078,6 +1081,9 @@ out_unlock:
 	if (pwake)
 		ep_poll_safewake(&ep->poll_wait);

+	if (epi->event.events & EPOLLEXCLUSIVE)
+		return ewake;
+
 	return 1;
 }

@@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
 		pwq->whead = whead;
 		pwq->base = epi;
-		add_wait_queue(whead, &pwq->wait);
+		if (epi->event.events & EPOLLEXCLUSIVE)
+			add_wait_queue_exclusive(whead, &pwq->wait);
+		else
+			add_wait_queue(whead, &pwq->wait);
 		list_add_tail(&pwq->llink, &epi->pwqlist);
 		epi->nwait++;
 	} else {
@@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	if (f.file == tf.file || !is_file_epoll(f.file))
 		goto error_tgt_fput;

+	if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
+		(op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
+		goto error_tgt_fput;
+
 	/*
 	 * At this point it is safe to assume that the "private_data" contains
 	 * our own data structure.
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index bc81fb2..925bbfb 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -26,6 +26,9 @@
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3

+/* Add exclusively */
+#define EPOLLEXCLUSIVE (1 << 28)
+

 /*
  * Request the handling of system wakeup events so as to prevent system suspends
  * from happening while those events are being processed.
During kernel hacking with debug prints, with 10 processes waiting on
one event source, on the original kernel I saw a lot of unneeded
processing inside eventpoll.c: a single event produced 10 calls to
ep_poll_callback() and other work, and ended up waking several
processes in user space (the count probably varies randomly depending
on concurrency).
Meanwhile, we are not the only ones talking about this patch; see here:
http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel
others are asking too.
So what is the current situation with your patch? What is blocking it
from getting into mainline?

If we can show some good test results here I will re-submit it.
Thanks,
-Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/