Re: Artificial performance limitation

2014-11-14 Thread Daniel van Vugt
No visible benefit because I'm testing on my Intel desktop, and even after offloading the responses to the IPC pool the compositor thread is still crippled by the Intel deferred batching problem. I have a fix for that too; just trying to make it less ugly. So I know how to fix the desktop perf

Re: Artificial performance limitation

2014-11-13 Thread Daniel van Vugt
Yes, agreed; that's actually what I was saying in the previous email, and I prototyped it yesterday. No visible benefit yet, so I'm still trying to find out why... P.S. Surprisingly, io_service is over-simplified and does not lend itself to dynamic thread pooling. Because there's no nice single en

Re: Artificial performance limitation

2014-11-13 Thread Alan Griffiths
I've not looked through your evidence, but you seem to be overlooking an option: Push the response logic to the BasicConnector::io_service - that uses epoll() to farm work out to the IPC thread pool (which, despite its default size of 1, still exists). Yes, there's a bit of wiring things together
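
To illustrate the idea of handing response work to a pool rather than doing it on the compositor thread, here is a minimal standard-library sketch. It is not Mir code and does not use BasicConnector or io_service: ResponsePool and release_all are hypothetical names, and a mutex-guarded queue stands in for whatever dispatch mechanism the real IPC pool uses. The default pool size of 1 mirrors the default mentioned above.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an "IPC thread pool"; not Mir's implementation.
class ResponsePool {
public:
    explicit ResponsePool(std::size_t threads = 1) {
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { work(); });
    }

    // Drains any queued tasks, then joins the workers.
    ~ResponsePool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& worker : workers_) worker.join();
    }

    // Called from the producer (e.g. a compositor thread): returns immediately.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void work() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the queue exists
};

// Hypothetical driver: queues one "release and respond" task per buffer and
// returns how many tasks had run once the pool was shut down.
int release_all(std::size_t buffers, std::size_t threads) {
    std::atomic<int> released{0};
    {
        ResponsePool pool(threads);
        for (std::size_t i = 0; i < buffers; ++i)
            pool.post([&released] { ++released; });
    }  // pool destructor drains the queue and joins
    return released.load();
}
```

The caller only pays for a queue push; the potentially slow release and response work runs on the pool's threads. Ordering between responses is not preserved once more than one worker is used, so a real implementation would need to decide whether that matters.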

Re: Artificial performance limitation

2014-11-12 Thread Daniel van Vugt
I think the cleanest solution would be a return to using the "IPC thread pool" (like in the old days). Presently we use it for receiving requests, but no longer for sending responses (except when frame dropping). The challenge is only to ensure we don't reintroduce the problems we used to have w

Re: Artificial performance limitation

2014-11-12 Thread Daniel van Vugt
Pretty pictures of a stuttering server attached. Notice the CompositingFunctor where ~vector destruction is consuming 30% of the time, due to ~TemporaryCompositorBuffer taking 27% of the time. I think we need to move the final buffer release/response logic out of the compositor thread one way

Re: Artificial performance limitation

2014-11-11 Thread Daniel van Vugt
Bugger. Come to think of it, that only proves the bug exists. Not what the cause is. Because blocking for a full frame is correct if the event loop is idle. And blocking for two frames only demonstrates the bug exists and we're not meeting the frame deadline... On 11/11/14 17:45, Daniel van V

Re: Artificial performance limitation

2014-11-11 Thread Daniel van Vugt
Found! Instrumenting AsioMainLoop [replace run() with a loop around run_once()] shows that with one client you get one task executing per frame, which blocks the whole event loop (idly, not using CPU) for 16.6ms every frame: Run 0.016630977 sec Run 0.016647716 sec Run 0.016633778 sec ... Addit
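
The kind of instrumentation described above (one run() call replaced by a loop that runs a single task at a time and times each iteration) might look roughly like the following. This is a sketch under assumptions, not the AsioMainLoop patch: ToyLoop and run_instrumented are hypothetical names, and the toy loop's run_one() is modelled on Boost.Asio's io_service::run_one(), except that it returns instead of blocking when the queue is empty.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical single-threaded event loop; stands in for the real main loop.
class ToyLoop {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    // Runs at most one queued task and returns how many ran (0 or 1).
    std::size_t run_one() {
        if (tasks_.empty()) return 0;
        auto task = std::move(tasks_.front());
        tasks_.pop_front();
        task();
        return 1;
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Instead of a single run(), loop around run_one() and record how long each
// task occupied the loop, in seconds.
std::vector<double> run_instrumented(ToyLoop& loop) {
    std::vector<double> seconds;
    for (;;) {
        const auto start = std::chrono::steady_clock::now();
        if (loop.run_one() == 0) break;  // queue drained
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        seconds.push_back(elapsed.count());
    }
    return seconds;
}
```

Printing each recorded value as "Run <seconds> sec" would give output in the same shape as the figures quoted above; a task that waits for the next frame on a 60Hz display would show up as roughly 0.0166 sec per iteration.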