On Thu, 2012-01-26 at 17:51 +0200, Lauri Kasanen wrote:
> For this particular call, my app called XQueryPointer at
> 1327592221.398620 and reported the call took 12177 usec. The relevant
> portion of the strace is attached.
>
> I'm not fully confident in my ability to grok strace, but to me it
> looks like the latter case (excess work?). There is a select call soon
> after the call, and there's a write about 12ms later.
Okay, I think I know what this is.  Right after your request there's:

1327592221.398755 select(256, [1 3 6 8 10 11 12 13 14 15 16 17 18 19 20 21 22], NULL, NULL, {7, 519000}) = 2 (in [20 22], left {7, 518990})
1327592221.398868 setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
1327592221.398944 read(20, "5\30\4\0\206_q\1\2\1\0\0\325\1\27\0007\0\4\0\207_q\1\206_q\1\0\0\0\0"..., 4096) = 3452
1327592221.399071 ioctl(8, 0xc020645d, 0x7fff61bdc420) = 0
1327592221.399189 ioctl(8, 0x40086409, 0x7fff61bdc440) = 0
1327592221.399279 ioctl(8, 0xc0086464, 0x7fff61bdc210) = 0
1327592221.400355 ioctl(8, 0xc0206466, 0x1f81a38) = 0
1327592221.401180 ioctl(8, 0xc008646a, 0x7fff61bdc460) = 0
1327592221.401255 ioctl(8, 0xc0086464, 0x7fff61bdc170) = 0
1327592221.401604 read(20, 0x2c06920, 4096) = -1 EAGAIN (Resource temporarily unavailable)
1327592221.401680 ioctl(8, 0xc0206466, 0x1f81a38) = 0
1327592221.401954 ioctl(8, 0xc008646a, 0x7fff61bdc880) = 0
1327592221.402024 writev(20, [{"\16\0\r\304\206_q\1\0\0>\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3808}], 1) = 3808
1327592221.402446 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0

So this is showing a wakeup to handle a request on fd 20, but it completes in under 4ms (and isn't your request anyway; "5" is 53 decimal, which is CreatePixmap).  Much, much later we finally get around to handling your request:

1327592221.410274 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
1327592221.410366 select(256, [1 3 6 8 10 11 12 13 14 15 16 17 18 19 20 21 22], NULL, NULL, {7, 507000}) = 1 (in [22], left {7, 506978})
1327592221.410492 setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
1327592221.410568 read(22, "&\0\2\0\2\1\0\0", 4120) = 8
1327592221.410651 read(22, 0x2c05900, 4120) = -1 EAGAIN (Resource temporarily unavailable)
1327592221.410721 writev(22, [{"\1\1'\1\0\0\0\0\2\1\0\0\211_`\0\330\1\243\0\330\1\243\0\0\0\0\0\0\0\0\0", 32}], 1) = 32
1327592221.410858 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
1327592221.410936 select(256, [1 3 6 8 10 11 12 13 14 15 16 17 18 19 20 21 22], NULL, NULL, {7, 507000}) = 1 (in [22], left {7, 506988})
1327592221.411051 setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0

And this is definitely you; "&" is 38 decimal, which is QueryPointer.  So:

writev(22) returns:     1327592221.410721
Your printed timestamp: 1327592221.398620
----------------------- -----------------
                                 0.012101

That, plus another context switch back to your process, comes to the 12177 usec you're seeing.  So it's not that the server has excess work; it's just that it's ignoring you.

Now the question, of course, is why we'd do such a thing, and the answer is to try to guarantee fairness among clients.  The way we do this is by timing how long we've been handling requests for a given client (that's what the setitimer calls in the trace are), and switching to a new client when one exhausts its timeslice.  And now we see the problem:

#define SMART_SCHEDULE_DEFAULT_INTERVAL 20      /* ms */
#define SMART_SCHEDULE_MAX_SLICE        200     /* ms */

Bool SmartScheduleDisable = FALSE;
long SmartScheduleSlice = SMART_SCHEDULE_DEFAULT_INTERVAL;
long SmartScheduleInterval = SMART_SCHEDULE_DEFAULT_INTERVAL;
long SmartScheduleMaxSlice = SMART_SCHEDULE_MAX_SLICE;

20ms at a time for a single client is clearly insane.  You can tune this on the command line with the -schedInterval <ms> argument; it would be interesting to see how things fare at closer to (say) 5ms.
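If it helps to make those setitimer() calls concrete, here's a minimal standalone sketch of the mechanism: arm an interval timer before dispatching a client's requests, let the SIGALRM handler set a yield flag that the dispatch loop checks between requests, and disarm the timer on the way back to select().  This is only an illustration, not the server's code; every name in it (start_timeslice, time_to_yield, and so on) is invented here.

/*
 * Sketch only: arm a per-client timeslice with setitimer(), as seen in
 * the trace above, and note when it expires.  All names are invented.
 */
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t time_to_yield;

static void
timeslice_expired(int sig)
{
    (void) sig;
    time_to_yield = 1;              /* dispatch loop checks this between requests */
}

static void
start_timeslice(long slice_ms)
{
    struct itimerval t;

    memset(&t, 0, sizeof(t));
    t.it_interval.tv_usec = slice_ms * 1000;    /* the {0, 20000} in the trace */
    t.it_value.tv_usec = slice_ms * 1000;
    time_to_yield = 0;
    setitimer(ITIMER_REAL, &t, NULL);
}

static void
stop_timeslice(void)
{
    struct itimerval t;

    memset(&t, 0, sizeof(t));                   /* {0, 0} disarms the timer */
    setitimer(ITIMER_REAL, &t, NULL);
}

int
main(void)
{
    signal(SIGALRM, timeslice_expired);
    start_timeslice(20);    /* 20ms default; 5 would be the experiment suggested above */
    /* ... dispatch one client's queued requests, checking time_to_yield ... */
    stop_timeslice();
    return 0;
}

Trying the smaller slice is just a matter of adding the argument to the server invocation, e.g. something like: Xorg :0 -schedInterval 5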
I suspect we need to tune that default down to below 16ms at least; allowing a single client to monopolize the server for more than a frame (16.7ms at 60Hz) is not acceptable.

I think a more complete solution would involve changing the main loop's logic slightly: if we get to the end of requests for a single client within a single timeslice, start pulling requests from any other clients that have pending requests (in priority order) until the timeslice expires, at which point we'd loop back around to select and start again.  This would punish absolute throughput for any one client when contended, but I'm willing to take that if it means we stop dropping frames.
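For concreteness, that inner-loop change might look something like the sketch below.  It's a toy model rather than the server's actual dispatch loop; the Client type, the ready array, and both helpers are invented here purely to show the control flow.

/*
 * Toy model of the proposed loop: keep pulling one request at a time
 * from whichever ready client has the highest priority, until either
 * the timeslice expires or nobody has work queued, then fall back out
 * to select().
 */
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Would be set by a SIGALRM handler when the slice expires, as in the
 * earlier sketch; never armed in this demo. */
static volatile sig_atomic_t time_to_yield;

typedef struct Client {
    int  priority;
    bool (*has_pending_request)(struct Client *);
    void (*dispatch_one_request)(struct Client *);
} Client;

/* Highest-priority client with a request still queued, or NULL. */
static Client *
next_ready_client(Client **ready, size_t n)
{
    Client *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (ready[i] && ready[i]->has_pending_request(ready[i]) &&
            (!best || ready[i]->priority > best->priority))
            best = ready[i];
    }
    return best;
}

static void
dispatch_timeslice(Client **ready, size_t n)
{
    while (!time_to_yield) {
        Client *c = next_ready_client(ready, n);
        if (!c)
            break;                      /* nothing pending anywhere: back to select() */
        c->dispatch_one_request(c);     /* one request, then re-evaluate */
    }
}

/* Tiny demo: two clients with a few queued requests each. */
static int queue_a = 3, queue_b = 5;
static bool a_pending(Client *c) { (void) c; return queue_a > 0; }
static bool b_pending(Client *c) { (void) c; return queue_b > 0; }
static void a_request(Client *c) { (void) c; queue_a--; puts("request from A"); }
static void b_request(Client *c) { (void) c; queue_b--; puts("request from B"); }

int
main(void)
{
    Client a = { 1, a_pending, a_request };
    Client b = { 0, b_pending, b_request };
    Client *ready[] = { &a, &b };

    /* No timer armed here, so both queues drain: A first (higher
     * priority), then B. */
    dispatch_timeslice(ready, 2);
    return 0;
}

- ajax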