On Wed, Jan 11, 2017 at 6:44 PM, Maxim Dounin <mdou...@mdounin.ru> wrote:
> Hello!
>
> On Wed, Jan 11, 2017 at 06:07:01PM +0300, Руслан Закиров wrote:
>
> > On Mon, Jan 9, 2017 at 10:59 PM, Maxim Dounin <mdou...@mdounin.ru> wrote:
> >
> > > Typical kern.sched.quantum is about 100ms, so several
> > > CPU-intensive tasks can delay processing of the events enough to
> > > trigger a timeout if a context switch happens at a bad time.
> >
> > Here is what I see in truss' output:
> >
> > 38.820523207 0.000006568 kevent(28,{ },0,{
> >   198,EVFILT_WRITE,EV_CLEAR,0x0,0x8218,0x821405071 },512,{ 6.215000000 }) = 1 (0x1)
> > 39.783094188 0.000022875 kevent(28,{ },0,{
> >   52,EVFILT_READ,0x0,0x0,0x30b,0x81f800068
> >   204,EVFILT_WRITE,EV_CLEAR,0x0,0x8218,0x821401588
> >   51,EVFILT_READ,0x0,0x0,0xec,0x81f800000
> >   68,EVFILT_READ,EV_CLEAR,0x0,0x8bf,0x81f816580
> >   7,EVFILT_READ,EV_CLEAR,0x0,0x27f,0x81f813869
> >   57,EVFILT_READ,EV_CLEAR,0x0,0x767,0x81f817bd8
> >   203,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x248,0x81f8030c1
> >   181,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x9b77,0x81f80ea68
> >   178,EVFILT_READ,EV_CLEAR,0x0,0x39d,0x81f8010a9
> >   198,EVFILT_READ,EV_CLEAR,0x0,0x3d3,0x81f805071
> >   204,EVFILT_READ,EV_CLEAR,0x0,0x9da,0x81f801588
> >   190,EVFILT_READ,EV_CLEAR,0x0,0x4ff,0x81f80fc48
> >   154,EVFILT_READ,EV_CLEAR,0x0,0x88e,0x81f8130b1
> >   151,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0xc1db,0x81f814290
> >   157,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0xe841,0x81f80c029
> >   195,EVFILT_READ,EV_CLEAR,0x0,0x952,0x81f8090a1
> >   194,EVFILT_READ,EV_CLEAR,0x0,0x929,0x81f809ac8
> >   201,EVFILT_READ,EV_CLEAR,0x0,0x4ef,0x81f80c980
> >   174,EVFILT_READ,EV_CLEAR,0x0,0x51e,0x81f816518
> >   77,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x1168,0x81f811c61 },512,{ 5.253000000 }) = 20 (0x14)
> >
> > A 1-second delay between two syscalls. Then nginx goes nuts processing
> > all it missed during this second. I cannot tell from this output how much
> > time was spent in these syscalls. Can anyone?
> Using ktrace / kdump might be a better option, it shows both
> syscall enter and syscall return with exact timestamps.

Tried. Found at least one issue with sendfile blocking:

66193 nginx 1484152162.182601 CALL  openat(AT_FDCWD,0x80639aac4,0x4<O_NONBLOCK>,<unused>0)
66193 nginx 1484152162.182607 NAMI  ...
66193 nginx 1484152162.182643 RET   openat 40/0x28
... gz lookup and stats ...
66193 nginx 1484152162.182683 CALL  setsockopt(0x211,0x6,0x4,0x7fffffffd96c,0x4)
66193 nginx 1484152162.182687 RET   setsockopt 0
66193 nginx 1484152162.182689 CALL  sendfile(0x28,0x211,0,0x806f,0x7fffffffe1d8,0x7fffffffe240,<invalid=0>)
66193 nginx 1484152163.541770 RET   sendfile 0

sendfile() blocks for 1.3 seconds. However:

$ sysctl hw.model hw.machine hw.ncpu
hw.model: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
hw.machine: amd64
hw.ncpu: 16

So sfbufs don't apply here, but the documentation doesn't say which resource
does apply... or I couldn't find the right doc.

> > What I don't like is timeout greater than 5 seconds. Doesn't it mean that
> > system is allowed to block for timeout time to collect events?
>
> Timeout as passed to kevent() is the nearest timer set in nginx.
> That is, nginx has nothing to do till the time specified, and
> allows kernel to block that long if there are no events. Kernel
> is expected to return events as soon as it has any, or return 0 if
> the specified time limit expires.

This I figured from reading nginx source code.

> --
> Maxim Dounin
> http://nginx.org/
> _______________________________________________
> nginx mailing list
> nginx@nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx

--
Руслан Закиров
Head of Web Services Development
+7(916) 597-92-69, ruz @ <http://www.sports.ru/>