Hello, I have setup dtrace script which is run every 10 minutes: 0,10,20,30,40,50 * * * * /root/bin/sum.sh (the dtrace script is below). It runs for 5 min. and saves result in a file (so 5 out of 10 min is recorded) and records the following data: dtrace ioctl 6265410485 uname exece 6460327273 oracle semsys 7842510331 oracle munmap 9508009506 oracle resolvepath 9757628007 tnslsnr fork1 13248716831 emagent fork1 14051647077 oracle stat 15720295909 oracle exece 16459326038 oracle pollsys 19301262569 oracle mmap 20975464662 oracle times 23059663461 oracle pwrite 25491331888 oracle open 30196916271 oracle memcntl 31122380062 oracle pread 40277198389 oracle read 64166456399 oracle write 67988846272 oracle ioctl 180151590889
TOTAL: 770149631798 When I grep for TOTAL in these log files I get: 2012-07-10-13:00: TOTAL: 900883959215 2012-07-10-13:10: TOTAL: 824011512465 2012-07-10-13:20: TOTAL: 839473060028 2012-07-10-13:30: TOTAL: 880047194365 2012-07-10-13:40: TOTAL: 5236046549 2012-07-10-13:50: TOTAL: 949408615917 2012-07-10-14:00: TOTAL: 957321356582 2012-07-10-14:10: TOTAL: 805430788624 2012-07-10-14:20: TOTAL: 678869217982 2012-07-10-14:30: TOTAL: 710839026697 2012-07-10-14:40: TOTAL: 816557964466 2012-07-10-14:50: TOTAL: 880681668064 2012-07-10-15:00: TOTAL: 986025103759 While running 'sar' for the _same_ time period it says: 13:00:04 38 10 0 52 13:10:04 36 10 0 54 13:20:04 36 11 0 53 13:30:03 35 12 0 53 13:40:04 34 13 0 53 13:50:04 35 20 0 45 14:00:04 36 23 0 41 14:10:04 29 9 0 62 14:20:04 30 9 0 61 14:30:03 26 8 0 67 14:40:04 27 8 0 65 14:50:04 32 9 0 59 15:00:04 31 9 0 60 So 'sar' data shows increase in syscall time while dtrace data doesn't. Can anybody explain me that ? Of course it might be that something is happening in theses five minutes period which is not recorded by dtrace. But this is typical OLTP database server with many Oracles running on it and such behaviour is rather unrealistic ... Best regards Przemek On Fri, Jul 06, 2012 at 11:44:39PM +0200, przemol...@poczta.fm wrote: > "Mike Gerdts" <mger...@gmail.com> pisze: > > On Fri, Jul 6, 2012 at 3:22 PM, wrote: > > > Hello, > > > > > > we have been observing some high syscalls (much higher then usually) > > > during fixed part of week. > > > DTrace shows that they mostly comes from Oracle: > > > ... > > > uname exece > > > 6788165162 > > > oracle close > > > 6904279356 > > > dtrace ioctl > > > 7987232206 > > > oracle munmap > > > 13857698724 > > > emagent fork1 > > > 14217339236 > > > oracle resolvepath > > > 14438549232 > > > oracle semsys > > > 16447265715 > > > tnslsnr fork1 > > > 18617374490 > > > oracle stat > > > 23129632013 > > > oracle exece > > > 23859088254 > > > oracle pollsys > > > 28102955920 > > > oracle pread > > > 32159134471 > > > oracle mmap > > > 35380305339 > > > oracle times > > > 35475706300 > > > oracle open > > > 42314061704 > > > oracle memcntl > > > 45026502910 > > > oracle pwrite > > > 51648592719 > > > oracle read > > > 72939284043 > > > oracle write > > > 95484269587 > > > oracle ioctl > > > 306634639800 > > > > > > TOTAL: > > > 1060934672587 > > > > What was the dtrace script you used and over what period? > > #!/usr/sbin/dtrace -s > > #pragma D option dynvarsize=64m > > syscall:::entry > { > self->start = vtimestamp; > } > syscall:::return > /(int)arg0 != -1 && self->start/ > { > this->time = vtimestamp - self->start; > @Time[execname,probefunc] = sum(this->time); > @TimeT["Total:"] = sum(this->time); > self->start = 0; > } > profile:::tick-5m > { > trunc (@Time, 20); > trunc (@TimeT, 20); > printa(@Time); > printa("\n TOTAL:%113@d", @TimeT); > clear(@Time); > clear(@TimeT); > exit(0); > } > END > { > clear (@Time); > clear (@TimeT); > } > > > > > > How many > > cpus in the system (according to psrinfo | wc -l)? > > 96 > This is T2+ based server. > > > > > Those numbers look > > suspiciously high and make me think something is wrong with the dtrace > > script. > > > > > > > > Using sar we can observe that system spends much more time in syscalls > > > (column 'sys') but number of syscalls is not higher then usually (column > > > 'scall/s'): > > > > > > server:/export/home/przemol>sar -f /var/adm/sa/sa04 -s 12:00 -e 15:30 > > > > > > SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012 > > > > > > 12:00:03 %usr %sys %wio %idle > > > 12:10:03 42 8 0 50 > > > 12:20:03 37 7 0 56 > > > 12:30:03 37 8 0 55 > > > 12:40:03 37 10 0 53 > > > 12:50:04 40 15 0 45 > > > 13:00:03 37 10 0 52 > > > 13:10:04 35 10 0 55 > > > 13:20:04 38 8 0 55 > > > 13:30:04 35 7 0 58 > > > 13:40:04 38 9 0 53 > > > 13:50:03 37 21 0 42 > > > 14:00:03 36 14 0 50 > > > 14:10:04 33 10 0 57 > > > 14:20:04 36 21 0 42 > > > 14:30:04 36 32 0 32 > > > 14:40:04 28 15 0 57 > > > 14:50:03 32 12 0 57 > > > 15:00:03 30 12 0 57 > > > 15:10:03 32 8 0 60 > > > 15:20:03 31 6 0 64 > > > > > > Average 35 12 0 52 > > > > > > server:/export/home/przemol>sar -c -f /var/adm/sa/sa04 -s 12:00 -e 15:30 > > > > > > SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012 > > > > > > 12:00:03 scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s > > > 12:10:03 107602 6567 4948 27.82 18.74 478579 4290812 > > > 12:20:03 96872 6411 4549 25.64 17.55 6949437 4295743 > > > 12:30:03 99273 6139 4792 25.69 17.76 723126 895552 > > > 12:40:03 95908 5950 4129 25.47 17.61 4020240 3585954 > > > 12:50:04 99792 5685 5319 24.52 16.69 4579288 4915466 > > > 13:00:03 90723 4927 3553 24.85 17.34 4490451 3330185 > > > 13:10:04 94744 6207 3688 25.22 17.40 6691959 3530345 > > > 13:20:04 93971 6105 4111 24.88 17.05 1220270 4051683 > > > 13:30:04 87887 6189 4418 24.45 17.15 3026237 4946326 > > > 13:40:04 92151 6032 4269 24.58 17.12 4982878 3595013 > > > 13:50:03 80146 5360 4101 23.22 16.10 448459 4052298 > > > 14:00:03 72998 4488 2142 23.69 16.69 4678268 3586545 > > > 14:10:04 76433 4031 2385 24.82 17.31 6913550 6157899 > > > 14:20:04 72935 4462 1994 23.49 16.25 3090017 5321098 > > > 14:30:04 72516 4124 1993 23.46 16.37 2942463 6993712 > > > 14:40:04 72057 3048 2344 23.95 16.77 3042674 2668180 > > > 14:50:03 73552 3370 2506 24.93 17.13 6398090 2906262 > > > 15:00:03 70869 3111 2049 25.39 17.55 5858628 3745978 > > > 15:10:03 80258 3643 2467 47.57 28.89 7126361 3621600 > > > 15:20:03 73211 3519 2693 24.66 16.99 4550096 2973429 > > > > > > Average 85197 4969 3423 25.91 17.72 4110499 3973336 > > > > It's averaging 85197 system calls per second. At that rate, 1 > > trillion system calls (1,060,934,672,587) would take around 144 CPU > > days of system time. Seems kinda unlikely. > > > > Perhaps the 1 trillion number is nanoseconds spent in the system > > calls. That would turn into about 29 minutes of system time. That > > could be, but whether it is bad depends a lot on the system and the > > duration. > > > > > > > > Can I conclude that system spends much more time processing more or less > > > the same number > > > of syscalls and time spent on one (statistically) syscall takes longer > > > time ? > > > What is your opinion ? > > > > > > Regards > > > Przemek > > > _______________________________________________ > > > perf-discuss mailing list > > > perf-discuss@opensolaris.org > > > > > > > > -- > > Mike Gerdts > > http://mgerdts.blogspot.com/ > > > > > _______________________________________________ > perf-discuss mailing list > perf-discuss@opensolaris.org -- Regards Przemyslaw Bak (przemol) -- http://przemol.blogspot.com/ _______________________________________________ perf-discuss mailing list perf-discuss@opensolaris.org