Matthew,
In addition to Roland's comment about patching (which is particularly
important due to the rate of change in ZFS) ...
1) Your dd test doesn't really tell us much. Do you know what size
writes fwd is doing? I doubt it is 128k. You could use truss/dtrace to
find out and then use dd again with the same blocksize.
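For example, something along these lines (a sketch only, untested on your box)
would show the distribution of I/O sizes fwd issues; quantize() buckets arg2,
the byte count passed to each call:

dtrace -n 'syscall::write:entry,syscall::pwrite64:entry /execname == "fwd"/ { @["write bytes"] = quantize(arg2); }'
dtrace -n 'syscall::read:entry,syscall::pread64:entry /execname == "fwd"/ { @["read bytes"] = quantize(arg2); }'

You could then rerun dd with bs= set to whichever size dominates.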
2) You've said the application is single threaded. This isn't a good fit
for the T-series (which was designed for throughput workloads at the
expense of single thread performance ... e.g. the M-series typically has
4x the single thread performance of the T-series).
3) It would be helpful to see 'prstat -Lm' to get a better overall feel
for what is running where (especially where fwd is spending its time).
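For example (assuming fwd is still pid 2153, as in your output below):

prstat -mLp 2153 5

The per-LWP microstate columns (USR/SYS/LAT/SLP) should show whether the
single fwd thread is compute-bound, waiting on I/O, or queued for a CPU.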
4) It's cool to see you using DTrace. You've shown again that one-liners
can tell us so much, and so easily. My only comment is that it would be
useful to know how long your tests ran before you hit ^C.
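One way to make that explicit (just a sketch; pick whatever interval suits)
is to let the script time itself out rather than relying on ^C, e.g.

dtrace -n 'sysinfo:::xcalls { @[pid, execname] = count(); } tick-60s { exit(0); }'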
5) How about some DTrace to find out where in the kernel the cross-calls
are happening? For example ...
dtrace -n 'sysinfo:::xcalls { @[stack()] = count(); }'
... although bringing the system up-to-date with patches may resolve
this issue anyway.
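If the raw stacks turn out to be too noisy, a slightly expanded variant
(again, only a sketch) aggregates on execname as well and prints just the ten
hottest stacks on exit:

dtrace -n 'sysinfo:::xcalls { @[execname, stack()] = count(); } END { trunc(@, 10); }'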
Hope this helps,
Phil
Matthew Flanagan wrote:
Hi,
I'm trying to track down the source of a performance problem with an
application. The application in question is from a firewall vendor and the
component of it that is suffering is the log handling daemon. The log daemon is
single threaded and receives logs from firewalls and also services various
consumers which could be either simple log viewing CLI or GUI clients or log
reporting engines running on other servers.
The performance issue is seen by both the consumers and producers. Producers
complain that the log server is not responding and they are therefore unable
to send logs. The consumers don't complain (but the users do!) but are very
slow when reading the logs, filtering on certain fields, etc.
I'm at a bit of a loss on where to go from here. Below is the analysis I've
done so far. I'm looking for advice on whether this can be solved by throwing
more resources at it or whether it is being limited by a poorly written
application. Please let me know if you'd like to see more output from dtrace or
other commands.
Note, one thing I have seen is that if I stop the reporting engine running on a
different server then the number of cross-calls drops significantly.
The log server specs are:
4 Core T5120 with 16GB RAM
2TB SAN storage (mirrored ZFS pool).
$ uname -a
SunOS logserver 5.10 Generic_137111-01 sun4v sparc SUNW,SPARC-Enterprise-T5120
Here is what I've looked at so far:
1. vmstat - seems ok to me.
# vmstat 3
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr m0 m1 m2 m1 in sy cs us sy id
0 0 0 15422904 700912 133 15 6 0 0 0 0 0 0 0 1 1618 15569 1774 3 3 94
0 0 0 15457416 730808 54 26 13 0 0 0 0 0 0 0 0 1679 18773 1913 3 3 94
0 0 0 15455944 729320 148 8 0 0 0 0 0 0 0 0 0 1073 19944 1000 4 3 93
0 0 0 15454744 728632 223 3 0 0 0 0 0 0 0 0 0 1864 22512 2104 4 3 93
0 0 0 15454232 729328 160 4 0 0 0 0 0 0 0 0 0 994 18935 997 3 3 94
0 0 0 15453400 730080 163 4 0 0 0 0 0 0 0 0 0 1777 18536 2088 3 3 94
0 0 0 15452920 730040 185 3 0 0 0 0 0 0 0 0 0 1950 21576 2198 4 3 93
0 0 0 15452416 730456 146 0 0 0 0 0 0 0 0 0 0 1033 20987 958 3 3 94
2. iostat - just looking at the fibre channel HBA throughput. Not really
pushing the SAN to its limits.
# iostat -xznC 1 | egrep c.$
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.5 1.0 5.7 14.1 0.0 0.0 0.0 15.5 0 1 c1
1.8 87.9 187.7 3028.6 0.0 1.5 0.2 16.5 0 26 c2
1.9 88.1 194.4 3066.4 0.0 1.5 0.2 16.5 0 26 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 521.4 0.0 15009.5 0.0 1.5 0.0 3.0 0 78 c2
0.0 518.4 0.0 15647.2 0.0 1.5 0.0 2.9 0 78 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
5.0 3.0 0.4 0.0 0.0 0.0 0.0 0.0 0 0 c2
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 4.0 0.0 2.0 0.0 0.0 0.0 4.9 0 2 c1
0.0 387.4 0.0 12853.2 0.0 1.0 0.0 2.6 0 59 c2
0.0 414.5 0.0 13446.9 0.0 1.0 0.0 2.5 0 57 c3
Running a `dd` of a 2GB log file shows how fast we can push the SAN, which
peaks at 222 MBytes/s!
# time dd if=2009-06-28_224348_21.log of=/dev/null bs=128k
16368+1 records in
16368+1 records out
real 0m9.298s
user 0m0.094s
sys 0m4.138s
while dd is happening:
# iostat -xznC 1 | egrep c.$
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.5 1.0 5.7 14.0 0.0 0.0 0.0 15.5 0 1 c1
1.9 87.9 189.6 3028.4 0.0 1.5 0.2 16.5 0 26 c2
1.9 88.1 196.2 3066.3 0.0 1.5 0.2 16.4 0 26 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
2.1 0.0 263.5 0.0 0.0 0.0 0.0 8.4 0 2 c2
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
5.1 0.0 651.9 0.0 0.0 0.1 0.0 18.1 0 9 c2
5.1 0.0 651.9 0.0 0.0 0.3 0.0 64.2 0 33 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
5.0 3.0 0.4 0.0 0.0 0.0 0.0 0.0 0 0 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 0.0 4.9 0.0 0.0 0.0 0.0 0.2 0 0 c1
386.8 0.0 49513.3 0.0 0.0 20.0 0.0 51.8 0 678 c2
392.7 0.0 50269.1 0.0 0.0 20.0 0.0 51.0 0 667 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
869.9 3.0 110697.5 0.0 0.0 49.6 0.0 56.8 0 1389 c2
843.6 0.0 107982.0 0.0 0.0 49.6 0.0 58.8 0 1356 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
737.5 380.6 94397.2 16839.9 0.0 53.3 0.0 47.7 0 1347 c2
742.4 362.8 95025.5 15487.8 0.0 53.7 0.0 48.6 0 1366 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
884.6 0.0 113231.1 0.0 0.0 53.5 0.0 60.5 0 1547 c2
894.7 0.0 114516.7 0.0 0.0 53.7 0.0 60.0 0 1528 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
824.0 0.0 105467.2 0.0 0.0 50.3 0.0 61.1 0 1425 c2
831.9 0.0 106478.8 0.0 0.0 50.1 0.0 60.3 0 1439 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
837.2 0.0 107162.6 0.0 0.0 51.8 0.0 61.9 0 1472 c2
838.3 0.0 107299.9 0.0 0.0 51.5 0.0 61.4 0 1468 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
844.7 0.0 108124.9 0.0 0.0 51.9 0.0 61.4 0 1468 c2
842.7 0.0 107865.9 0.0 0.0 51.3 0.0 60.9 0 1464 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
866.4 154.9 110896.3 12598.9 0.0 60.8 0.0 59.6 0 1551 c2
857.4 390.7 109745.6 17811.3 0.0 59.2 0.0 47.5 0 1544 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 51.1 0.0 39.6 0.0 0.9 0.0 18.3 0 30 c1
830.5 0.0 106298.7 0.0 0.0 55.7 0.0 67.1 0 1511 c2
842.5 0.0 107839.9 0.0 0.0 54.9 0.0 65.2 0 1530 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 13.0 0.0 8.5 0.0 0.1 0.0 9.5 0 8 c1
355.1 0.0 45454.7 0.0 0.0 13.3 0.0 37.4 0 565 c2
318.0 0.0 40705.3 0.0 0.0 13.3 0.0 41.8 0 572 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 0.0 128.3 0.0 0.0 0.0 0.0 6.2 0 1 c2
5.0 3.0 0.4 0.0 0.0 0.0 0.0 0.0 0 0 c3
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 2.0 0.0 1.0 0.0 0.0 0.0 1.6 0 0 c1
0.0 482.1 0.0 14731.3 0.0 1.2 0.0 2.4 0 64 c2
0.0 428.1 0.0 15401.4 0.0 1.1 0.0 2.6 0 61 c3
I think I can safely eliminate the SAN as the bottleneck.
3. mpstat - the column that stands out to me is xcal. I've seen it peak at up
to 1.5 million xcalls on a single CPU!
# mpstat 2
[snip]
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 96318 337 150 194 3 50 40 7 640 7 12 0 81
1 801 0 338 87 0 144 4 33 171 0 889 61 31 0 8
2 0 0 7 36 0 45 0 16 28 1 543 3 3 0 94
3 0 0 1 15 1 12 0 6 3 0 0 0 0 0 100
4 0 0 50458 305 253 123 9 23 29 4 3513 27 13 0 60
5 0 0 223 215 176 89 2 18 70 0 109 7 1 0 91
6 0 0 10920 9 0 17 0 1 1 1 38 0 1 0 99
7 0 0 195 168 163 10 0 0 38 0 0 0 1 0 99
16 1 0 172 222 0 465 9 80 84 9 2324 22 4 0 73
17 0 0 24 83 1 157 5 16 35 0 870 15 1 0 84
18 0 0 23 52 0 108 0 3 28 0 0 0 0 0 100
19 0 0 1 35 0 69 0 0 0 0 0 0 0 0 100
20 9 0 31 6 0 0 5 0 1 0 17 100 0 0 0
21 0 0 10 26 0 49 0 10 27 0 60 0 0 0 100
22 0 0 0 2 1 0 0 0 0 0 0 0 0 0 100
23 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
24 83 0 592427 26 0 42 7 15 58 1 1922 32 46 0 23
25 1 0 212 50 0 98 1 23 21 3 949 5 2 0 93
26 0 0 1 8 0 16 0 7 8 0 2 0 0 0 100
27 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
28 0 0 117 53 0 105 5 17 24 7 1923 41 6 0 53
29 0 0 3 24 0 47 0 10 9 1 497 2 1 0 97
30 0 0 0 1 0 1 0 0 0 0 0 0 0 0 100
31 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
32 1 0 2254 119 0 251 5 57 102 6 2284 33 8 0 59
33 4 0 334 79 0 163 7 20 37 6 2007 13 6 0 81
34 0 0 3 5 0 9 0 2 2 0 0 0 0 0 100
35 0 0 3 9 0 17 0 0 0 0 0 0 0 0 100
36 0 0 259875 110 0 239 11 33 102 9 2727 10 33 0 57
37 0 0 4 13 0 24 0 5 3 0 282 5 0 0 95
38 0 0 3 37 0 76 0 0 0 0 0 0 0 0 100
39 0 0 0 1 0 1 0 0 0 0 0 0 0 0 100
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 7 0 1593 256 154 28 6 7 42 0 9074 48 25 0 27
1 0 0 2 13 0 23 0 7 0 0 30 0 1 0 99
2 0 0 0 3 0 4 0 0 0 0 80 0 0 0 100
3 0 0 0 2 1 0 0 0 0 0 0 0 0 0 100
4 0 0 523 429 411 124 4 6 24 0 1703 7 5 0 87
5 0 0 0 52 0 102 0 0 0 0 72 0 0 0 100
6 0 0 0 5 0 8 0 0 0 0 0 0 0 0 100
7 0 0 1 19 0 36 0 0 0 0 0 0 0 0 100
16 0 0 125980 44 0 79 7 13 12 0 3319 22 18 0 59
17 0 0 3 15 0 27 0 1 2 0 293 3 1 0 96
18 0 0 1 5 0 9 0 0 0 0 0 0 0 0 100
19 0 0 2 19 0 35 0 0 0 0 0 0 0 0 100
20 0 0 4 23 0 46 0 6 6 0 1254 6 1 0 93
21 0 0 0 2 0 2 0 0 0 0 0 0 0 0 100
22 0 0 0 2 1 0 0 0 0 0 0 0 0 0 100
23 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
24 3 0 94 49 0 89 4 12 20 0 3314 14 7 0 78
25 0 0 0 6 0 10 0 2 10 0 38 0 0 0 100
26 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
27 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
28 0 0 311 32 0 58 3 7 5 0 1156 8 5 0 86
29 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
30 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
31 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
32 1 0 407 53 0 104 4 16 27 1 3042 15 7 0 78
33 0 0 36 86 0 170 0 2 0 0 2 0 0 0 100
34 0 0 0 3 0 5 0 0 0 0 0 0 0 0 100
35 0 0 3 7 0 12 0 0 0 0 0 0 0 0 100
36 0 0 12 49 0 96 1 8 2 0 878 4 1 0 95
37 1 0 32 3 0 4 0 1 0 0 12 0 0 0 100
38 0 0 3 27 0 51 0 0 0 0 0 0 0 0 100
39 0 0 0 4 0 6 0 0 0 0 0 0 0 0 100
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 204 258 146 92 3 16 3 0 2692 7 6 0 87
1 0 0 4 26 0 38 2 7 10 0 1644 13 2 0 84
2 0 0 0 6 0 8 0 1 3 0 160 0 1 0 99
3 0 0 0 2 1 0 0 0 0 0 0 0 0 0 100
4 0 0 238 451 406 171 5 13 6 0 972 4 3 0 93
5 0 0 6 40 4 72 0 1 0 0 52 0 0 0 100
6 0 0 4 17 0 33 0 1 6 0 243 1 1 0 98
7 0 0 1 6 0 10 0 0 0 0 0 0 0 0 100
16 2 0 187838 42 0 72 12 5 37 0 5551 35 33 0 32
17 0 0 5 21 0 40 0 4 0 0 4 0 0 0 100
18 0 0 3 7 0 13 0 0 0 0 0 0 0 0 100
19 0 0 1 21 0 40 0 0 0 0 0 0 0 0 100
20 0 0 2 18 0 35 0 7 14 0 532 2 0 0 98
21 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
22 0 0 0 2 1 0 0 0 0 0 0 0 0 0 100
23 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
24 2 0 128 32 0 57 3 9 90 0 4981 33 8 0 59
25 0 0 1 4 0 7 0 2 0 0 39 0 0 0 100
26 0 0 0 1 0 2 0 1 0 0 0 0 0 0 100
27 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
28 0 0 94 26 0 47 1 5 76 0 1136 4 6 0 89
29 0 0 0 3 0 4 0 1 0 0 2 0 0 0 100
30 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
31 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
32 0 0 561 38 0 66 5 12 16 0 3872 23 15 0 62
33 0 0 8 78 0 155 2 3 1 0 38 0 0 0 100
34 0 0 0 2 0 3 0 0 0 0 0 0 0 0 100
35 0 0 3 9 0 17 0 0 0 0 0 0 0 0 100
36 0 0 15 60 0 115 6 6 6 0 1457 8 2 0 90
37 0 0 1 4 0 7 0 0 0 0 0 0 0 0 100
38 0 0 5 28 0 53 0 0 0 0 0 0 0 0 100
39 0 0 1 4 0 6 0 0 0 0 0 0 0 0 100
4. Who is making all the cross-calls?
# dtrace -n 'sysinfo:::xcalls { @num[pid,execname] = count(); }'
dtrace: description 'sysinfo:::xcalls ' matched 4 probes
^C
7 svc.startd 1
402 nscd 1
1384 snmpd 1
1759 cpd 1
1921 cpmad 1
[...snip...]
2164 fwd 26
1966 fwd 28
1927 fwd 31
2055 cpd 45
2144 fwd 45
826 snmpd 102
3 fsflush 190
21288 fwm 479
21289 fw_loader 495
21286 fwm 1093
21287 dtrace 1430
0 sched 26601
2153 fwd 337655
5. System calls made by this process:
# dtrace -n 'syscall:::entry /pid == 2153/ { @num[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 229 probes
^C
getpid 1
pollsys 74
pread64 87
pwrite64 134
lwp_sigmask 148
llseek 11010
lseek 18126
read 19654
write 57857
6. What is it reading & writing? The first column below is the file descriptor
number, the second the number of times it was seen:
# dtrace -n 'syscall::read:entry /pid == 2153/ { @[arg0] = count(); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
^C
47 1
49 12
83 12
73 17
51 18
50 21
53 24
56 24
57 282
58 381
55 401
78 5742
70 5777
77 12835
# dtrace -n 'syscall::write:entry /pid == 2153/ { @[arg0] = count(); }'
dtrace: description 'syscall::write:entry ' matched 1 probe
^C
13 4
82 7
78 158
70 160
24 21201
23 24974
22 28747
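An alternative (untested) way to map those descriptors straight to pathnames,
if the fds[] array is available on this Solaris 10 release, would be:

# dtrace -n 'syscall::write:entry /pid == 2153/ { @[fds[arg0].fi_pathname] = count(); }'

pfiles gives the same mapping: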
# pfiles 2153
2153: /opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/bin/fwd -n
Current rlimit: 65536 file descriptors
[...snip...]
22: S_IFREG mode:0644 dev:256,65541 ino:96177 uid:0 gid:0 size:1197740174
O_RDWR|O_CREAT|O_TRUNC
/var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.log
23: S_IFREG mode:0644 dev:256,65541 ino:96178 uid:0 gid:0 size:47337100
O_RDWR|O_CREAT|O_TRUNC
/var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.logptr
24: S_IFREG mode:0644 dev:256,65541 ino:96179 uid:0 gid:0 size:39200652
O_RDWR|O_CREAT|O_TRUNC
/var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.loginitial_ptr
[...snip...]
70: S_IFSOCK mode:0666 dev:332,0 ino:25771 uid:0 gid:0 size:0
O_RDWR|O_NDELAY
SOCK_STREAM
SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
sockname: AF_INET 1.1.1.42 port: 18184
peername: AF_INET 1.1.1.5 port: 55011
[...snip...]
77: S_IFREG mode:0644 dev:256,65541 ino:125222 uid:0 gid:0 size:2145391731
O_RDONLY
/var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/2009-06-08_215011_153.log
78: S_IFSOCK mode:0666 dev:332,0 ino:44055 uid:0 gid:0 size:0
O_RDWR|O_NDELAY
SOCK_STREAM
SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
sockname: AF_INET 1.1.1.42 port: 18184
peername: AF_INET 1.1.1.5 port: 55147
[...snip...]
So the file being written is fw.log, which is to be expected. But most of the
reads are against fd 77 (2009-06-08_215011_153.log), closely followed by fds
70 & 78, which are TCP connections to the log reporting server. So an educated
guess would be that this is the file the reporting server is trying to index.
Checking the reporting server confirms this.
regards
Matthew Flanagan
--
http://wadofstuff.blogspot.com