Hi Phil

> Matthew,
>
> In addition to Roland's comment about patching (which is particularly
> important due to the rate of change in ZFS) ...
>
> 1) Your dd test doesn't really tell us much. Do you know what size
> writes fwd is doing? I doubt it is 128k. You could use truss/dtrace to
> find out and then use dd again with the same blocksize.
>
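A quantize() aggregation would give the full size distribution rather than
individual calls; something along these lines (just an untested sketch, using
fwd's pid of 2153 from the prstat output further down):

# dtrace -n 'syscall::read:entry,syscall::write:entry /pid == 2153/ { @[probefunc] = quantize(arg2); }'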
truss shows writes like this:

/1:  write(22, "\0 4\001 :919C\v J H90 b".., 96)      = 96
/1:  write(23, " %97 *AF", 4)                         = 4
/1:  write(24, " %97 *AF", 4)                         = 4

and reads like this:

/1:  read(77, "\0 D\003 :919F16 J ,EACF".., 131072)   = 131072
/1:  read(77, "\0 D\003 :919F16 J ,EADC".., 131072)   = 131072
/1:  read(77, "\0 D\003 :919F16 J ,EADC".., 131072)   = 131072
/1:  read(78, 0x033BD918, 5)                          Err#11 EAGAIN
/1:  read(78, 0x033BD918, 5)                          Err#11 EAGAIN
/1:  read(78, 0x033BD918, 5)                          Err#11 EAGAIN
/1:  read(78, 0x033BD918, 5)                          Err#11 EAGAIN

I had done a truss before and seen the 128k reads, though I hadn't noticed so
many EAGAINs.

> 2) You've said the application is single threaded. This isn't a good fit
> for the T-series (which was designed for throughput workloads at the
> expense of single thread performance ... e.g. the M-series has typically
> 4x the single thread performance of the T-series).
>

The server has handled the workload quite well until recently, when we added a
few new firewalls that do quite a large volume of logging. If I can build a
case for a faster server, it will be easier to convince the business to buy it.

> 3) It would be helpful to see 'prstat -Lm' to get a better overall feel
> for what is running where (especially, where fwd is spending its time).
>

   PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  2153 root       41  59 0.1 0.0 0.0 0.0 0.3 0.2 383  11  7K   0 fwd/1
  1923 root       16 5.4 0.1 0.0 0.0 0.0  78 0.1  52  18  2K   0 fwd/1
  2144 root       12 1.3 0.0 0.0 0.0 0.0  87 0.0  10   4 503   0 fwd/1
  1927 root      5.4 4.2 0.0 0.0 0.0 0.0  90 0.0  51   1  1K   0 fwd/1
  1941 root      2.5 4.3 0.0 0.0 0.0 0.0  93 0.0  29   0 855   0 fwd/1
  1975 root      6.4 0.4 0.0 0.0 0.0 0.0  93 0.0   1   2 936   0 cpmad/1
  2166 root      5.5 0.1 0.0 0.0 0.0 0.0  94 0.0   5   1  36   0 fwm/1
 22500 root      4.8 0.6 0.0 0.0 0.0 0.0  95 0.0   1   2  1K   0 cpmad/1
 23745 root      0.7 4.6 0.0 0.0 0.0 0.0  95 0.0  42   1 603   0 prstat/1
  1966 root      2.8 2.1 0.0 0.0 0.0 0.0  95 0.0  12   1 996   0 fwd/1
  2250 root      3.6 0.1 0.0 0.0 0.0 0.0  96 0.0   4   0  36   0 fwm/1
  2175 root      1.7 0.9 0.0 0.0 0.0 0.0  97 0.0  19   0 404   0 fwd/1
 21873 root      2.3 0.1 0.0 0.0 0.0 0.0  98 0.0   1   0 396   0 cpmad/1
 23523 root      1.8 0.2 0.0 0.0 0.0 0.0  98 0.0   1   1 423   0 cpmad/1
  2164 root      1.2 0.5 0.0 0.0 0.0 0.0  98 0.0   5   0 320   0 fwd/1
  1902 root      0.4 0.8 0.0 0.0 0.0 0.0  99 0.0   1   0 159   0 fwd/1
   402 root      0.2 0.1 0.0 0.0 0.0 0.0 100 0.0  39   0 234   0 nscd/25
  1901 root      0.2 0.1 0.0 0.0 0.0 0.0 100 0.0   7   0 201   0 fwd/1
  1957 root      0.1 0.1 0.0 0.0 0.0 0.0 100 0.0   1   0  21   0 fwd/1
 23657 mflanaga  0.1 0.0 0.0 0.0 0.0 0.0 100 0.0   3   2  24   0 sshd/1
  1747 root      0.1 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0  20   0 cpd/1
  1934 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   3   0  90   0 fwd/1
  2137 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0  18   0 fwd/1
  2157 root      0.1 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0   8   0 fwm/1
  2145 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0   9   0 fwd/1
  1921 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   5   0 cpmad/1
  2055 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0  12   0 cpd/1
  1960 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   7   0 fwm/1
  1944 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   7   0 fwm/1
  1917 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   8   0 fwd/1
   522 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0  10   2 xntpd/1
  1968 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   7   0 fwm/1
  2147 root      0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   4   0 fwm/1
  2056 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
  1757 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
  1747 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
  1759 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
  1745 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
  2039 root      0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
Total: 115 processes, 412 lwps, load averages: 2.07, 2.05, 2.05
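fwd/1 spending 59% of its time in SYS is the line that stands out to me. A
follow-up I'd like to try (untested sketch, same pid as above) is summing
elapsed time per system call, to see which calls that SYS time goes into:

# dtrace -n 'syscall:::entry /pid == 2153/ { self->ts = timestamp; }
    syscall:::return /self->ts/ { @[probefunc] = sum(timestamp - self->ts); self->ts = 0; }'

(vtimestamp instead of timestamp would restrict it to on-CPU time.)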
> 4) It's cool to see you using DTrace. You've shown again that one-liners
> can tell us so much, and so easily. My only comment is that it would be
> useful to know how long your tests ran before you hit ^C.
>

I usually ran each dtrace for 20 seconds.

> 5) How about some DTrace to find out where in the kernel the crosscalls
> are happening? For example ...
>
>     dtrace -n 'sysinfo:::xcalls { @[stack()] = count(); }
>
> ... although bringing the system up-to-date with patches may resolve
> this issue anyway.

# dtrace -n 'sysinfo:::xcalls { @[stack()] = count(); }'
[...snip...]

              unix`send_one_mondo+0x1c
              unix`xt_one_unchecked+0xc8
              genunix`sleepq_wakeone_chan+0x70
              genunix`cv_signal+0x58
              genunix`taskq_dispatch+0x94
              genunix`callout_schedule+0x48
              genunix`clock+0x6a0
              genunix`cyclic_softint+0xa4
              unix`cbe_level10+0x8
              unix`intr_thread+0x168
             2068

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`sfmmu_tlb_range_demap+0x1f0
              unix`hat_unload_callback+0x850
              unix`segkmem_free_vn+0x58
              genunix`kmem_magazine_destroy+0xd8
              genunix`kmem_depot_ws_reap+0x50
              genunix`taskq_thread+0x1a4
              unix`thread_start+0x4
             2181

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`xt_sync+0x158
              unix`hat_unload_callback+0x898
              unix`segkmem_free_vn+0x58
              genunix`kmem_magazine_destroy+0xd8
              genunix`kmem_depot_ws_reap+0x50
              genunix`taskq_thread+0x1a4
              unix`thread_start+0x4
             2181

              unix`send_one_mondo+0x1c
              unix`xt_one_unchecked+0xc8
              genunix`sleepq_wakeone_chan+0x70
              genunix`cv_signal+0x58
              dls`soft_ring_process+0x214
              dls`dls_soft_ring_fanout+0x2bc
              dls`i_dls_link_rx+0x1f0
              mac`mac_rx+0x8c
              e1000g`e1000g_intr_pciexpress+0xbc
              px`px_msiq_intr+0x1e8
              unix`intr_thread+0x168
             2841

              unix`send_one_mondo+0x1c
              unix`xt_one_unchecked+0xc8
              genunix`sleepq_wakeone_chan+0x70
              genunix`cv_signal+0x58
              genunix`taskq_dispatch+0x94
              zfs`zio_done+0x190
              zfs`zio_vdev_io_assess+0x178
              genunix`taskq_thread+0x1a4
              unix`thread_start+0x4
             3498

              unix`send_one_mondo+0x1c
              unix`xt_one_unchecked+0xc8
              genunix`sleepq_wakeone_chan+0x70
              genunix`cv_signal+0x58
              genunix`taskq_dispatch+0x94
              emcp`PowerPlatformTopIodone+0x60
              emcp`PowerTopIodone+0xc
              emcp`PowerProcessTopIodonePirps+0x2c
              emcp`PowerBottomIodoneNew+0x308
              emcp`PowerPlatformBottomIodone+0x2c
              ssd`ssd_return_command+0x240
              ssd`ssdintr+0x268
              qlc`ql_fast_fcp_post+0x178
              qlc`ql_24xx_status_entry+0x2ec
              qlc`ql_response_pkt+0x268
              qlc`ql_isr_aif+0x7dc
              px`px_msiq_intr+0x1e8
              unix`intr_thread+0x168
             5017

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`sfmmu_tlb_range_demap+0x1f0
              unix`hat_unload_callback+0x850
              unix`ppmapout+0x94
              zfs`mappedread+0xa4
              zfs`zfs_read+0x19c
              genunix`fop_read+0x20
              genunix`read+0x274
              unix`syscall_trap32+0xcc
           135480

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`xt_sync+0x158
              unix`hat_unload_callback+0x898
              unix`ppmapout+0x94
              zfs`mappedread+0xa4
              zfs`zfs_read+0x19c
              genunix`fop_read+0x20
              genunix`read+0x274
              unix`syscall_trap32+0xcc
           135480
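The two biggest counts are both reads going through zfs`mappedread and ending
in TLB demap cross-calls. To put a time figure on that path, something like
this might work (another untested sketch; it assumes fbt can probe mappedread
on this kernel build):

# dtrace -n 'fbt:zfs:mappedread:entry { self->ts = timestamp; }
    fbt:zfs:mappedread:return /self->ts/ { @["ns in mappedread"] = quantize(timestamp - self->ts); self->ts = 0; }'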
I've also done the same dtrace but with ustack() just for the process.

# dtrace -n 'sysinfo:::xcalls /pid == 2153/ { @[ustack()] = count(); }'
[snip]

              libc_psr.so.1`memcpy+0x18c
              libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
              libCPLogRepository.so`__1cICLogFileHReadLog6MpnJILogEvent_LpL33_i_+0xd4
              libCPLogRepository.so`__1cICLogFileIReadNext6MpnJILogEvent_pL33_i_+0x40
              libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x164
              libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
              libCPLogLuFile.so`0xfc493e9c
              libfw1.so`__1cNlog_read_next6FpnPICplogLuLogFile_pnJILogEvent_bpp3pi6_3_+0xfc
              libfwdopsec.so`lea_send_line+0x108
              libopsec.so`lea_send_50_records+0x68
              libopsec.so`lea_server_event_handler+0xc8
              libopsec.so`opsec_fwasync_conn_handler+0x2fc
              libComUtils.so`0xfe393bc4
              libComUtils.so`T_event_mainloop_iter+0x4c4
              libComUtils.so`T_event_mainloop+0x118
              libfw1.so`fwd_cmain+0x131c
              libfw1.so`fw_cmain+0x910
              fwd`_start+0x108
               18

              libc_psr.so.1`memcpy+0x18c
              libCPLogRepository.so`__1cOCBinaryLogFileHReadLog6MpnJILogEvent_pnNCLogFileChain__b_+0xc
              libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
              libCPLogRepository.so`__1cICLogFileHReadLog6MpnJILogEvent_LpL33_i_+0xd4
              libCPLogRepository.so`__1cICLogFileIReadNext6MpnJILogEvent_pL33_i_+0x40
              libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x164
              libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
              0x3
               32

              libc_psr.so.1`memcpy+0x18c
              libCPLogRepository.so`__1cOCBinaryLogFileHReadLog6MpnJILogEvent_pnNCLogFileChain__b_+0xc
              libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
              libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x654
              libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
              libCPLogLuFile.so`0xfc493e9c
              libfw1.so`__1cNlog_read_next6FpnPICplogLuLogFile_pnJILogEvent_bpp3pi6_3_+0xfc
              libfwdopsec.so`lea_send_line+0x108
              libopsec.so`lea_send_50_records+0x68
              libopsec.so`lea_server_event_handler+0xc8
              libopsec.so`opsec_fwasync_conn_handler+0x2fc
              libComUtils.so`0xfe393bc4
              libComUtils.so`T_event_mainloop_iter+0x4c4
              libComUtils.so`T_event_mainloop+0x118
              libfw1.so`fwd_cmain+0x131c
              libfw1.so`fw_cmain+0x910
              fwd`_start+0x108
               62

              libc.so.1`_read+0x8
              libc.so.1`fread+0x298
              libCPLogRepository.so`__1cOCBinaryLogFileNReadLogHeader6kMpnJILogEvent_pnNCLogFileChain__b_+0x280
              libCPLogRepository.so`__1cOCBinaryLogFileHReadLog6MpnJILogEvent_pnNCLogFileChain__b_+0xc
              libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
              libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x3f8
              libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
              libCPLogLuFile.so`0xfc493e9c
              libfw1.so`__1cNlog_read_next6FpnPICplogLuLogFile_pnJILogEvent_bpp3pi6_3_+0xfc
              libfwdopsec.so`lea_send_line+0x108
              libopsec.so`lea_send_50_records+0x68
              libopsec.so`lea_server_event_handler+0xc8
              libopsec.so`opsec_fwasync_conn_handler+0x2fc
              libComUtils.so`0xfe393bc4
              libComUtils.so`T_event_mainloop_iter+0x4c4
              libComUtils.so`T_event_mainloop+0x118
              libfw1.so`fwd_cmain+0x131c
              libfw1.so`fw_cmain+0x910
              fwd`_start+0x108
           226302
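Since those stacks are only sampled when a cross-call fires, a plain profile
probably gives a fairer picture of where fwd/1 spends its time overall;
another untested sketch, run for the same ~20 seconds:

# dtrace -n 'profile-997 /pid == 2153/ { @[ustack(10)] = count(); }'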
>
> Hope this helps,
> Phil

I hope so too. And thanks for the fast response!

regards

Matthew

> Matthew Flanagan wrote:
> > Hi,
> >
> > I'm trying to track down the source of a performance problem with an
> > application. The application in question is from a firewall vendor and
> > the component of it that is suffering is the log handling daemon. The
> > log daemon is single threaded and receives logs from firewalls and also
> > services various consumers, which could be simple log viewing CLI or
> > GUI clients, or log reporting engines running on other servers.
> >
> > The performance issue is seen from both the consumers and producers.
> > Producers complain that the log server is not responding and are
> > therefore unable to send logs. The consumers don't complain (but the
> > users do!) but are very slow when reading the logs, filtering on
> > certain fields, etc.
> >
> > I'm at a bit of a loss on where to go from here. Below is the analysis
> > I've done so far. I'm looking for advice on whether this can be solved
> > by throwing more resources at it or whether it is being limited by a
> > poorly written application. Please let me know if you'd like to see
> > more output from dtrace or other commands.
> >
> > Note: one thing I have seen is that if I stop the reporting engine
> > (running on a different server) then the number of cross-calls drops
> > significantly.
> >
> > The log server specs are:
> >
> > 4 Core T5120 with 16GB RAM
> > 2TB SAN storage (mirrored ZFS pool).
> > $ uname -a
> > SunOS logserver 5.10 Generic_137111-01 sun4v sparc SUNW,SPARC-Enterprise-T5120
> >
> > Here is what I've looked at so far:
> >
> > 1. vmstat - seems ok to me.
> > # vmstat 3
> >  kthr      memory            page            disk          faults      cpu
> >  swap free re mf pi po fr de sr m0 m1 m2 m1 in sy cs us sy id
> >  0 0 0 15422904 700912 133 15 6 0 0 0 0 0 0 0 1 1618 15569 1774 3 3 94
> >  0 0 0 15457416 730808 54 26 13 0 0 0 0 0 0 0 0 1679 18773 1913 3 3 94
> >  0 0 0 15455944 729320 148 8 0 0 0 0 0 0 0 0 0 1073 19944 1000 4 3 93
> >  0 0 0 15454744 728632 223 3 0 0 0 0 0 0 0 0 0 1864 22512 2104 4 3 93
> >  0 0 0 15454232 729328 160 4 0 0 0 0 0 0 0 0 0 994 18935 997 3 3 94
> >  0 0 0 15453400 730080 163 4 0 0 0 0 0 0 0 0 0 1777 18536 2088 3 3 94
> >  0 0 0 15452920 730040 185 3 0 0 0 0 0 0 0 0 0 1950 21576 2198 4 3 93
> >  0 0 0 15452416 730456 146 0 0 0 0 0 0 0 0 0 0 1033 20987 958 3 3 94
> >
> > 2. iostat - just looking at the fibre channel HBA throughput. Not really
> > pushing the SAN to its limits.
> > # iostat -xznC 1 | egrep c.$
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.5    1.0     5.7    14.1  0.0  0.0    0.0   15.5   0   1 c1
> >     1.8   87.9   187.7  3028.6  0.0  1.5    0.2   16.5   0  26 c2
> >     1.9   88.1   194.4  3066.4  0.0  1.5    0.2   16.5   0  26 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0  521.4     0.0 15009.5  0.0  1.5    0.0    3.0   0  78 c2
> >     0.0  518.4     0.0 15647.2  0.0  1.5    0.0    2.9   0  78 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     5.0    3.0     0.4     0.0  0.0  0.0    0.0    0.0   0   0 c2
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0    4.0     0.0     2.0  0.0  0.0    0.0    4.9   0   2 c1
> >     0.0  387.4     0.0 12853.2  0.0  1.0    0.0    2.6   0  59 c2
> >     0.0  414.5     0.0 13446.9  0.0  1.0    0.0    2.5   0  57 c3
> >
> > Running a `dd` of a 2GB log file shows how fast we can push the SAN,
> > which peaks at 222 MBytes/s!
> >
> > # time dd if=2009-06-28_224348_21.log of=/dev/null bs=128k
> > 16368+1 records in
> > 16368+1 records out
> >
> > real    0m9.298s
> > user    0m0.094s
> > sys     0m4.138s
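For anyone checking the arithmetic on that 222 MBytes/s figure: 16368 x 128 KB
is roughly 2 GB read in 9.3 s, i.e. about 220 MB/s, and the busiest one-second
samples below show roughly 111 MB/s of reads on each of c2 and c3 at the same
time, so the two HBAs combined line up with it.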
> > while dd is happening:
> > # iostat -xznC 1 | egrep c.$
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.5    1.0     5.7    14.0  0.0  0.0    0.0   15.5   0   1 c1
> >     1.9   87.9   189.6  3028.4  0.0  1.5    0.2   16.5   0  26 c2
> >     1.9   88.1   196.2  3066.3  0.0  1.5    0.2   16.4   0  26 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     2.1    0.0   263.5     0.0  0.0  0.0    0.0    8.4   0   2 c2
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     5.1    0.0   651.9     0.0  0.0  0.1    0.0   18.1   0   9 c2
> >     5.1    0.0   651.9     0.0  0.0  0.3    0.0   64.2   0  33 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     5.0    3.0     0.4     0.0  0.0  0.0    0.0    0.0   0   0 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     1.0    0.0     4.9     0.0  0.0  0.0    0.0    0.2   0   0 c1
> >   386.8    0.0 49513.3     0.0  0.0 20.0    0.0   51.8   0 678 c2
> >   392.7    0.0 50269.1     0.0  0.0 20.0    0.0   51.0   0 667 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   869.9    3.0 110697.5    0.0  0.0 49.6    0.0   56.8   0 1389 c2
> >   843.6    0.0 107982.0    0.0  0.0 49.6    0.0   58.8   0 1356 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   737.5  380.6 94397.2 16839.9  0.0 53.3    0.0   47.7   0 1347 c2
> >   742.4  362.8 95025.5 15487.8  0.0 53.7    0.0   48.6   0 1366 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   884.6    0.0 113231.1    0.0  0.0 53.5    0.0   60.5   0 1547 c2
> >   894.7    0.0 114516.7    0.0  0.0 53.7    0.0   60.0   0 1528 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   824.0    0.0 105467.2    0.0  0.0 50.3    0.0   61.1   0 1425 c2
> >   831.9    0.0 106478.8    0.0  0.0 50.1    0.0   60.3   0 1439 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   837.2    0.0 107162.6    0.0  0.0 51.8    0.0   61.9   0 1472 c2
> >   838.3    0.0 107299.9    0.0  0.0 51.5    0.0   61.4   0 1468 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   844.7    0.0 108124.9    0.0  0.0 51.9    0.0   61.4   0 1468 c2
> >   842.7    0.0 107865.9    0.0  0.0 51.3    0.0   60.9   0 1464 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   866.4  154.9 110896.3 12598.9  0.0 60.8    0.0   59.6   0 1551 c2
> >   857.4  390.7 109745.6 17811.3  0.0 59.2    0.0   47.5   0 1544 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0   51.1     0.0    39.6  0.0  0.9    0.0   18.3   0  30 c1
> >   830.5    0.0 106298.7    0.0  0.0 55.7    0.0   67.1   0 1511 c2
> >   842.5    0.0 107839.9    0.0  0.0 54.9    0.0   65.2   0 1530 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0   13.0     0.0     8.5  0.0  0.1    0.0    9.5   0   8 c1
> >   355.1    0.0 45454.7     0.0  0.0 13.3    0.0   37.4   0 565 c2
> >   318.0    0.0 40705.3     0.0  0.0 13.3    0.0   41.8   0 572 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     1.0    0.0   128.3     0.0  0.0  0.0    0.0    6.2   0   1 c2
> >     5.0    3.0     0.4     0.0  0.0  0.0    0.0    0.0   0   0 c3
> >     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0    2.0     0.0     1.0  0.0  0.0    0.0    1.6   0   0 c1
> >     0.0  482.1     0.0 14731.3  0.0  1.2    0.0    2.4   0  64 c2
> >     0.0  428.1     0.0 15401.4  0.0  1.1    0.0    2.6   0  61 c3
> >
> > I think I can safely eliminate the SAN as the bottleneck.
> >
> > 3. mpstat - the column that stands out to me is xcal. I've seen it peak
> > up to 1.5 million xcalls on a single CPU!
> >
> > # mpstat 2
> >
> > [snip]
> >
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> > 0 0 0 96318 337 150 194 3 50 40 7 640 7 12 0 81
> > 1 801 0 338 87 0 144 4 33 171 0 889 61 31 0 8
> > 2 0 0 7 36 0 45 0 16 28 1 543 3 3 0 94
> > 3 0 0 1 15 1 12 0 6 3 0 0 0 0 0 100
> > 0 0 50458 305 253 123 9 23 29 4 3513 27 13 0 60
> > 5 0 0 223 215 176 89 2 18 70 0 109 7 1 0 91
> > 6 0 0 10920 9 0 17 0 1 1 1 38 0 1 0 99
> > 7 0 0 195 168 163 10 0 0 38 0 0 0 1 0 99
> > 6 1 0 172 222 0 465 9 80 84 9 2324 22 4 0 73
> > 17 0 0 24 83 1 157 5 16 35 0 870 15 1 0 84
> > 8 0 0 23 52 0 108 0 3 28 0 0 0 0 0 100
> > 9 0 0 1 35 0 69 0 0 0 0 0 0 0 0 100
> > 0 9 0 31 6 0 0 5 0 1 0 17 100 0 0 0
> > 21 0 0 10 26 0 49 0 10 27 0 60 0 0 0 100
> > 22 0 0 0 2 1 0 0 0 0 0 0 0 0 0 100
> > 3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 4 83 0 592427 26 0 42 7 15 58 1 1922 32 46 0 23
> > 25 1 0 212 50 0 98 1 23 21 3 949 5 2 0 93
> > 6 0 0 1 8 0 16 0 7 8 0 2 0 0 0 100
> > 7 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 8 0 0 117 53 0 105 5 17 24 7 1923 41 6 0 53
> > 29 0 0 3 24 0 47 0 10 9 1 497 2 1 0 97
> > 30 0 0 0 1 0 1 0 0 0 0 0 0 0 0 100
> > 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 2 1 0 2254 119 0 251 5 57 102 6 2284 33 8 0 59
> > 33 4 0 334 79 0 163 7 20 37 6 2007 13 6 0 81
> > 4 0 0 3 5 0 9 0 2 2 0 0 0 0 0 100
> > 5 0 0 3 9 0 17 0 0 0 0 0 0 0 0 100
> > 6 0 0 259875 110 0 239 11 33 102 9 2727 10 33 0 57
> > 37 0 0 4 13 0 24 0 5 3 0 282 5 0 0 95
> > 38 0 0 3 37 0 76 0 0 0 0 0 0 0 0 100
> > 9 0 0 0 1 0 1 0 0 0 0 0 0 0 0 100
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> > 0 7 0 1593 256 154 28 6 7 42 0 9074 48 25 0 27
> > 1 0 0 2 13 0 23 0 7 0 0 30 0 1 0 99
> > 2 0 0 0 3 0 4 0 0 0 0 80 0 0 0 100
> > 3 0 0 0 2 1 0 0 0 0 0 0 0 0 0 100
> > 0 0 523 429 411 124 4 6 24 0 1703 7 5 0 87
> > 5 0 0 0 52 0 102 0 0 0 0 72 0 0 0 100
> > 6 0 0 0 5 0 8 0 0 0 0 0 0 0 0 100
> > 0 0 1 19 0 36 0 0 0 0 0 0 0 0 100
> > 6 0 0 125980 44 0 79 7 13 12 0 3319 22 18 0 59
> > 17 0 0 3 15 0 27 0 1 2 0 293 3 1 0 96
> > 18 0 0 1 5 0 9 0 0 0 0 0 0 0 0 100
> > 9 0 0 2 19 0 35 0 0 0 0 0 0 0 0 100
> > 0 0 0 4 23 0 46 0 6 6 0 1254 6 1 0 93
> > 21 0 0 0 2 0 2 0 0 0 0 0 0 0 0 100
> > 22 0 0 0 2 1 0 0 0 0 0 0 0 0 0 100
> > 3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 4 3 0 94 49 0 89 4 12 20 0 3314 14 7 0 78
> > 25 0 0 0 6 0 10 0 2 10 0 38 0 0 0 100
> > 6 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 7 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 8 0 0 311 32 0 58 3 7 5 0 1156 8 5 0 86
> > 29 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 30 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 2 1 0 407 53 0 104 4 16 27 1 3042 15 7 0 78
> > 33 0 0 36 86 0 170 0 2 0 0 2 0 0 0 100
> > 34 0 0 0 3 0 5 0 0 0 0 0 0 0 0 100
> > 5 0 0 3 7 0 12 0 0 0 0 0 0 0 0 100
> > 6 0 0 12 49 0 96 1 8 2 0 878 4 1 0 95
> > 37 1 0 32 3 0 4 0 1 0 0 12 0 0 0 100
> > 8 0 0 3 27 0 51 0 0 0 0 0 0 0 0 100
> > 9 0 0 0 4 0 6 0 0 0 0 0 0 0 0 100
> > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
> > 0 0 0 204 258 146 92 3 16 3 0 2692 7 6 0 87
> > 1 0 0 4 26 0 38 2 7 10 0 1644 13 2 0 84
> > 2 0 0 0 6 0 8 0 1 3 0 160 0 1 0 99
> > 3 0 0 0 2 1 0 0 0 0 0 0 0 0 0 100
> > 4 0 0 238 451 406 171 5 13 6 0 972 4 3 0 93
> > 5 0 0 6 40 4 72 0 1 0 0 52 0 0 0 100
> > 6 0 0 4 17 0 33 0 1 6 0 243 1 1 0 98
> > 7 0 0 1 6 0 10 0 0 0 0 0 0 0 0 100
> > 2 0 187838 42 0 72 12 5 37 0 5551 35 33 0 32
> > 17 0 0 5 21 0 40 0 4 0 0 4 0 0 0 100
> > 18 0 0 3 7 0 13 0 0 0 0 0 0 0 0 100
> > 9 0 0 1 21 0 40 0 0 0 0 0 0 0 0 100
> > 0 0 0 2 18 0 35 0 7 14 0 532 2 0 0 98
> > 21 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 0 0 0 2 1 0 0 0 0 0 0 0 0 0 100
> > 3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 4 2 0 128 32 0 57 3 9 90 0 4981 33 8 0 59
> > 25 0 0 1 4 0 7 0 2 0 0 39 0 0 0 100
> > 26 0 0 0 1 0 2 0 1 0 0 0 0 0 0 100
> > 7 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 8 0 0 94 26 0 47 1 5 76 0 1136 4 6 0 89
> > 29 0 0 0 3 0 4 0 1 0 0 2 0 0 0 100
> > 30 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
> > 2 0 0 561 38 0 66 5 12 16 0 3872 23 15 0 62
> > 33 0 0 8 78 0 155 2 3 1 0 38 0 0 0 100
> > 34 0 0 0 2 0 3 0 0 0 0 0 0 0 0 100
> > 5 0 0 3 9 0 17 0 0 0 0 0 0 0 0 100
> > 6 0 0 15 60 0 115 6 6 6 0 1457 8 2 0 90
> > 37 0 0 1 4 0 7 0 0 0 0 0 0 0 0 100
> > 38 0 0 5 28 0 53 0 0 0 0 0 0 0 0 100
> > 9 0 0 1 4 0 6 0 0 0 0 0 0 0 0 100
> >
> > 4. Who is making all the cross-calls?
> > # dtrace -n 'sysinfo:::xcalls { @num[pid,execname] = count(); }'
> > dtrace: description 'sysinfo:::xcalls ' matched 4 probes
> > ^C
> >
> >         7  svc.startd
> >       402  nscd
> >      1384  snmpd
> >      1759  cpd
> >      1921  cpmad
> > [...snip...]
> >      2164  fwd                6
> >      1966  fwd                8
> >      1927  fwd                1
> >      2055  cpd                5
> >      2144  fwd                5
> >       826  snmpd             02
> >         3  fsflush           90
> >     21288  fwm               79
> >     21289  fw_loader         95
> >     21286  fwm              093
> >     21287  dtrace           430
> >         0  sched           6601
> >      2153  fwd            37655
> >
> > 5. System calls made by this process.
> > # dtrace -n 'syscall:::entry /pid == 2153/ { @num[probefunc] = count(); }'
> > dtrace: description 'syscall:::entry ' matched 229 probes
> > ^C
> >
> >   getpid
> >   pollsys                4
> >   pread64                7
> >   pwrite64              34
> >   lwp_sigmask           48
> >   llseek              1010
> >   lseek               8126
> >   read                9654
> >   write               7857
> >
> > 6. What is it reading & writing? The first column is the file descriptor
> > number, the second the number of times it was seen:
> >
> > # dtrace -n 'syscall::read:entry /pid == 2153/ { @[arg0] = count(); }'
> > dtrace: description 'syscall::read:entry ' matched 1 probe
> > ^C
> >
> >        47        1
> >        49       12
> >        83       12
> >        73       17
> >        51       18
> >        50       21
> >        53       24
> >        56       24
> >        57      282
> >        58      381
> >        55      401
> >        78     5742
> >        70     5777
> >        77    12835
> >
> > # dtrace -n 'syscall::write:entry /pid == 2153/ { @[arg0] = count(); }'
> > dtrace: description 'syscall::write:entry ' matched 1 probe
> > ^C
> >
> >        13        4
> >        82        7
> >        78      158
> >        70      160
> >        24    21201
> >        23    24974
> >        22    28747
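(An easier way to tie those file descriptors back to files, rather than
cross-referencing pfiles by hand, would be to aggregate on the pathname with
the fds[] array; an untested sketch, assuming fds[] is available on this
Solaris 10 kernel:

# dtrace -n 'syscall::read:entry,syscall::write:entry /pid == 2153/ { @[probefunc, fds[arg0].fi_pathname] = sum(arg2); }'

which sums the bytes requested per file or socket.)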
> > # pfiles 2153
> > 2153:   opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/bin/fwd -n
> >   Current rlimit: 65536 file descriptors
> >
> > [...snip...]
> >
> >   22: S_IFREG mode:0644 dev:256,65541 ino:96177 uid:0 gid:0 size:1197740174
> >       O_RDWR|O_CREAT|O_TRUNC
> >       var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.log
> >   23: S_IFREG mode:0644 dev:256,65541 ino:96178 uid:0 gid:0 size:47337100
> >       O_RDWR|O_CREAT|O_TRUNC
> >       var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.logptr
> >   24: S_IFREG mode:0644 dev:256,65541 ino:96179 uid:0 gid:0 size:39200652
> >       O_RDWR|O_CREAT|O_TRUNC
> >       var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.loginitial_ptr
> >
> > [...snip...]
> >
> >   70: S_IFSOCK mode:0666 dev:332,0 ino:25771 uid:0 gid:0 size:0
> >       O_RDWR|O_NDELAY
> >       SOCK_STREAM
> >       SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
> >       sockname: AF_INET 1.1.1.42  port: 18184
> >       peername: AF_INET 1.1.1.5  port: 55011
> >
> > [...snip...]
> >
> >   77: S_IFREG mode:0644 dev:256,65541 ino:125222 uid:0 gid:0 size:2145391731
> >       O_RDONLY
> >       var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/2009-06-08_215011_153.log
> >   78: S_IFSOCK mode:0666 dev:332,0 ino:44055 uid:0 gid:0 size:0
> >       O_RDWR|O_NDELAY
> >       SOCK_STREAM
> >       SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
> >       sockname: AF_INET 1.1.1.42  port: 18184
> >       peername: AF_INET 1.1.1.5  port: 55147
> >
> > [...snip...]
> >
> > So the file being written is fw.log, which is to be expected. But the
> > reader is fd 77 (2009-06-08_215011_153.log), closely followed by fds 70
> > & 78, which are TCP connections to the log reporting server. So an
> > educated guess would say that this is the file that the reporting server
> > is trying to index. Checking the reporting server confirms this.
> >
> > regards
> >
> > Matthew Flanagan
> > --
> > http://wadofstuff.blogspot.com