Hi Phil,
> Matthew,
> 
> In addition to Roland's comment about patching (which is particularly
> important due to the rate of change in ZFS) ...
> 
> 1) Your dd test doesn't really tell us much. Do you know what size
> writes fwd is doing? I doubt it is 128k. You could use truss/dtrace to
> find out and then use dd again with the same blocksize.
> 

truss shows writes like this:

/1:     write(22, "\0 4\001 :919C\v J H90 b".., 96)     = 96
/1:     write(23, " %97 *AF", 4)                        = 4
/1:     write(24, " %97 *AF", 4)                        = 4

and reads like this:
/1:     read(77, "\0 D\003 :919F16 J ,EACF".., 131072)  = 131072
/1:     read(77, "\0 D\003 :919F16 J ,EADC".., 131072)  = 131072
/1:     read(77, "\0 D\003 :919F16 J ,EADC".., 131072)  = 131072
/1:     read(78, 0x033BD918, 5)                         Err#11 EAGAIN
/1:     read(78, 0x033BD918, 5)                         Err#11 EAGAIN
/1:     read(78, 0x033BD918, 5)                         Err#11 EAGAIN
/1:     read(78, 0x033BD918, 5)                         Err#11 EAGAIN

I had done a truss before and seen the 128k reads, though I hadn't noticed so
many EAGAINs until now.
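
If it helps, here is a rough (untested) sketch I can run next, to
quantize the sizes fwd passes to write(2) and to count how often those
non-blocking reads spin on EAGAIN, keyed by fd (pid 2153 is fwd):

# dtrace -n '
    syscall::write:entry /pid == 2153/ { @wr["write bytes"] = quantize(arg2); }
    syscall::read:entry /pid == 2153/ { self->fd = arg0; }
    syscall::read:return /self->fd && errno == EAGAIN/ { @ea[self->fd] = count(); }
    syscall::read:return { self->fd = 0; }
    tick-20s { exit(0); }'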

> 2) You've said the application is single threaded. This isn't a good
> fit for the T-series (which was designed for throughput workloads at
> the expense of single-thread performance ... e.g. the M-series has
> typically 4x the single-thread performance of the T-series).

The server has handled the workload quite well until recently, when we added
a few new firewalls that do quite a large volume of logging. If I can build a
case for a faster server, it will be easier to convince the business to buy
it.

> 3) It would be helpful to see 'prstat -Lm' to get a better overall
> feel for what is running where (especially, where fwd is spending its
> time).
> 

   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  2153 root      41  59 0.1 0.0 0.0 0.0 0.3 0.2 383  11  7K   0 fwd/1
  1923 root      16 5.4 0.1 0.0 0.0 0.0  78 0.1  52  18  2K   0 fwd/1
  2144 root      12 1.3 0.0 0.0 0.0 0.0  87 0.0  10   4 503   0 fwd/1
  1927 root     5.4 4.2 0.0 0.0 0.0 0.0  90 0.0  51   1  1K   0 fwd/1
  1941 root     2.5 4.3 0.0 0.0 0.0 0.0  93 0.0  29   0 855   0 fwd/1
  1975 root     6.4 0.4 0.0 0.0 0.0 0.0  93 0.0   1   2 936   0 cpmad/1
  2166 root     5.5 0.1 0.0 0.0 0.0 0.0  94 0.0   5   1  36   0 fwm/1
 22500 root     4.8 0.6 0.0 0.0 0.0 0.0  95 0.0   1   2  1K   0 cpmad/1
 23745 root     0.7 4.6 0.0 0.0 0.0 0.0  95 0.0  42   1 603   0 prstat/1
  1966 root     2.8 2.1 0.0 0.0 0.0 0.0  95 0.0  12   1 996   0 fwd/1
  2250 root     3.6 0.1 0.0 0.0 0.0 0.0  96 0.0   4   0  36   0 fwm/1
  2175 root     1.7 0.9 0.0 0.0 0.0 0.0  97 0.0  19   0 404   0 fwd/1
 21873 root     2.3 0.1 0.0 0.0 0.0 0.0  98 0.0   1   0 396   0 cpmad/1
 23523 root     1.8 0.2 0.0 0.0 0.0 0.0  98 0.0   1   1 423   0 cpmad/1
  2164 root     1.2 0.5 0.0 0.0 0.0 0.0  98 0.0   5   0 320   0 fwd/1
  1902 root     0.4 0.8 0.0 0.0 0.0 0.0  99 0.0   1   0 159   0 fwd/1
   402 root     0.2 0.1 0.0 0.0 0.0 0.0 100 0.0  39   0 234   0 nscd/25
  1901 root     0.2 0.1 0.0 0.0 0.0 0.0 100 0.0   7   0 201   0 fwd/1
  1957 root     0.1 0.1 0.0 0.0 0.0 0.0 100 0.0   1   0  21   0 fwd/1
 23657 mflanaga 0.1 0.0 0.0 0.0 0.0 0.0 100 0.0   3   2  24   0 sshd/1
  1747 root     0.1 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0  20   0 cpd/1
  1934 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   3   0  90   0 fwd/1
  2137 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0  18   0 fwd/1
  2157 root     0.1 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0   8   0 fwm/1
  2145 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0   9   0 fwd/1
  1921 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   5   0 cpmad/1
  2055 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0  12   0 cpd/1
  1960 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   7   0 fwm/1
  1944 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   7   0 fwm/1
  1917 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   8   0 fwd/1
   522 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0  10   2 xntpd/1
  1968 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   7   0 fwm/1
  2147 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0   4   0 fwm/1
  2056 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
  1757 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
  1747 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
  1759 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
  1745 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
  2039 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   3   0   5   0 cpd/2
Total: 115 processes, 412 lwps, load averages: 2.07, 2.05, 2.05
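
What stands out to me there is fwd pid 2153: its single LWP is at
roughly 41% USR + 59% SYS with almost no LAT, so the thread is pegged
and spending most of its time in the kernel rather than waiting for a
CPU. I'll keep watching just that LWP over a longer window with
something like:

# prstat -Lm -p 2153 10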

> 4) It's cool to see you using DTrace. You've shown again that
> one-liners can tell us so much, and so easily. My only comment is that
> it would be useful to know how long your tests ran before you hit ^C.
> 

I usually ran each dtrace for 20 seconds.
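
Next time I'll bake the interval into the one-liner so the run length
is explicit, e.g.:

# dtrace -n 'sysinfo:::xcalls { @num[pid,execname] = count(); } tick-20s { exit(0); }'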

> 5) How about some DTrace to find out where in the kernel the
> crosscalls are happening? For example ...
> 
> dtrace -n 'sysinfo:::xcalls { @[stack()] = count(); }'
> 
> ... although bringing the system up-to-date with patches may resolve
> this issue anyway.

# dtrace -n 'sysinfo:::xcalls { @[stack()] = count(); }'
[...snip...]
              unix`send_one_mondo+0x1c
              unix`xt_one_unchecked+0xc8
              genunix`sleepq_wakeone_chan+0x70
              genunix`cv_signal+0x58
              genunix`taskq_dispatch+0x94
              genunix`callout_schedule+0x48
              genunix`clock+0x6a0
              genunix`cyclic_softint+0xa4
              unix`cbe_level10+0x8
              unix`intr_thread+0x168
             2068

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`sfmmu_tlb_range_demap+0x1f0
              unix`hat_unload_callback+0x850
              unix`segkmem_free_vn+0x58
              genunix`kmem_magazine_destroy+0xd8
              genunix`kmem_depot_ws_reap+0x50
              genunix`taskq_thread+0x1a4
              unix`thread_start+0x4
             2181

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`xt_sync+0x158
              unix`hat_unload_callback+0x898
              unix`segkmem_free_vn+0x58
              genunix`kmem_magazine_destroy+0xd8
              genunix`kmem_depot_ws_reap+0x50
              genunix`taskq_thread+0x1a4
              unix`thread_start+0x4
             2181

              unix`send_one_mondo+0x1c
              unix`xt_one_unchecked+0xc8
              genunix`sleepq_wakeone_chan+0x70
              genunix`cv_signal+0x58
              dls`soft_ring_process+0x214
              dls`dls_soft_ring_fanout+0x2bc
              dls`i_dls_link_rx+0x1f0
              mac`mac_rx+0x8c
              e1000g`e1000g_intr_pciexpress+0xbc
              px`px_msiq_intr+0x1e8
              unix`intr_thread+0x168
             2841

              unix`send_one_mondo+0x1c
              unix`xt_one_unchecked+0xc8
              genunix`sleepq_wakeone_chan+0x70
              genunix`cv_signal+0x58
              genunix`taskq_dispatch+0x94
              zfs`zio_done+0x190
              zfs`zio_vdev_io_assess+0x178
              genunix`taskq_thread+0x1a4
              unix`thread_start+0x4
             3498

              unix`send_one_mondo+0x1c
              unix`xt_one_unchecked+0xc8
              genunix`sleepq_wakeone_chan+0x70
              genunix`cv_signal+0x58
              genunix`taskq_dispatch+0x94
              emcp`PowerPlatformTopIodone+0x60
              emcp`PowerTopIodone+0xc
              emcp`PowerProcessTopIodonePirps+0x2c
              emcp`PowerBottomIodoneNew+0x308
              emcp`PowerPlatformBottomIodone+0x2c
              ssd`ssd_return_command+0x240
              ssd`ssdintr+0x268
              qlc`ql_fast_fcp_post+0x178
              qlc`ql_24xx_status_entry+0x2ec
              qlc`ql_response_pkt+0x268
              qlc`ql_isr_aif+0x7dc
              px`px_msiq_intr+0x1e8
              unix`intr_thread+0x168
             5017

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`sfmmu_tlb_range_demap+0x1f0
              unix`hat_unload_callback+0x850
              unix`ppmapout+0x94
              zfs`mappedread+0xa4
              zfs`zfs_read+0x19c
              genunix`fop_read+0x20
              genunix`read+0x274
              unix`syscall_trap32+0xcc
           135480

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`xt_sync+0x158
              unix`hat_unload_callback+0x898
              unix`ppmapout+0x94
              zfs`mappedread+0xa4
              zfs`zfs_read+0x19c
              genunix`fop_read+0x20
              genunix`read+0x274
              unix`syscall_trap32+0xcc
           135480
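
The two dominant stacks both funnel through zfs`mappedread and
ppmapout, which I read as ZFS tearing down temporary kernel mappings of
the file's pages on each read(2) (the file evidently also has pages
mapped), with each teardown triggering the TLB-demap crosscalls. A
rough (untested) sketch to see what that path costs in time:

# dtrace -n '
    fbt:zfs:mappedread:entry { self->ts = timestamp; }
    fbt:zfs:mappedread:return /self->ts/ {
        @["mappedread ns"] = quantize(timestamp - self->ts);
        self->ts = 0;
    }
    tick-20s { exit(0); }'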

I've also run the same DTrace but with ustack(), just for the fwd process.

# dtrace -n 'sysinfo:::xcalls /pid == 2153/ { @[ustack()] = count(); }'

[snip]
              libc_psr.so.1`memcpy+0x18c
              libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
              libCPLogRepository.so`__1cICLogFileHReadLog6MpnJILogEvent_LpL33_i_+0xd4
              libCPLogRepository.so`__1cICLogFileIReadNext6MpnJILogEvent_pL33_i_+0x40
              libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x164
              libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
              libCPLogLuFile.so`0xfc493e9c
              libfw1.so`__1cNlog_read_next6FpnPICplogLuLogFile_pnJILogEvent_bpp3pi6_3_+0xfc
              libfwdopsec.so`lea_send_line+0x108
              libopsec.so`lea_send_50_records+0x68
              libopsec.so`lea_server_event_handler+0xc8
              libopsec.so`opsec_fwasync_conn_handler+0x2fc
              libComUtils.so`0xfe393bc4
              libComUtils.so`T_event_mainloop_iter+0x4c4
              libComUtils.so`T_event_mainloop+0x118
              libfw1.so`fwd_cmain+0x131c
              libfw1.so`fw_cmain+0x910
              fwd`_start+0x108
               18

              libc_psr.so.1`memcpy+0x18c
              libCPLogRepository.so`__1cOCBinaryLogFileHReadLog6MpnJILogEvent_pnNCLogFileChain__b_+0xc
              libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
              libCPLogRepository.so`__1cICLogFileHReadLog6MpnJILogEvent_LpL33_i_+0xd4
              libCPLogRepository.so`__1cICLogFileIReadNext6MpnJILogEvent_pL33_i_+0x40
              libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x164
              libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
              0x3
               32

              libc_psr.so.1`memcpy+0x18c
              libCPLogRepository.so`__1cOCBinaryLogFileHReadLog6MpnJILogEvent_pnNCLogFileChain__b_+0xc
              libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
              libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x654
              libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
              libCPLogLuFile.so`0xfc493e9c
              libfw1.so`__1cNlog_read_next6FpnPICplogLuLogFile_pnJILogEvent_bpp3pi6_3_+0xfc
              libfwdopsec.so`lea_send_line+0x108
              libopsec.so`lea_send_50_records+0x68
              libopsec.so`lea_server_event_handler+0xc8
              libopsec.so`opsec_fwasync_conn_handler+0x2fc
              libComUtils.so`0xfe393bc4
              libComUtils.so`T_event_mainloop_iter+0x4c4
              libComUtils.so`T_event_mainloop+0x118
              libfw1.so`fwd_cmain+0x131c
              libfw1.so`fw_cmain+0x910
              fwd`_start+0x108
               62

              libc.so.1`_read+0x8
              libc.so.1`fread+0x298
              libCPLogRepository.so`__1cOCBinaryLogFileNReadLogHeader6kMpnJILogEvent_pnNCLogFileChain__b_+0x280
              libCPLogRepository.so`__1cOCBinaryLogFileHReadLog6MpnJILogEvent_pnNCLogFileChain__b_+0xc
              libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
              libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x3f8
              libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
              libCPLogLuFile.so`0xfc493e9c
              libfw1.so`__1cNlog_read_next6FpnPICplogLuLogFile_pnJILogEvent_bpp3pi6_3_+0xfc
              libfwdopsec.so`lea_send_line+0x108
              libopsec.so`lea_send_50_records+0x68
              libopsec.so`lea_server_event_handler+0xc8
              libopsec.so`opsec_fwasync_conn_handler+0x2fc
              libComUtils.so`0xfe393bc4
              libComUtils.so`T_event_mainloop_iter+0x4c4
              libComUtils.so`T_event_mainloop+0x118
              libfw1.so`fwd_cmain+0x131c
              libfw1.so`fw_cmain+0x910
              fwd`_start+0x108
           226302
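
Since the xcall probe only shows where crosscalls originate, I'll also
profile where fwd/1 burns its cycles overall, along the lines of
(untested):

# dtrace -n 'profile-997 /pid == 2153/ { @[ustack()] = count(); }
    tick-20s { trunc(@, 10); exit(0); }'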

> 
> Hope this helps,
> Phil

I hope so too. And thanks for the fast response!

regards

Matthew

> 
> 
> Matthew Flanagan wrote:
> > Hi,
> >
> > I'm trying to track down the source of a performance problem with an
> > application. The application in question is from a firewall vendor and
> > the component of it that is suffering is the log handling daemon. The
> > log daemon is single threaded and receives logs from firewalls and also
> > services various consumers, which could be either simple log viewing
> > CLI or GUI clients or log reporting engines running on other servers.
> >
> > The performance issue is seen from both the consumers and producers.
> > Producers complain that the log server is not responding and they are
> > therefore unable to send logs. The consumers don't complain (but the
> > users do!) but are very slow when reading the logs, filtering on
> > certain fields, etc.
> >
> > I'm at a bit of a loss on where to go from here. Below is the analysis
> > I've done so far. I'm looking for advice on whether this can be solved
> > by throwing more resources at it or whether it is being limited by a
> > poorly written application. Please let me know if you'd like to see
> > more output from dtrace or other commands.
> >
> > Note, one thing I have seen is that if I stop the reporting engine
> > running on a different server then the number of cross-calls drops
> > significantly.
> >
> > The log server specs are:
> >
> > 4 Core T5120 with 16GB RAM
> > 2TB SAN storage (mirrored ZFS pool).
> > $ uname -a
> > SunOS logserver 5.10 Generic_137111-01 sun4v sparc SUNW,SPARC-Enterprise-T5120
> >
> > Here is what I've looked at so far:
> >
> > 1. vmstat - seems ok to me.
> > # vmstat 3
> >  kthr      memory            page            disk          faults      cpu
> >  r b w   swap  free  re  mf pi po fr de sr m0 m1 m2 m1   in    sy   cs us sy id
> >  0 0 0 15422904 700912 133  15  6  0  0  0  0  0  0  0  1 1618 15569 1774  3  3 94
> >  0 0 0 15457416 730808  54  26 13  0  0  0  0  0  0  0  0 1679 18773 1913  3  3 94
> >  0 0 0 15455944 729320 148   8  0  0  0  0  0  0  0  0  0 1073 19944 1000  4  3 93
> >  0 0 0 15454744 728632 223   3  0  0  0  0  0  0  0  0  0 1864 22512 2104  4  3 93
> >  0 0 0 15454232 729328 160   4  0  0  0  0  0  0  0  0  0  994 18935  997  3  3 94
> >  0 0 0 15453400 730080 163   4  0  0  0  0  0  0  0  0  0 1777 18536 2088  3  3 94
> >  0 0 0 15452920 730040 185   3  0  0  0  0  0  0  0  0  0 1950 21576 2198  4  3 93
> >  0 0 0 15452416 730456 146   0  0  0  0  0  0  0  0  0  0 1033 20987  958  3  3 94
> >
> > 2. iostat - just looking at the fibre channel HBA throughput. Not
> > really pushing the SAN to its limits.
> > # iostat -xznC 1 | egrep c.$
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.5    1.0    5.7   14.1  0.0  0.0    0.0   15.5   0   1 c1
> >     1.8   87.9  187.7 3028.6  0.0  1.5    0.2   16.5   0  26 c2
> >     1.9   88.1  194.4 3066.4  0.0  1.5    0.2   16.5   0  26 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0  521.4    0.0 15009.5  0.0  1.5    0.0    3.0   0  78 c2
> >     0.0  518.4    0.0 15647.2  0.0  1.5    0.0    2.9   0  78 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     5.0    3.0    0.4    0.0  0.0  0.0    0.0    0.0   0   0 c2
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0    4.0    0.0    2.0  0.0  0.0    0.0    4.9   0   2 c1
> >     0.0  387.4    0.0 12853.2  0.0  1.0    0.0    2.6   0  59 c2
> >     0.0  414.5    0.0 13446.9  0.0  1.0    0.0    2.5   0  57 c3
> > Running a `dd` of a 2GB log file shows how fast we can push the SAN,
> > which peaks at 222 MBytes/s!
> >
> > # time dd if=2009-06-28_224348_21.log of=/dev/null bs=128k
> > 16368+1 records in
> > 16368+1 records out
> >
> > real    0m9.298s
> > user    0m0.094s
> > sys     0m4.138s
> >
> > while dd is happening:
> > # iostat -xznC 1 | egrep c.$
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.5    1.0    5.7   14.0  0.0  0.0    0.0   15.5   0   1 c1
> >     1.9   87.9  189.6 3028.4  0.0  1.5    0.2   16.5   0  26 c2
> >     1.9   88.1  196.2 3066.3  0.0  1.5    0.2   16.4   0  26 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     2.1    0.0  263.5    0.0  0.0  0.0    0.0    8.4   0   2 c2
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     5.1    0.0  651.9    0.0  0.0  0.1    0.0   18.1   0   9 c2
> >     5.1    0.0  651.9    0.0  0.0  0.3    0.0   64.2   0  33 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     5.0    3.0    0.4    0.0  0.0  0.0    0.0    0.0   0   0 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     1.0    0.0    4.9    0.0  0.0  0.0    0.0    0.2   0   0 c1
> >   386.8    0.0 49513.3    0.0  0.0 20.0    0.0   51.8   0 678 c2
> >   392.7    0.0 50269.1    0.0  0.0 20.0    0.0   51.0   0 667 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   869.9    3.0 110697.5    0.0  0.0 49.6    0.0   56.8   0 1389 c2
> >   843.6    0.0 107982.0    0.0  0.0 49.6    0.0   58.8   0 1356 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   737.5  380.6 94397.2 16839.9  0.0 53.3    0.0   47.7   0 1347 c2
> >   742.4  362.8 95025.5 15487.8  0.0 53.7    0.0   48.6   0 1366 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   884.6    0.0 113231.1    0.0  0.0 53.5    0.0   60.5   0 1547 c2
> >   894.7    0.0 114516.7    0.0  0.0 53.7    0.0   60.0   0 1528 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   824.0    0.0 105467.2    0.0  0.0 50.3    0.0   61.1   0 1425 c2
> >   831.9    0.0 106478.8    0.0  0.0 50.1    0.0   60.3   0 1439 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   837.2    0.0 107162.6    0.0  0.0 51.8    0.0   61.9   0 1472 c2
> >   838.3    0.0 107299.9    0.0  0.0 51.5    0.0   61.4   0 1468 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   844.7    0.0 108124.9    0.0  0.0 51.9    0.0   61.4   0 1468 c2
> >   842.7    0.0 107865.9    0.0  0.0 51.3    0.0   60.9   0 1464 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >   866.4  154.9 110896.3 12598.9  0.0 60.8    0.0   59.6   0 1551 c2
> >   857.4  390.7 109745.6 17811.3  0.0 59.2    0.0   47.5   0 1544 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0   51.1    0.0   39.6  0.0  0.9    0.0   18.3   0  30 c1
> >   830.5    0.0 106298.7    0.0  0.0 55.7    0.0   67.1   0 1511 c2
> >   842.5    0.0 107839.9    0.0  0.0 54.9    0.0   65.2   0 1530 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0   13.0    0.0    8.5  0.0  0.1    0.0    9.5   0   8 c1
> >   355.1    0.0 45454.7    0.0  0.0 13.3    0.0   37.4   0 565 c2
> >   318.0    0.0 40705.3    0.0  0.0 13.3    0.0   41.8   0 572 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     1.0    0.0  128.3    0.0  0.0  0.0    0.0    6.2   0   1 c2
> >     5.0    3.0    0.4    0.0  0.0  0.0    0.0    0.0   0   0 c3
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0    2.0    0.0    1.0  0.0  0.0    0.0    1.6   0   0 c1
> >     0.0  482.1    0.0 14731.3  0.0  1.2    0.0    2.4   0  64 c2
> >     0.0  428.1    0.0 15401.4  0.0  1.1    0.0    2.6   0  61 c3
> > I think I can safely eliminate the SAN as the bottleneck.
> >
> > 3. mpstat - the column that stands out to me is xcal. I've seen it
> > peak up to 1.5 million xcalls on a single CPU!
> >
> > # mpstat 2
> >
> > [snip]
> >
> > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> >   0    0   0 96318   337  150  194    3   50   40    7   640    7  12   0  81
> >   1  801   0   338    87    0  144    4   33  171    0   889   61  31   0   8
> >   2    0   0     7    36    0   45    0   16   28    1   543    3   3   0  94
> >   3    0   0     1    15    1   12    0    6    3    0     0    0   0   0 100
> >   4    0   0 50458   305  253  123    9   23   29    4  3513   27  13   0  60
> >   5    0   0   223   215  176   89    2   18   70    0   109    7   1   0  91
> >   6    0   0 10920     9    0   17    0    1    1    1    38    0   1   0  99
> >   7    0   0   195   168  163   10    0    0   38    0     0    0   1   0  99
> >  16    1   0   172   222    0  465    9   80   84    9  2324   22   4   0  73
> >  17    0   0    24    83    1  157    5   16   35    0   870   15   1   0  84
> >  18    0   0    23    52    0  108    0    3   28    0     0    0   0   0 100
> >  19    0   0     1    35    0   69    0    0    0    0     0    0   0   0 100
> >  20    9   0    31     6    0    0    5    0    1    0    17  100   0   0   0
> >  21    0   0    10    26    0   49    0   10   27    0    60    0   0   0 100
> >  22    0   0     0     2    1    0    0    0    0    0     0    0   0   0 100
> >  23    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  24   83   0 592427   26    0   42    7   15   58    1  1922   32  46   0  23
> >  25    1   0   212    50    0   98    1   23   21    3   949    5   2   0  93
> >  26    0   0     1     8    0   16    0    7    8    0     2    0   0   0 100
> >  27    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  28    0   0   117    53    0  105    5   17   24    7  1923   41   6   0  53
> >  29    0   0     3    24    0   47    0   10    9    1   497    2   1   0  97
> >  30    0   0     0     1    0    1    0    0    0    0     0    0   0   0 100
> >  31    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  32    1   0  2254   119    0  251    5   57  102    6  2284   33   8   0  59
> >  33    4   0   334    79    0  163    7   20   37    6  2007   13   6   0  81
> >  34    0   0     3     5    0    9    0    2    2    0     0    0   0   0 100
> >  35    0   0     3     9    0   17    0    0    0    0     0    0   0   0 100
> >  36    0   0 259875   110    0  239   11   33  102    9  2727   10  33   0  57
> >  37    0   0     4    13    0   24    0    5    3    0   282    5   0   0  95
> >  38    0   0     3    37    0   76    0    0    0    0     0    0   0   0 100
> >  39    0   0     0     1    0    1    0    0    0    0     0    0   0   0 100
> > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> >   0    7   0  1593   256  154   28    6    7   42    0  9074   48  25   0  27
> >   1    0   0     2    13    0   23    0    7    0    0    30    0   1   0  99
> >   2    0   0     0     3    0    4    0    0    0    0    80    0   0   0 100
> >   3    0   0     0     2    1    0    0    0    0    0     0    0   0   0 100
> >   4    0   0   523   429  411  124    4    6   24    0  1703    7   5   0  87
> >   5    0   0     0    52    0  102    0    0    0    0    72    0   0   0 100
> >   6    0   0     0     5    0    8    0    0    0    0     0    0   0   0 100
> >   7    0   0     1    19    0   36    0    0    0    0     0    0   0   0 100
> >  16    0   0 125980   44    0   79    7   13   12    0  3319   22  18   0  59
> >  17    0   0     3    15    0   27    0    1    2    0   293    3   1   0  96
> >  18    0   0     1     5    0    9    0    0    0    0     0    0   0   0 100
> >  19    0   0     2    19    0   35    0    0    0    0     0    0   0   0 100
> >  20    0   0     4    23    0   46    0    6    6    0  1254    6   1   0  93
> >  21    0   0     0     2    0    2    0    0    0    0     0    0   0   0 100
> >  22    0   0     0     2    1    0    0    0    0    0     0    0   0   0 100
> >  23    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  24    3   0    94    49    0   89    4   12   20    0  3314   14   7   0  78
> >  25    0   0     0     6    0   10    0    2   10    0    38    0   0   0 100
> >  26    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  27    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  28    0   0   311    32    0   58    3    7    5    0  1156    8   5   0  86
> >  29    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  30    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  31    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  32    1   0   407    53    0  104    4   16   27    1  3042   15   7   0  78
> >  33    0   0    36    86    0  170    0    2    0    0     2    0   0   0 100
> >  34    0   0     0     3    0    5    0    0    0    0     0    0   0   0 100
> >  35    0   0     3     7    0   12    0    0    0    0     0    0   0   0 100
> >  36    0   0    12    49    0   96    1    8    2    0   878    4   1   0  95
> >  37    1   0    32     3    0    4    0    1    0    0    12    0   0   0 100
> >  38    0   0     3    27    0   51    0    0    0    0     0    0   0   0 100
> >  39    0   0     0     4    0    6    0    0    0    0     0    0   0   0 100
> > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> >   0    0   0   204   258  146   92    3   16    3    0  2692    7   6   0  87
> >   1    0   0     4    26    0   38    2    7   10    0  1644   13   2   0  84
> >   2    0   0     0     6    0    8    0    1    3    0   160    0   1   0  99
> >   3    0   0     0     2    1    0    0    0    0    0     0    0   0   0 100
> >   4    0   0   238   451  406  171    5   13    6    0   972    4   3   0  93
> >   5    0   0     6    40    4   72    0    1    0    0    52    0   0   0 100
> >   6    0   0     4    17    0   33    0    1    6    0   243    1   1   0  98
> >   7    0   0     1     6    0   10    0    0    0    0     0    0   0   0 100
> >  16    2   0 187838   42    0   72   12    5   37    0  5551   35  33   0  32
> >  17    0   0     5    21    0   40    0    4    0    0     4    0   0   0 100
> >  18    0   0     3     7    0   13    0    0    0    0     0    0   0   0 100
> >  19    0   0     1    21    0   40    0    0    0    0     0    0   0   0 100
> >  20    0   0     2    18    0   35    0    7   14    0   532    2   0   0  98
> >  21    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  22    0   0     0     2    1    0    0    0    0    0     0    0   0   0 100
> >  23    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  24    2   0   128    32    0   57    3    9   90    0  4981   33   8   0  59
> >  25    0   0     1     4    0    7    0    2    0    0    39    0   0   0 100
> >  26    0   0     0     1    0    2    0    1    0    0     0    0   0   0 100
> >  27    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  28    0   0    94    26    0   47    1    5   76    0  1136    4   6   0  89
> >  29    0   0     0     3    0    4    0    1    0    0     2    0   0   0 100
> >  30    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  31    0   0     0     1    0    0    0    0    0    0     0    0   0   0 100
> >  32    0   0   561    38    0   66    5   12   16    0  3872   23  15   0  62
> >  33    0   0     8    78    0  155    2    3    1    0    38    0   0   0 100
> >  34    0   0     0     2    0    3    0    0    0    0     0    0   0   0 100
> >  35    0   0     3     9    0   17    0    0    0    0     0    0   0   0 100
> >  36    0   0    15    60    0  115    6    6    6    0  1457    8   2   0  90
> >  37    0   0     1     4    0    7    0    0    0    0     0    0   0   0 100
> >  38    0   0     5    28    0   53    0    0    0    0     0    0   0   0 100
> >  39    0   0     1     4    0    6    0    0    0    0     0    0   0   0 100
> > 4. Who is making all the cross calls?
> > # dtrace -n 'sysinfo:::xcalls { @num[pid,execname] = count(); }'
> > dtrace: description 'sysinfo:::xcalls ' matched 4 probes
> > ^C
> >
> >         7  svc.startd
> >       402  nscd
> >      1384  snmpd
> >      1759  cpd
> >      1921  cpmad
> > [...snip...]
> >      2164  fwd                  6
> >      1966  fwd                  8
> >      1927  fwd                  1
> >      2055  cpd                  5
> >      2144  fwd                  5
> >       826  snmpd               02
> >         3  fsflush             90
> >     21288  fwm                 79
> >     21289  fw_loader           95
> >     21286  fwm                093
> >     21287  dtrace             430
> >         0  sched             6601
> >      2153  fwd              37655
> >
> > 5. System calls by this process.
> > # dtrace -n 'syscall:::entry /pid == 2153/ { @num[probefunc] = count(); }'
> > dtrace: description 'syscall:::entry ' matched 229 probes
> > ^C
> >
> >   getpid
> >   pollsys                4
> >   pread64                7
> >   pwrite64              34
> >   lwp_sigmask           48
> >   llseek              1010
> >   lseek               8126
> >   read                9654
> >   write               7857
> >
> > 6. What is it reading & writing? The first column is the file
> > descriptor number, the second the number of times it was seen:
> >
> > # dtrace -n 'syscall::read:entry /pid == 2153/ { @[arg0] = count(); }'
> > dtrace: description 'syscall::read:entry ' matched 1 probe
> > ^C
> >
> >                47                1
> >                49               12
> >                83               12
> >                73               17
> >                51               18
> >                50               21
> >                53               24
> >                56               24
> >                57              282
> >                58              381
> >                55              401
> >                78             5742
> >                70             5777
> >                77            12835
> >
> > # dtrace -n 'syscall::write:entry /pid == 2153/ { @[arg0] = count(); }'
> > dtrace: description 'syscall::write:entry ' matched 1 probe
> > ^C
> >
> >                13                4
> >                82                7
> >                78              158
> >                70              160
> >                24            21201
> >                23            24974
> >                22            28747
> >
> > # pfiles 2153
> > 2153:  /opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/bin/fwd -n
> >   Current rlimit: 65536 file descriptors
> >
> > [...snip...]
> >
> >   22: S_IFREG mode:0644 dev:256,65541 ino:96177 uid:0 gid:0 size:1197740174
> >       O_RDWR|O_CREAT|O_TRUNC
> >       /var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.log
> >   23: S_IFREG mode:0644 dev:256,65541 ino:96178 uid:0 gid:0 size:47337100
> >       O_RDWR|O_CREAT|O_TRUNC
> >       /var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.logptr
> >   24: S_IFREG mode:0644 dev:256,65541 ino:96179 uid:0 gid:0 size:39200652
> >       O_RDWR|O_CREAT|O_TRUNC
> >       /var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.loginitial_ptr
> >
> > [...snip...]
> >
> >   70: S_IFSOCK mode:0666 dev:332,0 ino:25771 uid:0 gid:0 size:0
> >       O_RDWR|O_NDELAY
> >         SOCK_STREAM
> >         SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
> >         sockname: AF_INET 1.1.1.42  port: 18184
> >         peername: AF_INET 1.1.1.5  port: 55011
> >
> > [...snip...]
> >
> >   77: S_IFREG mode:0644 dev:256,65541 ino:125222 uid:0 gid:0 size:2145391731
> >       O_RDONLY
> >       /var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/2009-06-08_215011_153.log
> >   78: S_IFSOCK mode:0666 dev:332,0 ino:44055 uid:0 gid:0 size:0
> >       O_RDWR|O_NDELAY
> >         SOCK_STREAM
> >         SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
> >         sockname: AF_INET 1.1.1.42  port: 18184
> >         peername: AF_INET 1.1.1.5  port: 55147
> >
> > [...snip...]
> >
> > So the file being written is fw.log, which is to be expected. But the
> > big reader is fd 77, 2009-06-08_215011_153.log, closely followed by
> > fds 70 & 78, which are TCP connections to the log reporting server. So
> > an educated guess would be that this is the file the reporting server
> > is trying to index. Checking the reporting server confirms this.
> >
> > regards
> >
> > Matthew Flanagan
> > --
> > http://wadofstuff.blogspot.com
> >   