Hi Phil,

Comments inline.
> > truss shows writes like this:
> >
> >  /1: write(22, "\0 4\001 :919C\v J H90 b".., 96)      = 96
> >  /1: write(23, " %97 *AF", 4)                         = 4
> >
> > and reads like this:
> >
> >  /1: read(77, "\0 D\003 :919F16 J ,EACF".., 131072)   = 131072
> >  /1: read(77, "\0 D\003 :919F16 J ,EADC".., 131072)   = 131072
> >  /1: read(77, "\0 D\003 :919F16 J ,EADC".., 131072)   = 131072
> >  /1: read(78, 0x033BD918, 5)                          Err#11 EAGAIN
> >  /1: read(78, 0x033BD918, 5)                          Err#11 EAGAIN
> >
> > I had done a truss before and seen the 128k reads, though I hadn't
> > noticed so many EAGAINs.
>
> The writes are tiny. That's a lot of overhead. But I think we need the
> big picture, so some DTrace quantizations of per-fd sizes might be
> useful to fill out the picture some more.

# dtrace -n 'sysinfo:::readch /pid == 24937/ { @dist[execname] = quantize(arg0); }'
dtrace: description 'sysinfo:::readch ' matched 4 probes
^C
  fwd
         value  ------------- Distribution ------------- count
            -1 |                                         0
             0 |@@@@@@@@@@@@@@                           6513
             1 |                                         0
             2 |                                         0
             4 |@                                        366
             8 |                                         0
            16 |                                         0
            32 |                                         0
            64 |                                         6
           128 |                                         6
           256 |                                         12
           512 |                                         5
          1024 |                                         6
          2048 |                                         13
          4096 |@                                        246
          8192 |                                         24
         16384 |@                                        276
         32768 |                                         0
         65536 |                                         0
        131072 |@@@@@@@@@@@@@@@@@@@@@@@@                 11372
        262144 |                                         0

# dtrace -n 'syscall::read:entry /pid == 24937/ { @dist[arg0] = quantize(arg2); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
^C
       69
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              8
             8 |                                         0
            16 |                                         0
            32 |                                         0
            64 |@@@@@@@@@@                               3
           128 |@@@                                      1
           256 |                                         0

       65
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              6
             8 |                                         0
            16 |                                         0
            32 |                                         0
            64 |@@@@                                     1
           128 |@@@@                                     1
           256 |@@@@                                     1
           512 |                                         0

       66
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              14
             8 |                                         0
            16 |                                         0
            32 |                                         0
            64 |@@                                       1
           128 |@@@@@@@@                                 4
           256 |@@@@                                     2
           512 |                                         0

       67
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              16
             8 |                                         0
            16 |                                         0
            32 |                                         0
            64 |@@                                       1
           128 |@@                                       1
           256 |@@@                                      2
           512 |                                         0
          1024 |@@@@@@@                                  4
          2048 |                                         0

       64
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@@@@@@@@               11
             8 |                                         0
            16 |                                         0
            32 |                                         0
            64 |@@@@@                                    2
           128 |@@                                       1
           256 |@@                                       1
           512 |                                         0
          1024 |                                         0
          2048 |@@                                       1
          4096 |                                         0
          8192 |@@                                       1
         16384 |                                         0

       71
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3216
             8 |                                         0

       43
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3233
             8 |                                         0

       54
         value  ------------- Distribution ------------- count
          8192 |                                         0
         16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16
         32768 |                                         0

       55
         value  ------------- Distribution ------------- count
         65536 |                                         0
        131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4
        262144 |                                         0

       68
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@@                     119
             8 |                                         0
            16 |                                         0
            32 |                                         0
            64 |                                         0
           128 |                                         0
           256 |                                         0
           512 |                                         0
          1024 |@                                        4
          2048 |                                         0
          4096 |@@@@@@@@@@@@@@@@@@                       107
          8192 |@                                        4
         16384 |@@                                       10
         32768 |                                         0

       61
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@                      114
             8 |                                         0
            16 |                                         0
            32 |                                         0
            64 |                                         0
           128 |@                                        4
           256 |                                         0
           512 |                                         0
          1024 |                                         2
          2048 |@                                        4
          4096 |@@@@@@@@@@@@@@@                          90
          8192 |@                                        6
         16384 |@@@@                                     22
         32768 |                                         0

       62
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@                       110
             8 |                                         0
            16 |                                         0
            32 |                                         0
            64 |                                         2
           128 |                                         0
           256 |                                         0
           512 |                                         3
          1024 |@                                        4
          2048 |                                         2
          4096 |@@@@@@@@                                 47
          8192 |@@@                                      17
         16384 |@@@@@@@@@                                57
         32768 |                                         0

       73
         value  ------------- Distribution ------------- count
         65536 |                                         0
        131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10604
        262144 |                                         0
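To get a handle on how often those EAGAINs fire, I could aggregate
failed reads by fd; something along these lines (an untested sketch
with the PID hard-coded; the @eagain name is just mine):

# dtrace -n '
  syscall::read:entry /pid == 24937/ { self->fd = arg0; self->in = 1; }
  syscall::read:return /self->in && (int)arg0 == -1 && errno == EAGAIN/
    { @eagain[self->fd] = count(); }   /* tally EAGAINs per fd */
  syscall::read:return /self->in/ { self->in = 0; self->fd = 0; }'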
Interesting fds from `pfiles 24937`. Note that the external reporting
engine is now indexing the active log file, not an archived one:

  43: S_IFSOCK mode:0666 dev:332,0 ino:23157 uid:0 gid:0 size:0
      O_RDWR|O_NDELAY
        SOCK_STREAM
        SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
        sockname: AF_INET 1.1.1.42  port: 18184
        peername: AF_INET 1.1.1.5  port: 45277
  71: S_IFSOCK mode:0666 dev:332,0 ino:63336 uid:0 gid:0 size:0
      O_RDWR|O_NDELAY
        SOCK_STREAM
        SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
        sockname: AF_INET 1.1.1.42  port: 18184
        peername: AF_INET 1.1.1.5  port: 46653
  73: S_IFREG mode:0644 dev:256,65541 ino:148301 uid:0 gid:0 size:1785772291
      O_RDONLY
      /var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.log

Writes appear to be quite small on the whole. fd 22 here is fw.log, 23
is fw.logptr, and 43 & 71 are the TCP connections to the log reporting
server.

# dtrace -n 'syscall::write:entry /pid == 24937/ { @dist[arg0] = quantize(arg2); }'
dtrace: description 'syscall::write:entry ' matched 1 probe
^C
       13
         value  ------------- Distribution ------------- count
           512 |                                         0
          1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         4
          2048 |                                         0
          4096 |@@@@@@@@                                 1
          8192 |                                         0

       24
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 15365
             8 |                                         0

       23
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 17146
             8 |                                         0

       71
         value  ------------- Distribution ------------- count
          4096 |                                         0
          8192 |@@@@@@@@@@@@@@@@@@@@                     22
         16384 |@@@@@@@@@@@@@@@@@@@@                     22
         32768 |                                         0

       43
         value  ------------- Distribution ------------- count
          4096 |                                         0
          8192 |@@@@@@@@@@@@@@@@@@@@                     22
         16384 |@@@@@@@@@@@@@@@@@@@@                     23
         32768 |                                         0

       22
         value  ------------- Distribution ------------- count
             2 |                                         0
             4 |@@@@                                     1781
             8 |                                         0
            16 |                                         0
            32 |                                         21
            64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        15806
           128 |@@@                                      1319
           256 |                                         0

> Those EAGAINs are a concern. It would be good to get a handle on how
> much that happens, but it really points to code that needs fixing.
> Unconstrained polling is a big no-no. I get the impression that this
> code isn't up to the kind of load being put through it. I'd like to
> see multiple threads and/or the use of poll(2) or event ports.
>
> >> 2) You've said the application is single threaded. This isn't a
> >> good fit for the T-series (which was designed for throughput
> >> workloads at the expense of single thread performance ... e.g. the
> >> M-series has typically 4x the single thread performance of the
> >> T-series).
> >
> > The server has handled the workload quite well until recently, when
> > we added a few new firewalls that do quite a large volume of
> > logging. The better the case I can build for a faster server, the
> > easier it is to convince the business to buy one.
>
> The application has a finite capacity. It will max out sooner or
> later. Faster CPUs might put off the evil day, but the code really
> needs fixing too. Your T5120 has plenty of capacity, but not with
> this code, and your prstat data confirms ...
>
> >> 3) It would be helpful to see 'prstat -Lm' to get a better overall
> >> feel for what is running where (especially, where fwd is spending
> >> its time).
> >
> >    PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
> >   2153 root       41  59 0.1 0.0 0.0 0.0 0.3 0.2 383  11  7K   0 fwd/1
>
> That's one whole VCPU dedicated to this task. Either we improve the
> code (e.g. save cycles) or you need a faster CPU with respect to
> single thread performance. Of course, if we could also make the
> application multithreaded or multiprocess, the T5120 has plenty of
> CPU to spare!

I agree that while putting in a faster CPU will help, it is only
putting things off. I'll be approaching the vendor to discuss their
application's performance, and building a case such as this will help
me get the point across.
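Before I go to the vendor I'd like to pin down where those cycles are
going; a rough sketch (untested, same PID) to watch the syscall mix
over time:

# dtrace -n '
  syscall:::entry /pid == 24937/ { @calls[probefunc] = count(); }
  tick-10sec { printa(@calls); trunc(@calls); }'  /* print and reset every 10s */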
> >   unix`send_mondo_set+0x314
> >   unix`xt_some+0x1a0
> >   unix`sfmmu_tlb_range_demap+0x1f0
> >   unix`hat_unload_callback+0x850
> >   unix`ppmapout+0x94
> >   zfs`mappedread+0xa4
> >   zfs`zfs_read+0x19c
> >   genunix`fop_read+0x20
> >   genunix`read+0x274
> >   unix`syscall_trap32+0xcc
> >     135480
> >
> >   unix`send_mondo_set+0x314
> >   unix`xt_some+0x1a0
> >   unix`xt_sync+0x158
> >   unix`hat_unload_callback+0x898
> >   unix`ppmapout+0x94
> >   zfs`mappedread+0xa4
> >   zfs`zfs_read+0x19c
> >   genunix`fop_read+0x20
> >   genunix`read+0x274
> >   unix`syscall_trap32+0xcc
> >     135480
>
> Ah! This is very telling, since zfs_read() only calls mappedread() if
> it thinks someone also has the file mmapped. ZFS and mmap(2) do not
> mix well, because ZFS has its own cache (the ARC) apart from the
> Solaris page cache. However, mmap(2) currently uses the Solaris page
> cache for mapped ZFS files, so ZFS has to keep the two caches in sync
> (so everyone sees the file in the same state). The crosscall storm is
> the result.
>
> I worked on a partial fix for mapped read performance, but I doubt
> that will help your application much (because it wasn't aimed at
> read() performance of an elsewhere-mapped file).
>
> So the big question is: who has the file mapped, and can you stop
> that? Note: utilities like cat(1) and cp(1) use mmap(2) to read data,
> but dd(1) doesn't (e.g. if you had a script that copied the files
> periodically, you could substitute dd(1) and lose the crosscalls).
>
> Does fwd mmap the file itself? (You can use pmap(1) to find out.)

It appears that fwd is mmapping the files, so as a temporary measure
would moving the logs to UFS improve things? I'll be applying the
latest recommended patches, but another poster, sistare, seemed to say
that the fix for the ZFS mmap issue (6699438) has not been released
yet, so I'll have to open a case with Sun to get hold of that myself.
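If we do end up scripting periodic copies as you suggest, I'm
picturing something like this (an untested sketch; the destination
path and block size are my guesses):

  #!/bin/sh
  # Copy the active log with dd(1), which uses plain read(2) rather
  # than mmap(2), so it should avoid the ZFS mapped-read crosscall
  # path that cat(1) or cp(1) would trigger.
  LOG=/var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.log
  DEST=/export/logcopy/fw.log      # hypothetical destination
  dd if="$LOG" of="$DEST" bs=128k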
Anyway, here is the pmap output confirming the mappings:

# pmap 24937
24937:  /opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/bin/fwd -n
[snip]
EBB9C000   39544K r--s-  dev:256,65541 ino:148301   --- fw.log
EE5DA000   41424K r--s-  dev:256,65541 ino:125472   --- 2009-06-29_224738_25.log
[snip]
 total    169976K

> >   libCPLogRepository.so`__1cOCBinaryLogFileNReadLogHeader6kMpnJILogEvent_pnNCLogFileChain__b_+0x280
> >   libCPLogRepository.so`__1cOCBinaryLogFileHReadLog6MpnJILogEvent_pnNCLogFileChain__b_+0xc
> >   libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
> >   libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x3f8
> >   libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
> >   libCPLogLuFile.so`0xfc493e9c
> >   libfw1.so`__1cNlog_read_next6FpnPICplogLuLogFile_pnJILogEvent_bpp3pi6_3_+0xfc
> >   libfwdopsec.so`lea_send_line+0x108
> >   libopsec.so`lea_send_50_records+0x68
> >   libopsec.so`lea_server_event_handler+0xc8
> >   libopsec.so`opsec_fwasync_conn_handler+0x2fc
> >   libComUtils.so`0xfe393bc4
> >   libComUtils.so`T_event_mainloop_iter+0x4c4
> >   libComUtils.so`T_event_mainloop+0x118
> >   libfw1.so`fwd_cmain+0x131c
> >   libfw1.so`fw_cmain+0x910
> >   fwd`_start+0x108
> >     226302
>
> >> Hope this helps,
> >> Phil
> >
> > I hope so too. And thanks for the fast response!
> >
> > regards
> >
> > Matthew
>
> Nothing else there, except that taking an xcall during a memcpy()
> might indicate a mapped read. However, xcalls can happen in many
> ways. A PID-predicated stack aggregation of the xcalls might shed a
> little more light ...
>
>   dtrace -n 'sysinfo:::xcalls /pid == 2153/ { @[stack()] = count(); }'

# dtrace -n 'sysinfo:::xcalls /pid == 24937/ { @[stack()] = count(); }'

              unix`send_one_mondo+0x1c
              unix`xt_one_unchecked+0xc8
              genunix`sleepq_wakeone_chan+0x70
              genunix`cv_signal+0x58
              genunix`taskq_bucket_dispatch+0x74
              genunix`taskq_dispatch+0x13c
              genunix`qenable_locked+0xd4
              genunix`backenable+0x94
              genunix`kstrgetmsg+0xa80
              sockfs`sotpi_recvmsg+0x2ac
              sockfs`socktpi_read+0x44
              genunix`fop_read+0x20
              genunix`read+0x274
              unix`syscall_trap32+0xcc
                23

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`sfmmu_tlb_range_demap+0x1f0
              unix`hat_unload_callback+0x850
              unix`ppmapout+0x94
              zfs`zfs_fillpage+0x19c
              zfs`zfs_getpage+0x184
              genunix`fop_getpage+0x44
              genunix`segvn_fault+0xb04
              genunix`as_fault+0x4c8
              unix`pagefault+0x68
              unix`trap+0xd50
              unix`utl0+0x4c
                157

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`xt_sync+0x158
              unix`hat_unload_callback+0x898
              unix`ppmapout+0x94
              zfs`zfs_fillpage+0x19c
              zfs`zfs_getpage+0x184
              genunix`fop_getpage+0x44
              genunix`segvn_fault+0xb04
              genunix`as_fault+0x4c8
              unix`pagefault+0x68
              unix`trap+0xd50
              unix`utl0+0x4c
                157

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`sfmmu_tlb_range_demap+0x1f0
              unix`hat_unload_callback+0x850
              unix`ppmapout+0x94
              zfs`mappedread+0xa4
              zfs`zfs_read+0x19c
              genunix`fop_read+0x20
              genunix`read+0x274
              unix`syscall_trap32+0xcc
                176138

              unix`send_mondo_set+0x314
              unix`xt_some+0x1a0
              unix`xt_sync+0x158
              unix`hat_unload_callback+0x898
              unix`ppmapout+0x94
              zfs`mappedread+0xa4
              zfs`zfs_read+0x19c
              genunix`fop_read+0x20
              genunix`read+0x274
              unix`syscall_trap32+0xcc
                176138

> Phil

regards

Matthew