Hi Jim,

Thank you. See my updates inline.
Thanks.

Best Regards,
Simon

On Fri, Nov 20, 2009 at 11:51 PM, Jim Mauro <james.ma...@sun.com> wrote:

> If you're running out of memory, which it appears you are,
> you need to profile the memory consumers, and determine if
> you have either a memory leak somewhere, or an under-configured
> system. Note 16GB is really tiny by today's standards, especially for
> an M5000-class server. It's like putting an engine from a Ford sedan
> into an 18-wheel truck - the capacity to do work is severely limited
> by a lack of towing power. Laptops ship with 8GB these days...
>
> Back to memory consumers. We have:
> - The kernel
> - User processes
> - The file system cache (which is technically part of the kernel,
>   but significant enough that it should be measured
>   separately).
>
> Is the database on a file system, and if so, which one (UFS, ZFS,
> VxFS)? How much shared memory is really being used
> (ipcs -a)?
>

Just UFS is used. The output of "ipcs -a" is below.

> If the system starts off well, and degrades over time, then you need
> to capture memory data over time and see what area is growing.
> Based on that data, we can determine if something is leaking memory,
> or you have an underconfigured machine.
>
> I would start with:
> echo "::memstat" | mdb -k
> ipcs -a
>

# ipcs -a
IPC status from <running system> as of Thu Nov 12 12:05:28 HKT 2009
T  ID  KEY  MODE  OWNER  GROUP  CREATOR  CGROUP  CBYTES  QNUM  QBYTES  LSPID  LRPID  STIME  RTIME  CTIME
Message Queues:
T  ID  KEY         MODE         OWNER   GROUP  CREATOR  CGROUP  NATTCH  SEGSZ       CPID  LPID  ATIME     DTIME     CTIME
Shared Memory:
m  3   0xe9032d40  --rw-------  sybase  staff  sybase   staff   3       738803712   1314  2125  20:47:22  no-entry  20:47:14
m  2   0x51        --rw-rw-r--  root    root   root     root    1       2000196     2122  8553  15:15:18  15:15:23  13:14:38
m  1   0x50        --rw-rw-r--  root    root   root     root    1       600196      2121  2121  13:14:38  no-entry  13:14:38
m  0   0xe9032d32  --rw-------  sybase  staff  sybase   staff   3       7851147264  1314  2125  13:14:40  13:14:40  13:13:42
T  ID  KEY   MODE         OWNER  GROUP  CREATOR  CGROUP  NSEMS  OTIME     CTIME
Semaphores:
s  1   0x51  --ra-ra-ra-  root   root   root     root    6      12:05:28  13:14:38
s  0   0x50  --ra-ra-ra-  root   root   root     root    6      12:05:28  13:14:38

# ipcs -mb
(after adjusting the shared memory setting in "/etc/system" from 0xfffffffff to 0x20000000)
IPC status from <running system> as of Thu Nov 19 16:38:17 HKT 2009
T  ID  KEY         MODE         OWNER   GROUP  SEGSZ
Shared Memory:
m  2   0x51        --rw-rw-r--  root    root   2000196
m  1   0x50        --rw-rw-r--  root    root   600196
m  0   0xe9032d32  --rw-------  sybase  staff  8548687872

> ps -eo pid,vsz,rss,class,pri,fname,args
> prstat -c 1 30
>

From the "prstat" output, we found 3 sybase processes, each with 12 threads, while the java process (launched by the customer's application) has 370 threads in total. I think that is too many threads (especially in the "java" program), and that their stacks and heaps finally used up the RAM. So would it help to decrease the shared memory used by sybase (defined at the sybase configuration layer, not in the "/etc/system" file)?
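To put a number on how much shared memory is actually allocated, the SEGSZ column of the "ipcs -mb" output above can be totalled. The following is only a minimal sketch: the function name shm_total_bytes is made up for illustration, the sample text is copied from the Nov 19 output above, and it assumes the "-mb" layout where SEGSZ is the last column (it is not meant for the wider "ipcs -a" layout).

```python
def shm_total_bytes(ipcs_text):
    """Sum SEGSZ over the shared-memory rows of `ipcs -mb` output.

    Assumption: rows for shared-memory segments start with the type
    letter "m" and carry SEGSZ as their last column, as in the -mb
    output quoted in this thread.
    """
    total = 0
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and rows whose last field is not a size.
        if len(fields) >= 7 and fields[0] == "m" and fields[-1].isdigit():
            total += int(fields[-1])
    return total


# Sample copied from the `ipcs -mb` output in this message.
SAMPLE = """\
T ID KEY MODE OWNER GROUP SEGSZ
Shared Memory:
m 2 0x51 --rw-rw-r-- root root 2000196
m 1 0x50 --rw-rw-r-- root root 600196
m 0 0xe9032d32 --rw------- sybase staff 8548687872
"""

total = shm_total_bytes(SAMPLE)
print(total, "bytes =", round(total / 2**30, 2), "GiB")
```

With the sample above the total comes to just under 8 GiB, almost all of it in the single sybase segment, which is about half of the 16GB installed.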
> kstat -n system_pages
>

I captured the system_pages usage for about half an hour; one sample looks like this:

Mon Nov 16 17:24:25 2009
module: unix                            instance: 0
name:   system_pages                    class:    pages
        availrmem                       857798
        crtime                          89.53186
        desfree                         15914
        desscan                         8972
        econtig                         188874752
        fastscan                        1002870
        freemem                         30730
        kernelbase                      16777216
        lotsfree                        31828
        minfree                         7957
        nalloc                          66478696
        nalloc_calls                    19381
        nfree                           55736969
        nfree_calls                     14546
        nscan                           5520
        pagesfree                       30730
        pageslocked                     1169036
        pagestotal                      2037012
        physmem                         2058547
        pp_kernel                       189372
        slowscan                        100
        snaptime                        359704.2493636

>
> You need to collect that data at some regular interval
> with timestamps. The interval depends on how long it takes
> the machine to degrade. If the system goes from fresh boot to
> degraded state in 1 hour, I'd collect the data every second.
> If the machine goes from fresh boot to degraded state in 1 week,
> I'd grab the data every 2 hours or so.
>
> /jim
>
>
> Simon wrote:
>
>> Hi Experts,
>>
>> Here's a performance-related question; please help review what I can
>> do to get the issue fixed.
>>
>> I have a customer who has one M5000 with Solaris 10 10/08 (KJP: 138888-01)
>> installed and 16GB RAM configured, running Sybase ASE 12.5 and a JBoss
>> application. Recently they found that the OS became very slow after
>> running for some time. The collected vmstat data points to a memory
>> shortage:
>>
>> # vmstat 5
>> kthr memory page disk faults cpu
>> r b w swap free re mf pi po fr de sr m0 m1 m4 m5 in sy cs us sy id
>> 0 0 153 6953672 254552 228 228 1843 1218 1687 0 685 3 2 0 0 2334 32431 3143 1 1 97
>> 0 0 153 6953672 259888 115 115 928 917 917 0 264 0 35 0 2 2208 62355 3332 7 3 90
>> 0 0 153 6953672 255688 145 145 1168 1625 1625 0 1482 0 6 1 0 2088 40113 3070 2 1 96
>> 0 0 153 6953640 256144 111 111 894 1371 1624 0 1124 0 6 0 0 2080 55278 3106 3 3 94
>> 0 0 153 6953640 256048 241 241 1935 2585 3035 0 1009 0 18 0 0 2392 40643 3164 2 2 96
>> 0 0 153 6953648 257112 236 235 1916 1710 1710 0 1223 0 7 0 0 2672 62582 3628 3 4 93
>>
>> As shown above, the "w" column is very high all the time, and the "sr"
>> column also stays very high, which indicates the page scanner is active
>> and busy paging out, yet the CPU is mostly idle. I checked "/etc/system"
>> and found one improper entry:
>> set shmsys:shminfo_shmmax = 0xffffffffffff
>>
>> So I think the improper shared memory setting caused too much physical
>> RAM to be reserved by the application, and I suggested adjusting the
>> shared memory to 8GB (0x200000000). But according to the customer's
>> feedback, the result seems to be worse, based on the new vmstat output:
>>
>> kthr memory page disk faults cpu
>> r b w swap free re mf pi po fr de sr m0 m1 m4 m5 in sy cs us sy id
>> 0 6 762 3941344 515848 18 29 4544 0 0 0 0 4 562 0 1 2448 25687 3623 1 2 97
>> 0 6 762 4235016 749616 66 21 4251 2 2 0 0 0 528 0 0 2508 50540 3733 2 5 93
>> 0 6 762 4428080 889864 106 299 4694 0 0 0 0 1 573 0 7 2741 182274 3907 10 4 86
>> 0 5 762 4136400 664888 19 174 4126 0 0 0 0 6 511 0 0 2968 241186 4417 18 9 73
>> 0 7 762 3454280 193776 103 651 2526 3949 4860 0 121549 11 543 0 5 2808 149820 4164 10 12 78
>> 0 9 762 3160424 186016 61 440 1803 7362 15047 0 189720 12 567 0 5 3101 119895 4125 6 13 81
>> 0 6 762 3647456 403056 44 279 4260 331 331 0 243 10 540 0 3 2552 38374 3847 5 3 92
>>
>> The "w" and "sr" values increased instead. Why?
>>
>> I have also attached the "prstat" output; it is a prstat snapshot taken
>> after the shared memory adjustment. Could you please have a look? What
>> can I do next to get the issue solved? What are the possible factors
>> that cause the memory shortage again and again, even though they have
>> 16GB RAM + 16GB swap? Is the physical RAM really short?
>> Or is there any useful dtrace script to trace the problem? Thanks very
>> much!
>>
>> Best Regards,
>> Simon
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> dtrace-discuss mailing list
>> dtrace-disc...@opensolaris.org
>>
>>
>
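The system_pages counters quoted earlier in this message are in pages, so they are easier to compare against the 16GB of RAM once converted to bytes. The sketch below is only an illustration: it assumes an 8 KB base page size (my assumption for this SPARC system, not something stated in the kstat output; "pagesize" on the machine would confirm it), and the names PAGE_SIZE, SNAPSHOT and pages_to_mib are made up for the example. The values are copied from the Nov 16 sample.

```python
# Assumption: 8 KB base page size; check with `pagesize` on the system.
PAGE_SIZE = 8192

# Selected counters copied from the Nov 16 17:24:25 system_pages sample.
SNAPSHOT = {
    "physmem": 2058547,
    "pagestotal": 2037012,
    "pageslocked": 1169036,
    "availrmem": 857798,
    "pp_kernel": 189372,
    "lotsfree": 31828,
    "freemem": 30730,
}


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB for the given page size in bytes."""
    return pages * page_size / 2**20


for name, pages in SNAPSHOT.items():
    print(f"{name:12s} {pages:>9d} pages {pages_to_mib(pages):10.1f} MiB")

# The page scanner starts working when freemem drops below lotsfree.
print("freemem below lotsfree:", SNAPSHOT["freemem"] < SNAPSHOT["lotsfree"])
```

Under that page-size assumption, physmem works out to roughly 15.7 GiB, which matches the installed 16GB, and freemem in this sample sits below lotsfree, consistent with the scan rate seen in vmstat. Running the same conversion over every timestamped sample would show which counter grows over time.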
_______________________________________________ perf-discuss mailing list perf-discuss@opensolaris.org