Mike,

Thanks for your answers. I did the following:

--- On Thu 5.3.09, Mike Gerdts <mger...@gmail.com> wrote:
From: Mike Gerdts <mger...@gmail.com>
Subject: Re: [perf-discuss] Memory
To: "elkhaoul elkhaoul" <elkha...@yahoo.fr>
Cc: perf-discuss@opensolaris.org
Date: Thursday 5 March 2009, 15:56

On Thu, Mar 5, 2009 at 8:31 AM, elkhaoul elkhaoul <elkha...@yahoo.fr> wrote:
> Hello,
>
> An application running on Solaris 10 uses all the memory (100%), more
> than 80 GB (the machine has only 8 GB of physical memory and 16 GB of swap).
>
> #### prstat -a
> .............
> NPROC USERNAME  SIZE   RSS MEMORY      TIME  CPU
>   142 root     2296M  181M   0.2%  39:53:02 7.6%
>   348 ora10g     99G   83G   100%  61:40:23 5.3%

I bet that is an Oracle 10g database. Oracle uses a large shared memory
area called the "system global area" or SGA. You likely have tens to
hundreds of processes all mapping this same shared memory segment, and
as such each appears to be using that amount of memory. The rather
ancient patch release of Solaris 10 that you are using is not able to
recognize that it is counting the same memory many times, and as such
gives you misleading information. FWIW, I've seen other systems that
claim to be using well over a terabyte of memory even though they *only*
had a couple hundred gigabytes.

I believe that it was in the Solaris 10 update 4 time frame that the way
that memory is accounted for changed. Now Oracle on the same system
appears to be using less than 100 GB (again, that is a minority of the
RAM on the box). The new accounting appears to not include the SGA, so
there is a trade-off here, with neither being a completely accurate
representation.

> #### sar -g 10
> SunOS epsu17 5.10 Generic_118822-26 sun4u 03/05/2009
> 15:25:52 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
> 15:26:02   23.16    38.57    36.68   358.95     0.00

The pgscan/s certainly looks like you have some memory pressure.

> How can I debug this application and find the process blocking the memory?

You can use "vmstat -p 10" to get a better idea of what is causing the
paging (executable, anonymous, filesystem).
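To make the double counting described above a bit more concrete, here is a minimal Python sketch. It is not how prstat computes its numbers; it only models one shared segment mapped by many processes. The process count, segment size and private size are made-up illustration values, not measurements from this system, and the function names are hypothetical.

```python
MIB = 1024 * 1024


def naive_total(n_procs, shared_bytes, private_bytes):
    """Sum per-process sizes, counting the shared segment once per process."""
    return n_procs * (shared_bytes + private_bytes)


def deduplicated_total(n_procs, shared_bytes, private_bytes):
    """Count the shared segment once, plus each process's private memory."""
    return shared_bytes + n_procs * private_bytes


if __name__ == "__main__":
    # Illustration values only (assumptions, not taken from the system above).
    n_procs = 100
    shared = 512 * MIB   # one shared segment mapped by every process
    private = 4 * MIB    # private memory per process

    print("naive sum (MiB):       ", naive_total(n_procs, shared, private) // MIB)
    print("shared counted once:   ", deduplicated_total(n_procs, shared, private) // MIB)
```

With these illustration values the naive sum is dozens of times larger than the deduplicated figure, which is the same kind of gap as a per-user total far above the installed RAM.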

==> Here is the output of "vmstat -p 10". What do the columns
(executable, anonymous and filesystem) mean?

                          page            executable   anonymous   filesystem
    swap   free   re   mf   fr    de   sr epi epo epf  api apo apf  fpi fpo fpf
14305784 800112 1039 3566  261 49640  778  16   0   2  132 171 194 2161  43  65
 4464168 129784  898 7377   39 17328    0   4   0   0   80   0   0  102  39  39
 4465352 131856  675 5695  439  6056  818   1   0   0  434 426 426   66   6  14
 4467112 129040  891 7550  773  2144 1141   1   0   0  411 675 675   76 101  98
 4463784 129720  719 5902  411   456  100  33   0   0  252 295 295   97 123 116
 4466520 127680  826 6623 1131     0 1312   2   0   0  742 980 980   78 145 151
 4469656 126968  973 7752 1385     0 1526  10   0  14 1096 920 948  561 269 424

You can use "ipcs -ma" to see the shared memory segments, their sizes,
and how many processes have them mapped (NATTCH). A segment owned by
ora10g that is kinda big with a large number of processes attached is
likely the SGA.

==> I found a lot of shared memory segments with a big SEGSZ
(10000340, 629153792, ...). Is it in bytes? Is it in physical memory?

#### ipcs -ma
IPC status from <running system> as of Thu Mar 5 16:45:44 CET 2009
T ID KEY        MODE        OWNER  GROUP CREATOR CGROUP NATTCH     SEGSZ  CPID  LPID ATIME    DTIME    CTIME
Shared Memory:
m 30 0x31004002 --rw-rw-rw- root   root  root    root        3    131164 16732 18865 15:58:19 16:45:43 10:38:15
m 29 0x6347849  --rw-rw-rw- root   root  root    root        1     65566 16731 18865 10:38:23 16:45:43 10:38:15
................. 10:31:25
m 17 0          --rw------- ora10g dba   ora10g  dba        18  10000340 14557 17426 15:11:18 15:11:18 10:31:25
m 16 0          --rw------- ora10g dba   ora10g  dba        18      9308 14557 17426 15:11:18 15:11:18 10:31:25
m 15 0          --rw------- ora10g dba   ora10g  dba        18     35692 14557 17426 15:11:18 15:11:18 10:30:45
m 14 0          --rw------- ora10g dba   ora10g  dba        18    201860 14557 17426 15:11:18 15:11:18 10:30:45
m 13 0          --rw------- ora10g dba   ora10g  dba        18     32796 14557 17426 15:11:18 15:11:18 10:30:45
m 12 0          --rw------- ora10g dba   ora10g  dba        18    616924 14557 17426 15:11:18 15:11:18 10:30:45
m 11 0          --rw------- ora10g dba   ora10g  dba        18     93652 14557 17426 15:11:18 15:11:18 10:30:45
m 10 0          --rw------- ora10g dba   ora10g  dba        18   1730480 14557 17426 15:11:18 15:11:18 10:30:45
m  9 0          --rw------- ora10g dba   ora10g  dba         2      5520 14509  8182 14:19:10 14:19:10 10:30:37
m  8 0          --rw------- ora10g dba   ora10g  dba         1        12 14508 14508 10:30:37 no-entry 10:30:37
m  7 0xe8ea8084 --rw------- ora10g dba   ora10g  dba         1       176 12834 12834 10:30:19 no-entry 10:30:19
m  6 0xff5fe38  --rw------- ora10g dba   ora10g  dba         8      3304  8334  8483 10:29:07 10:29:05 10:29:04
m  2 0xb478f6a4 --rw-r----- ora10g dba   ora10g  dba       131 629153792  2831 21219 16:43:11 16:44:48 10:27:23

The DTraceToolkit (ask Google) has several useful scripts in the Mem
directory. Things I would be looking at are:

- Is the SGA set bigger than it should be?
- Are you doing a lot of I/O to and from /tmp?
- Are you running things on the box not related to the database? Look at
  "prstat -s rss -c -n 5000 1 1 | grep -v ora10g | head" to find
  candidates to relocate to another box.

==> #### prstat -s rss -c -n 5000 1 1 | grep -v ora10g | head
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 12745 root      163M   34M sleep   59    0   0:30:55 0.0% java/29
 17003 root       22M 8224K sleep   59    0  27:02:38 1.7% scopeux/1
  1210 root       26M 7792K sleep   59    0   0:05:41 0.0% naviagent/2
 15610 root       16M 6224K sleep   59    0   0:09:12 0.0% opcmona/3

--
Mike Gerdts
http://mgerdts.blogspot.com/

Best Regards,
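
On the "Is it in bytes?" question above: per the ipcs(1) man page, SEGSZ is the segment size in bytes and NATTCH is the number of attached processes. Whether those pages are resident in physical memory is not something ipcs reports, so the rough Python sketch below does not address that. It only parses a few rows copied from the listing above and shows what happens when the largest ora10g segment is counted once per attach; the helper names are my own and purely illustrative.

```python
from collections import namedtuple

MIB = 1024 ** 2
GIB = 1024 ** 3

Segment = namedtuple("Segment", "shmid owner nattch segsz")

# A few rows copied from the "ipcs -ma" output in this thread.
IPCS_ROWS = """\
m 30 0x31004002 --rw-rw-rw- root root root root 3 131164 16732 18865 15:58:19 16:45:43 10:38:15
m 17 0 --rw------- ora10g dba ora10g dba 18 10000340 14557 17426 15:11:18 15:11:18 10:31:25
m 10 0 --rw------- ora10g dba ora10g dba 18 1730480 14557 17426 15:11:18 15:11:18 10:30:45
m 8 0 --rw------- ora10g dba ora10g dba 1 12 14508 14508 10:30:37 no-entry 10:30:37
m 2 0xb478f6a4 --rw-r----- ora10g dba ora10g dba 131 629153792 2831 21219 16:43:11 16:44:48 10:27:23
"""


def parse_ipcs(text):
    """Parse 'm' rows laid out as T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ ..."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or fields[0] != "m":
            continue  # skip headers, blank lines and truncated rows
        segments.append(
            Segment(int(fields[1]), fields[4], int(fields[8]), int(fields[9]))
        )
    return segments


def largest_segment(segments, owner):
    """Return the biggest segment (by SEGSZ) belonging to the given owner."""
    return max((s for s in segments if s.owner == owner), key=lambda s: s.segsz)


def attach_weighted_bytes(segment):
    """Bytes you get if the segment is counted once per attached process."""
    return segment.nattch * segment.segsz


if __name__ == "__main__":
    big = largest_segment(parse_ipcs(IPCS_ROWS), "ora10g")
    print(f"shmid {big.shmid}: {big.segsz / MIB:.1f} MiB, {big.nattch} attaches")
    print(f"counted once per attach: {attach_weighted_bytes(big) / GIB:.1f} GiB")
```

For segment 2 this works out to about 600 MiB attached by 131 processes, or roughly 77 GiB if counted once per attach. That is in the same range as the 83G RSS that prstat -a reported for ora10g, which is consistent with Mike's explanation, though it does not by itself prove that segment 2 is the SGA.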
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org