Re: [perf-discuss] Memory

elkhaoul elkhaoul Thu, 05 Mar 2009 08:03:57 -0800

Mike,
 
Thanks for your answers.
 
I did the following:
 
--- On Thu, 5 Mar 2009, Mike Gerdts <mger...@gmail.com> wrote:



From: Mike Gerdts <mger...@gmail.com>
Subject: Re: [perf-discuss] Memory
To: "elkhaoul elkhaoul" <elkha...@yahoo.fr>
Cc: perf-discuss@opensolaris.org
Date: Thursday, 5 March 2009, 15:56


On Thu, Mar 5, 2009 at 8:31 AM, elkhaoul elkhaoul <elkha...@yahoo.fr> wrote:
> Hello,
>
>
> An application running on Solaris 10 uses all of the memory (100%), more
> than 80 GB (the machine has only 8 GB of physical memory and 16 GB of swap),
>
> #### prstat -a
> .............
>  NPROC USERNAME  SIZE   RSS MEMORY      TIME  CPU
>    142 root     2296M  181M   0.2%  39:53:02 7.6%
>    348 ora10g     99G   83G   100%  61:40:23 5.3%

I bet that is an Oracle 10g database.  Oracle uses a large shared
memory area called the "system global area" or sga.  You likely have
tens to hundreds of processes all mapping this same shared memory
segment and as such each appears to be using that amount of memory.
The rather ancient patch release of Solaris 10 that you are using is
not able to recognize that it is counting the same memory many times
and as such gives you misleading information.

FWIW, I've seen other systems that claim to be using well over a
terabyte of memory even though they *only* had a couple hundred
gigabytes.  I believe that it was in the Solaris 10 update 4 time
frame that the way that memory is accounted for changed.  Now oracle
on the same system appears to be using less than 100 GB (again that is
a minority of the RAM on the box).  The new accounting appears to not
include the sga, so there is a trade-off here with neither being a
completely accurate representation.

>
> #### sar -g 10
> SunOS epsu17 5.10 Generic_118822-26 sun4u    03/05/2009
> 15:25:52  pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
> 15:26:02    23.16    38.57    36.68   358.95     0.00

The pgscan/s certainly looks like you have some memory pressure.

> How can I debug this application and find the process that is holding the memory?

You can use "vmstat -p 10" to get a better idea of what is causing the
paging (executable, anonymous, filesystem).

 

==> 
Here is the output of "vmstat -p 10". What do the columns (executable,
anonymous and filesystem) mean?
 
 
                           page             executable   anonymous    filesystem
     swap   free   re   mf   fr    de   sr epi epo epf  api apo apf  fpi fpo fpf
 14305784 800112 1039 3566  261 49640  778  16   0   2  132 171 194 2161  43  65
  4464168 129784  898 7377   39 17328    0   4   0   0   80   0   0  102  39  39
  4465352 131856  675 5695  439  6056  818   1   0   0  434 426 426   66   6  14
  4467112 129040  891 7550  773  2144 1141   1   0   0  411 675 675   76 101  98
  4463784 129720  719 5902  411   456  100  33   0   0  252 295 295   97 123 116
  4466520 127680  826 6623 1131     0 1312   2   0   0  742 980 980   78 145 151
  4469656 126968  973 7752 1385     0 1526  10   0  14 1096 920 948  561 269 424
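
As background for the question about the columns: according to the vmstat(1M) man page, the three groups split paging by page type, and within each group the suffix i/o/f stands for page-ins, page-outs and page-frees (epi/epo/epf for executable pages, api/apo/apf for anonymous memory, fpi/fpo/fpf for file system pages). The following is only a rough sketch in Python, not part of the original exchange, that averages the samples pasted above per group. The names VMSTAT_P and summarize are made up for illustration, and the first sample is skipped because vmstat reports a since-boot summary on its first line.

```python
# Rows copied from the "vmstat -p 10" output above (header flattened to one line).
VMSTAT_P = """\
swap free re mf fr de sr epi epo epf api apo apf fpi fpo fpf
14305784 800112 1039 3566 261 49640 778 16 0 2 132 171 194 2161 43 65
4464168 129784 898 7377 39 17328 0 4 0 0 80 0 0 102 39 39
4465352 131856 675 5695 439 6056 818 1 0 0 434 426 426 66 6 14
4467112 129040 891 7550 773 2144 1141 1 0 0 411 675 675 76 101 98
4463784 129720 719 5902 411 456 100 33 0 0 252 295 295 97 123 116
4466520 127680 826 6623 1131 0 1312 2 0 0 742 980 980 78 145 151
4469656 126968 973 7752 1385 0 1526 10 0 14 1096 920 948 561 269 424
"""

# Column prefix for each page type: e.g. "a" + "p" + "o" -> "apo".
GROUPS = {"executable": "e", "anonymous": "a", "filesystem": "f"}


def summarize(text, skip_first=True):
    """Return the average page-ins/outs/frees per page type (hypothetical helper)."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    if skip_first:
        rows = rows[1:]  # the first vmstat sample is a summary since boot
    samples = [dict(zip(header, map(int, row))) for row in rows]
    return {
        name: {
            kind: sum(s[prefix + "p" + kind] for s in samples) / len(samples)
            for kind in ("i", "o", "f")
        }
        for name, prefix in GROUPS.items()
    }


if __name__ == "__main__":
    for name, avg in summarize(VMSTAT_P).items():
        print(f"{name:<11} in={avg['i']:8.1f} out={avg['o']:8.1f} free={avg['f']:8.1f}")
```

On these samples the anonymous group has the largest page-out and page-free averages, which suggests that process memory (heap, stack, and similar), rather than executables or file data, is what is being pushed out.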

 
 
You can use "ipcs -ma" to see the shared memory segments, their sizes,
and how many processes have them mapped (nattach).  A segment owned by
ora10g that is kinda big with a large number of processes attached is
likely the sga.

==> 
I found many shared memory segments with a large SEGSZ (10000340,
629153792, ...).
 
Is SEGSZ in bytes? Are these segments in physical memory?
 
 #### ipcs -ma
IPC status from <running system> as of Thu Mar  5 16:45:44 CET 2009
T  ID KEY        MODE        OWNER  GROUP CREATOR CGROUP NATTCH     SEGSZ  CPID  LPID ATIME    DTIME    CTIME
Shared Memory:
m  30 0x31004002 --rw-rw-rw- root   root  root    root        3    131164 16732 18865 15:58:19 16:45:43 10:38:15
m  29 0x6347849  --rw-rw-rw- root   root  root    root        1     65566 16731 18865 10:38:23 16:45:43 10:38:15
.................                                                                                       10:31:25
m  17 0          --rw------- ora10g dba   ora10g  dba        18  10000340 14557 17426 15:11:18 15:11:18 10:31:25
m  16 0          --rw------- ora10g dba   ora10g  dba        18      9308 14557 17426 15:11:18 15:11:18 10:31:25
m  15 0          --rw------- ora10g dba   ora10g  dba        18     35692 14557 17426 15:11:18 15:11:18 10:30:45
m  14 0          --rw------- ora10g dba   ora10g  dba        18    201860 14557 17426 15:11:18 15:11:18 10:30:45
m  13 0          --rw------- ora10g dba   ora10g  dba        18     32796 14557 17426 15:11:18 15:11:18 10:30:45
m  12 0          --rw------- ora10g dba   ora10g  dba        18    616924 14557 17426 15:11:18 15:11:18 10:30:45
m  11 0          --rw------- ora10g dba   ora10g  dba        18     93652 14557 17426 15:11:18 15:11:18 10:30:45
m  10 0          --rw------- ora10g dba   ora10g  dba        18   1730480 14557 17426 15:11:18 15:11:18 10:30:45
m   9 0          --rw------- ora10g dba   ora10g  dba         2      5520 14509  8182 14:19:10 14:19:10 10:30:37
m   8 0          --rw------- ora10g dba   ora10g  dba         1        12 14508 14508 10:30:37 no-entry 10:30:37
m   7 0xe8ea8084 --rw------- ora10g dba   ora10g  dba         1       176 12834 12834 10:30:19 no-entry 10:30:19
m   6 0xff5fe38  --rw------- ora10g dba   ora10g  dba         8      3304  8334  8483 10:29:07 10:29:05 10:29:04
m   2 0xb478f6a4 --rw-r----- ora10g dba   ora10g  dba       131 629153792  2831 21219 16:43:11 16:44:48 10:27:23
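
On the units question: the ipcs man page documents SEGSZ as the segment size in bytes. The sketch below is not from the original exchange and only illustrates the double counting described earlier in the thread: it compares the real size of the ora10g segments listed above with what you get if every attached process is charged the full segment. The listing is partial (some entries were elided), and the names ORA10G_SEGMENTS, actual_bytes and counted_bytes are made up for illustration. It says nothing about whether the segments are actually resident in physical memory.

```python
# (NATTCH, SEGSZ in bytes) for the ora10g segments shown in the ipcs output above.
# The listing is partial, so the totals are lower bounds.
ORA10G_SEGMENTS = [
    (18, 10000340), (18, 9308), (18, 35692), (18, 201860), (18, 32796),
    (18, 616924), (18, 93652), (18, 1730480), (2, 5520), (1, 12),
    (1, 176), (8, 3304), (131, 629153792),
]

GIB = 2 ** 30

# Memory the segments really occupy: each segment counted once.
actual_bytes = sum(size for _, size in ORA10G_SEGMENTS)

# What a naive per-process sum sees: each segment counted once per attached process.
counted_bytes = sum(nattch * size for nattch, size in ORA10G_SEGMENTS)

if __name__ == "__main__":
    print(f"segments counted once:        {actual_bytes / GIB:6.2f} GiB")
    print(f"segments counted per process: {counted_bytes / GIB:6.2f} GiB")
```

The largest segment (ID 2, 629153792 bytes, 131 attaches) is about 600 MB on its own, yet charged to every attached process it alone adds up to tens of gigabytes, the same order of magnitude as the 83G RSS that prstat showed for ora10g.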
 
 
 

The DTraceToolkit (ask google) has several useful scripts in the Mem directory.

Things I would be looking at are:

- Is the SGA set bigger than it should be?
- Are you doing a lot of I/O to and from /tmp?
- Are you running things on the box not related to the database?  Look
at "prstat -s rss -c -n 5000 1 1 | grep -v ora10g | head" to find
candidates to relocate to another box.

===>
#####prstat -s rss -c -n 5000 1 1 | grep -v ora10g|head
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
 12745 root      163M   34M sleep   59    0   0:30:55 0.0% java/29
 17003 root       22M 8224K sleep   59    0  27:02:38 1.7% scopeux/1
  1210 root       26M 7792K sleep   59    0   0:05:41 0.0% naviagent/2
 15610 root       16M 6224K sleep   59    0   0:09:12 0.0% opcmona/3
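
To put the non-Oracle processes in perspective, here is a small sketch, not part of the original exchange, that adds up the RSS column of the rows pasted above. It assumes that the K/M/G suffixes in prstat output are multiples of 1024, and the names PRSTAT_ROWS, to_kb and total_kb are made up for illustration. Only the four rows shown here are included, so the real total for all non-ora10g processes may be higher.

```python
# Rows copied from the prstat output above (header omitted).
PRSTAT_ROWS = """\
12745 root      163M   34M sleep   59    0   0:30:55 0.0% java/29
17003 root       22M 8224K sleep   59    0  27:02:38 1.7% scopeux/1
 1210 root       26M 7792K sleep   59    0   0:05:41 0.0% naviagent/2
15610 root       16M 6224K sleep   59    0   0:09:12 0.0% opcmona/3
"""

# Assumption: prstat size suffixes are binary multiples.
UNITS_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def to_kb(field):
    """Convert a prstat size such as '34M' or '8224K' to kilobytes."""
    return int(float(field[:-1]) * UNITS_KB[field[-1]])


# Map PROCESS/NLWP (last column) to RSS in KB (fourth column).
rss_kb = {
    line.split()[-1]: to_kb(line.split()[3])
    for line in PRSTAT_ROWS.strip().splitlines()
}
total_kb = sum(rss_kb.values())

if __name__ == "__main__":
    for name, kb in rss_kb.items():
        print(f"{name:<12} {kb:>7} KB")
    print(f"{'total':<12} {total_kb:>7} KB ({total_kb / 1024:.1f} MB)")
```

If these rows are representative, the processes outside the database hold well under 100 MB of resident memory, so moving them to another box would free little.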
 


-- 
Mike Gerdts
http://mgerdts.blogspot.com/
 
 
Best Regards,

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
