Well, Here it is:
----------------start_jconsole.sh----------- #!/bin/sh export DISPLAY DISPLAY_NUMBER=`echo $DISPLAY | /usr/bin/perl -n -e 'm#localhost:(\d+).0# && print $1'` echo $DISPLAY_NUMBER XAUTH_ENTRY=`/usr/bin/xauth list | grep "unix:$DISPLAY_NUMBER"` sudo su webinf -c "xauth add $XAUTH_ENTRY; /usr/bin/jconsole -J-Djava.class.path=/usr/lib/jvm/java/lib/jconsole.jar:/usr/lib/jvm/java /lib/tools.jar" ---------------------------------------------------- On Wed, Oct 3, 2012 at 11:07 AM, Shanti Suresh <sha...@umich.edu> wrote: > Hi Kirill, > > I am amazed at the size of the Java Heap of 26 GB. I didn't expect that > big a size. Before I suggest anything else, please take a look at the > filesystem space on "/tmp". The script creates a heapdump everytime it > runs as well. So please make sure that "/tmp/" has sufficient storage. > Otherwise it may bring your service down. > > (1) Could you please paste your JVM command-line arguments here? > (2) Also, did you have a chance to observe the JConsole graphs? I have > attached a sample JConsole graph here. > (3) What JDK version are you using? > (4) GC for a Heap of that size may need some tuning. > > You may start JConsole as follows on Linux/Unix. > a.Place the following script in your home-directory on the Tomcat > app-server machine . > b. Make sure that you have enabled Tomcat to allow JMX access. > c. Then log-in from a remote laptop/desktop into the server (using ssh), > and invoke this script from your home-directory: > --------start_jconsole.sh------ > DISPLAY_NUMBER=`echo $DISPLAY | /usr/bin/perl -n -e 'm#localhost:(\d+).0# > && print $1'` > echo $DISPLAY_NUMBER > XAUTH_ENTRY=`/usr/bin/xauth list | grep "unix:$DISPLAY_NUMBER"` > /usr/bin/jconsole > -J-Djava.class.path=/usr/lib/jvm/java/lib/jconsole.jar:/usr/lib/jvm/java/lib/tools.jar > ------------------------------- > > > [image: Inline image 1] > > Thanks. > > -Shanti > > > On Tue, Oct 2, 2012 at 7:43 PM, Kirill Kireyev <kir...@instagrok.com>wrote: > >> Hi Shanti! Thanks for your script! I ran it (with minor modifications), >> and it turns out that the garbage collection threads are indeed taking up >> the majority of CPU (see below). What would you recommend I do to >> tune/troubleshoot GC further? My server has 32G or RAM, JVM has heap size >> of 26G >> >> Thanks, >> Kirill >> >> PID CPU Mem JStack Info >> 0x253d 80 90.7 "GC task thread#3 (ParallelGC)" prio=10 >> tid=0x0000000001938800 nid=0x253d runnable >> 0x2540 76 90.7 "GC task thread#6 (ParallelGC)" prio=10 >> tid=0x000000000193e000 nid=0x2540 runnable >> 0x253b 74 90.7 "GC task thread#1 (ParallelGC)" prio=10 >> tid=0x0000000001934800 nid=0x253b runnable >> 0x253e 74 90.7 "GC task thread#4 (ParallelGC)" prio=10 >> tid=0x000000000193a000 nid=0x253e runnable >> 0x2541 74 90.7 "GC task thread#7 (ParallelGC)" prio=10 >> tid=0x000000000193f800 nid=0x2541 runnable >> 0x253c 66 90.7 "GC task thread#2 (ParallelGC)" prio=10 >> tid=0x0000000001936800 nid=0x253c runnable >> 0x253a 47 90.7 "GC task thread#0 (ParallelGC)" prio=10 >> tid=0x0000000001933000 nid=0x253a runnable >> 0x253f 39 90.7 "GC task thread#5 (ParallelGC)" prio=10 >> tid=0x000000000193c000 nid=0x253f runnable >> 0x2542 19 90.7 "VM Thread" prio=10 tid=0x00000000019b6000 >> nid=0x2542 runnable >> 0x2539 0 90.7 "main" prio=10 tid=0x0000000001925000 nid=0x2539 >> runnable [0x00007fee9d444000] >> 0x2543 0 90.7 "Reference Handler" daemon prio=10 >> tid=0x00000000019bd800 nid=0x2543 runnable [0x00007fee500d8000] >> 0x2544 0 90.7 "Finalizer" daemon prio=10 tid=0x00000000019c5800 >> nid=0x2544 in Object.wait() [0x00007fee4ffd7000] >> 0x2545 0 90.7 "Signal Dispatcher" daemon prio=10 >> tid=0x0000000001a12000 nid=0x2545 runnable [0x0000000000000000] >> 0x2546 0 90.7 "C2 CompilerThread0" daemon prio=10 >> tid=0x00007fee48001000 nid=0x2546 waiting on condition [0x0000000000000000] >> 0x2547 0 90.7 "C2 CompilerThread1" daemon prio=10 >> tid=0x00007fee48004000 nid=0x2547 waiting on condition [0x0000000000000000] >> 0x2548 0 90.7 "Service Thread" daemon prio=10 >> tid=0x0000000001a14800 nid=0x2548 runnable [0x0000000000000000] >> 0x2549 0 90.7 "VM Periodic Task Thread" prio=10 >> tid=0x0000000001a1f800 nid=0x2549 waiting on condition >> 0x254a 0 90.7 "GC Daemon" daemon prio=10 tid=0x00007fee404b3000 >> nid=0x254a in Object.wait() [0x00007fee4e1bd000] >> 0x254d 0 90.7 >> "ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10 >> tid=0x00007fee40389800 nid=0x254d runnable [0x00007fee4d820000] >> 0x254e 0 90.7 "http-apr-8080-Poller-0" daemon prio=10 >> tid=0x00007fee40378000 nid=0x254e in Object.wait() [0x00007fee4d661000] >> 0x254f 0 90.7 "http-apr-8080-CometPoller-0" daemon prio=10 >> tid=0x00007fee40395000 nid=0x254f in Object.wait() [0x00007fee4d4a2000] >> 0x2550 0 90.7 "http-apr-8080-Sendfile-0" daemon prio=10 >> tid=0x00007fee40397000 nid=0x2550 in Object.wait() [0x00007fee4d2b0000] >> 0x2551 0 90.7 "http-apr-8080-Acceptor-0" daemon prio=10 >> tid=0x00007fee4072c000 nid=0x2551 runnable [0x00007fee4d1af000] >> 0x2552 0 90.7 "http-apr-8080-AsyncTimeout" daemon prio=10 >> tid=0x00007fee4072e000 nid=0x2552 sleeping[0x00007fee4d0ac000] >> 0x2553 0 90.7 "http-apr-8443-Poller-0" daemon prio=10 >> tid=0x00007fee40730000 nid=0x2553 in Object.wait() [0x00007fee4cfab000] >> 0x2554 0 90.7 "http-apr-8443-CometPoller-0" daemon prio=10 >> tid=0x00007fee40732000 nid=0x2554 in Object.wait() [0x00007fee4cdee000] >> 0x2555 0 90.7 "http-apr-8443-Acceptor-0" daemon prio=10 >> tid=0x00007fee40734800 nid=0x2555 runnable [0x00007fee4cced000] >> 0x2556 0 90.7 "http-apr-8443-AsyncTimeout" daemon prio=10 >> tid=0x00007fee4073b800 nid=0x2556 sleeping[0x00007fee4cbec000] >> 0x2557 0 90.7 "ajp-apr-8009-Poller-0" daemon prio=10 >> tid=0x00007fee4073d800 nid=0x2557 runnable [0x00007fee4ca2f000] >> 0x2558 0 90.7 "ajp-apr-8009-CometPoller-0" daemon prio=10 >> tid=0x00007fee4073f800 nid=0x2558 in Object.wait() [0x00007fee4c872000] >> 0x2559 0 90.7 "ajp-apr-8009-Acceptor-0" daemon prio=10 >> tid=0x00007fee40741800 nid=0x2559 runnable [0x00007fee4c771000] >> 0x255a 0 90.7 "ajp-apr-8009-AsyncTimeout" daemon prio=10 >> tid=0x00007fee40743800 nid=0x255a sleeping[0x00007fee4c670000] >> 0x255f 0 90.7 "ajp-apr-8009-exec-5" daemon prio=10 >> tid=0x0000000001cef000 nid=0x255f runnable [0x00007fee4c157000] >> 0x2560 0 90.7 "ajp-apr-8009-exec-6" daemon prio=10 >> tid=0x0000000001e93800 nid=0x2560 runnable [0x00007fee47ffd000] >> 0x2561 0 90.7 "ajp-apr-8009-exec-7" daemon prio=10 >> tid=0x0000000001c1e800 nid=0x2561 runnable [0x00007fee4db39000] >> 0x2562 0 90.7 "ajp-apr-8009-exec-8" daemon prio=10 >> tid=0x0000000001d41000 nid=0x2562 runnable [0x00007fee4dc3a000] >> 0x2563 0 90.7 "ajp-apr-8009-exec-9" daemon prio=10 >> tid=0x0000000001bf9800 nid=0x2563 runnable [0x00007fee47efb000] >> 0x2564 0 90.7 "ajp-apr-8009-exec-10" daemon prio=10 >> tid=0x0000000001cfe800 nid=0x2564 waiting on condition [0x00007fee47dfa000] >> 0x2586 0 90.7 "ajp-apr-8009-exec-12" daemon prio=10 >> tid=0x0000000001e85000 nid=0x2586 waiting on condition [0x00007fee47bf6000] >> 0x2587 0 90.7 "pool-1-thread-1" prio=10 tid=0x0000000001fe4000 >> nid=0x2587 waiting on condition [0x00007fee47af3000] >> 0x2635 0 90.7 "ajp-apr-8009-exec-15" daemon prio=10 >> tid=0x000000000232d800 nid=0x2635 runnable [0x00007fee38991000] >> 0x2637 0 90.7 "ajp-apr-8009-exec-17" daemon prio=10 >> tid=0x0000000002392800 nid=0x2637 waiting on condition [0x00007fee38790000] >> 0x2638 0 90.7 "ajp-apr-8009-exec-18" daemon prio=10 >> tid=0x0000000002395000 nid=0x2638 waiting on condition [0x00007fee3868f000] >> 0x263a 0 90.7 "ajp-apr-8009-exec-20" daemon prio=10 >> tid=0x0000000002312800 nid=0x263a waiting on condition [0x00007fee3848b000] >> 0x263d 0 90.7 "ajp-apr-8009-exec-23" daemon prio=10 >> tid=0x0000000002324800 nid=0x263d runnable [0x00007fee38187000] >> 0x50b 0 90.7 "http-apr-8443-exec-1" daemon prio=10 >> tid=0x0000000002209000 nid=0x50b waiting on condition [0x00007fee232b1000] >> 0x50c 0 90.7 "http-apr-8443-exec-2" daemon prio=10 >> tid=0x000000000213c000 nid=0x50c waiting on condition [0x00007fee233b2000] >> 0x50d 0 90.7 "http-apr-8443-exec-3" daemon prio=10 >> tid=0x000000000239e000 nid=0x50d waiting on condition [0x00007fee22baa000] >> 0x50e 0 90.7 "http-apr-8443-exec-4" daemon prio=10 >> tid=0x0000000002aeb800 nid=0x50e waiting on condition [0x00007fee3939d000] >> 0x50f 0 90.7 "http-apr-8443-exec-5" daemon prio=10 >> tid=0x0000000002ad3000 nid=0x50f waiting on condition [0x00007fee38f99000] >> 0x510 0 90.7 "http-apr-8443-exec-6" daemon prio=10 >> tid=0x00000000025a1800 nid=0x510 waiting on condition [0x00007fee397a1000] >> 0x511 0 90.7 "http-apr-8443-exec-7" daemon prio=10 >> tid=0x0000000002188800 nid=0x511 waiting on condition [0x00007fee45db8000] >> 0x512 0 90.7 "http-apr-8443-exec-8" daemon prio=10 >> tid=0x00000000023b3800 nid=0x512 waiting on condition [0x00007fee44aa5000] >> 0x513 0 90.7 "http-apr-8443-exec-9" daemon prio=10 >> tid=0x0000000002164800 nid=0x513 waiting on condition [0x00007fee3dae4000] >> 0x514 0 90.7 "http-apr-8443-exec-10" daemon prio=10 >> tid=0x0000000002af6800 nid=0x514 waiting on condition [0x00007fee279f8000] >> 0x19a7 0 90.7 "http-apr-8080-exec-1" daemon prio=10 >> tid=0x0000000001ac2800 nid=0x19a7 waiting on condition [0x00007fee3747b000] >> 0x19a8 0 90.7 "http-apr-8080-exec-2" daemon prio=10 >> tid=0x000000000257d000 nid=0x19a8 waiting on condition [0x00007fee3767d000] >> 0x28b0 0 90.7 "http-apr-8080-exec-3" daemon prio=10 >> tid=0x000000000274c000 nid=0x28b0 waiting on condition [0x00007fee3f8f7000] >> 0x28b1 0 90.7 "http-apr-8080-exec-4" daemon prio=10 >> tid=0x00000000021c8800 nid=0x28b1 waiting on condition [0x00007fee3adb7000] >> 0x46e7 0 90.7 "Keep-Alive-Timer" daemon prio=10 >> tid=0x0000000002160800 nid=0x46e7 runnable [0x00007fee3d5df000] >> 0x29c5 0 90.7 "Keep-Alive-SocketCleaner" daemon prio=10 >> tid=0x000000000242b000 nid=0x29c5 in Object.wait() [0x00007fee1f877000] >> 0x6ef8 0 90.7 "http-apr-8080-exec-5" daemon prio=10 >> tid=0x0000000002a56000 nid=0x6ef8 waiting on condition [0x00007fee454af000] >> 0x6ef9 0 90.7 "http-apr-8080-exec-6" daemon prio=10 >> tid=0x00000000022de800 nid=0x6ef9 waiting on condition [0x00007fee45ab5000] >> 0x3d38 0 90.7 "http-apr-8080-exec-7" daemon prio=10 >> tid=0x0000000002233000 nid=0x3d38 waiting on condition [0x00007fee37980000] >> 0x3d39 0 90.7 "http-apr-8080-exec-8" daemon prio=10 >> tid=0x00000000021f7800 nid=0x3d39 waiting on condition [0x00007fee38087000] >> 0x64a3 0 90.7 "ajp-apr-8009-exec-36" daemon prio=10 >> tid=0x00000000022dd000 nid=0x64a3 waiting on condition [0x00007fee35158000] >> 0x64a6 0 90.7 "ajp-apr-8009-exec-39" daemon prio=10 >> tid=0x0000000002add800 nid=0x64a6 runnable [0x00007fee3bdc7000] >> 0x64a7 0 90.7 "ajp-apr-8009-exec-40" daemon prio=10 >> tid=0x00000000029c0800 nid=0x64a7 waiting on condition [0x00007fee24fce000] >> 0x64ab 0 90.7 "ajp-apr-8009-exec-44" daemon prio=10 >> tid=0x0000000002924000 nid=0x64ab runnable [0x00007fee20f8d000] >> 0x64ac 0 90.7 "ajp-apr-8009-exec-45" daemon prio=10 >> tid=0x0000000002555800 nid=0x64ac waiting on condition [0x00007fee256d5000] >> 0x64af 0 90.7 "ajp-apr-8009-exec-48" daemon prio=10 >> tid=0x0000000001f8a800 nid=0x64af waiting on condition [0x00007fee24ccb000] >> 0x64b0 0 90.7 "ajp-apr-8009-exec-49" daemon prio=10 >> tid=0x000000000257b000 nid=0x64b0 runnable [0x00007fee36d73000] >> 0x64b1 0 90.7 "ajp-apr-8009-exec-50" daemon prio=10 >> tid=0x0000000001bb3000 nid=0x64b1 runnable [0x00007fee36b71000] >> 0x64b2 0 90.7 "ajp-apr-8009-exec-51" daemon prio=10 >> tid=0x000000000224a000 nid=0x64b2 runnable [0x00007fee270ee000] >> 0x64b3 0 90.7 "ajp-apr-8009-exec-52" daemon prio=10 >> tid=0x000000000216e800 nid=0x64b3 runnable [0x00007fee20c8a000] >> 0xf1b 0 90.7 "http-apr-8080-exec-9" daemon prio=10 >> tid=0x0000000001c9e800 nid=0xf1b waiting on condition [0x00007fee3929c000] >> 0xf1c 0 90.7 "http-apr-8080-exec-10" daemon prio=10 >> tid=0x0000000001e4c000 nid=0xf1c waiting on condition [0x00007fee3d8e2000] >> 0x3eda 0 90.7 "ajp-apr-8009-exec-55" daemon prio=10 >> tid=0x0000000002155000 nid=0x3eda runnable [0x00007fee27cfb000] >> 0x3edb 0 90.7 "ajp-apr-8009-exec-56" daemon prio=10 >> tid=0x000000000232f800 nid=0x3edb runnable [0x00007fee27bfa000] >> 0x51c7 0 90.7 "ajp-apr-8009-exec-57" daemon prio=10 >> tid=0x000000000211c800 nid=0x51c7 waiting on condition [0x00007fee37c83000] >> >> >> >> On 9/27/12 12:17 PM, Shanti Suresh wrote: >> >> Hi Kirill, >> >> Like Mark, Bill and Jeff said, those threads are normal request-processing >> threads. I have included a script that might help with isolating high CPU >> issues with Tomcat. >> >> Also, I think it might be helpful to see how the Java heap is performing as >> well. >> Please bring up Jconsole and let it run over the week. Inspect the graphs >> for Memory, CPU and threads. Since you say that high CPU occurs >> intermittently several times during the week and clears itself, I wonder if >> it is somehow related with the garbage collection options you are using for >> the server. Or it may be a code-related problem. >> >> Things to look at may include: >> >> (1) Are high CPU times related to Java heap reductions happening at the >> same time? ==> GC possibly needs tuning >> (2) Are high CPU times related to increase in thread usage? ==> possible >> livelock in looping code? >> (3) how many network connections come into the Tomcat server during >> high-CPU times? Possible overload-related? >> >> Here is the script. I made a couple of small changes, for e.g., changing >> the username. But didn't test it after the change. During high-CPU times, >> invoke the script a few times, say 30 seconds apart. And then compare the >> thread-dumps. I like to use TDA for thread-dump analysis of Tomcat >> thread-dumps. >> >> Mark, et al, please feel free to help me refine this script. I would like >> to have a script to catch STUCK threads too :-) Let me know if anyone has >> a script already. Thanks. >> >> --------------high_cpu_diagnostics.pl:----- >> #!/usr/bin/perl >> # >> >> use Cwd; >> >> # Make a dated directory for capturing current diagnostics >> my ($sec,$min,$hour,$mday,$mon,$year, >> $wday,$yday,$isdst) = localtime time; >> $year += 1900; >> $mon += 1; >> my $pwd = cwd(); >> my $preview_diag_dir = "/tmp/Preview_Diag.$year-$mon-$mday-$hour:$min:$sec"; >> print "$preview_diag_dir\n"; >> mkdir $preview_diag_dir, 0755; >> chdir($preview_diag_dir) or die "Can't chdir into $preview_diag_dir $!\n"; >> >> # Capture Preview thread dump >> my $process_pattern = "preview"; >> my $preview_pid = `/usr/bin/pgrep -f $process_pattern`; >> my $login = getpwuid($<) ; >> if (kill 0, $preview_pid){ >> #Possible to send a signal to the Preview Tomcat - either "webinf" >> or "root" >> my $count = kill 3, $preview_pid; >> }else { >> # Not possible to send a signal to the VCM - use "sudo" >> system ("/usr/bin/sudo /bin/kill -3 $preview_pid"); >> } >> >> # Capture Preview thread dump >> system ("/usr/bin/jmap >> -dump:format=b,file=$preview_diag_dir/preview_heapdump.hprof $preview_pid"); >> >> # Gather the top threads; keep around for reference on what other threads >> are running >> @top_cmd = ("/usr/bin/top", "-H", "-n1", "-b"); >> @sort_cmd = ("/bin/sort", "-r", "-n", "-k", "9,9"); >> @sed_cmd = ("/bin/sed", "-n", "'8,$p'"); >> system("@top_cmd 1> top_all_threads.log"); >> >> # Get your tomcat user's threads, i.e. threads of user, "webinf" >> system('/usr/bin/tail -n+6 top_all_threads.log | /bin/sort -r -n -k "9,9" | >> /bin/grep webinf top_all_threads.log 1> top_user_webinf_threads.log'); >> >> # Get the thread dump >> my @output=`/usr/bin/jstack -l ${preview_pid}`; >> open (my $file, '>', 'preview_threaddump.txt') or die "Could not open file: >> $!"; >> print $file @output; >> close $file; >> >> open LOG, "top_user_webinf_threads.log" or die $!; >> open (STDOUT, "| tee -ai top_cpu_preview_threads.log"); >> print "PID\tCPU\tMem\tJStack Info\n"; >> while ($l = <LOG>) { >> chop $l; >> $pid = $l; >> $pid =~ s/webinf.*//g; >> $pid =~ s/ *//g; >> ## Hex PID is available in the Sun HotSpot Stack Trace */ >> $hex_pid = sprintf("%#x", $pid); >> @values = split(/\s+/, $l); >> $pct = $values[8]; >> $mem = $values[9]; >> # Debugger breakpoint: >> $DB::single = 1; >> >> # Find the Java thread that corresponds to the thread-id from the TOP output >> for my $j (@output) { >> chop $j; >> ($j =~ m/nid=$hex_pid/) && print $hex_pid . "\t" . $pct . "\t" . >> $mem . "\t" . $j . "\n"; >> } >> } >> >> close (STDOUT); >> >> close LOG; >> >> >> ------------------end of script -------------------------- >> >> Thanks. >> >> -Shanti >> >> >> On Thu, Sep 27, 2012 at 2:11 PM, Bill Miller >> <millebi.subscripti...@gmail.com> wrote: >> >> >> I agree; we have reproducible instances where PermGen is not set to our >> requirements on the Tomcat startup parameters and it will cause a "lockup" >> every time. Do some JMX monitoring and you may discover a memory spike >> that's killing Tomcat. >> >> Bill >> -----Original Message----- >> From: Jeff MAURY [mailto:jeffma...@gmail.com <jeffma...@gmail.com>] >> Sent: September-27-2012 2:01 PM >> To: Tomcat Users List >> Subject: Re: high CPU usage on tomcat 7 >> >> This is probably due to out of memory, I have the same problem on my ubuntu >> ci machine Did you monitor your tomcat with jmx ? >> >> Jeff >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org >> For additional commands, e-mail: users-h...@tomcat.apache.org >> >> >> >> -- >> *Kirill Kireyev, PhD* >> Founder/CTO instaGrok.com <http://www.instagrok.com> >> kir...@instagrok.com >> Twitter: @instaGrok <http://twitter.com/InstaGrok> >> FB: facebook.com/instagrok <http://www.facebook.com/instagrok> >> <http://www.instagrok.com> >> > >