Thanks so much, Jonathan! The stack trace does show the main thread was waiting on a condition to be fired.

-----------------  lwp# 1 / thread# 1  --------------------
 ffffffff7e8d270c lwp_park (0, 0, 0)
 ffffffff7e8cbee8 cond_wait_queue (ffffffff7ae95518, ffffffff7ae95500, 0, 0, 0, ffffffff7e9f9040) + 28
 ffffffff7e8cc498 cond_wait (ffffffff7ae95518, ffffffff7ae95500, 0, ffffffff7e9f8f84, 0, 0) + 10
 ffffffff7e8cc4d4 pthread_cond_wait (ffffffff7ae95518, ffffffff7ae95500, 0, 1000, ffffffff7e900200, 0) + 8
 ffffffff7a82c9a0 ttc_recMutexLock (ffffffff7ae95500, 7, 1, 1, 1002b5bf0, 0) + 38
 ffffffff7ad0fb78 SQLDisconnect (1002b5bf0, 12400, 173598, ffffffff7ae830f0, 1, ffffffff7ad6cef8) + 40
 ffffffff7c5e8050 x10lofLogoffInternal (ffffffff7e9f7340, 1002909e8, 10028f9f8, 8, 1002b5bf0, 0) + 1f0
 ffffffff7c5f1488 x10allReExecute (1002909e8, 2f79, 1005618b0, 1002a6820, 1002ada28, 10028f9f8) + 5c8
 ffffffff7c5e5e34 x10odr (1002909e8, 4, 100292ff0, 0, 100293168, ffffffff7e9f7340) + 494
 ffffffff7c31e0ec upirtrc (1002909e8, 4, 1001e6740, 6, 0, 1002564e8) + 90c
 ffffffff7c515910 kpurcsc (1002901e0, 4, ffffffff7fffd696, 100294988, 0, 0) + 70
 ffffffff7c481d50 kpuexec (3000, 100292ff0, 1002af580, 100294988, 100290978, 10028faa0) + 2750
 ffffffff7c329ad0 OCIStmtExecute (1002901e0, 1002af500, 1002902b8, 1, 0, 0) + 30
 0000000100069ebc _ZN13COCIStatement13executeCommitEv (1002b9870, 1000b4138, 0, 36, 0, 1001e5d3a) + 78
 000000010003df0c _ZN17DBDynFilterInsert4initEv (1001e5ca0, 2400, ffffffff7e9f72e8, ffffffff7e9f72e4, ffffffff7e9ec000, 26) + d0
 000000010003e780 _ZN25DBConnPoolDynFilterInsert7executeEv (1001e2fd0, ffffffff7fffed00, 10003aa1c, 9, 0, 0) + 78
 000000010003af30 main (3, ffffffff7ffff0e8, ffffffff7ffff108, 1001e2c60, 100000000, ffffffff7df00180) + 4d4
 000000010003a6d4 _start (0, 0, 0, 0, 0, 0) + 7c

-----Original Message-----
From: Jonathan Adams [mailto:jonathan.ad...@oracle.com]
Sent: Tuesday, December 14, 2010 7:30 PM
To: Cathy Guo
Cc: opensolaris-code@opensolaris.org
Subject: Re: [osol-code] lwp_park and lwp_unpark

On Tue, Dec 14, 2010 at 02:16:34PM -0500, Cathy Guo wrote:
> Our application has 9 threads, 1 main thread and 8 monitoring threads.
> The main thread does network transactions and the monitoring threads
> re-create failed network connections periodically. We ran a test with
> broken network connections to see monitoring threads try to re-create
> network connections. In this test, we experienced delay in the main
> thread. Truss shows the main thread called lwp_park and slept for
> quite some time before a monitoring thread called lwp_unpark to wake it
> up. The delay in the main thread was sometimes more than 1 minute. There's
> no mutex contention between these threads.

What is the stack trace for the main thread while it is parked?

> The machine I'm using has 4 cpus with 4 cores each. The application
> only used less than 0.1% CPU.
>
> It seems to me the machine has enough resources for all 9 threads to run.
> I'm puzzled by why the main thread gets parked for so long.

Threads call lwp_park() when they are waiting for a mutex to be dropped or a
condition variable to be fired. The stack trace while it is waiting is going
to be the best clue as to what is going on.

Cheers,
- jonathan
_______________________________________________
opensolaris-code mailing list
opensolaris-code@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
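
For anyone reading this thread later: the trace has pthread_cond_wait called from a function named ttc_recMutexLock, which would be consistent with a recursive lock built from a plain mutex plus a condition variable. In that pattern a thread that finds the lock owned by another thread waits on the condition variable until the owner releases it, so lock contention can show up as a condition wait rather than as a mutex wait. The sketch below illustrates the general pattern only. It assumes nothing about how ttc_recMutexLock is actually implemented, and rec_mutex_t, rec_mutex_lock, rec_mutex_unlock and demo_handoff are hypothetical names made up for this example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical recursive lock built from a mutex and a condition variable.
 * This is NOT the real ttc_recMutexLock; it only illustrates the pattern. */
typedef struct {
    pthread_mutex_t guard;     /* protects owner and depth */
    pthread_cond_t  released;  /* signalled when depth drops to 0 */
    pthread_t       owner;     /* valid only while depth > 0 */
    int             depth;     /* 0 means the lock is free */
} rec_mutex_t;

void rec_mutex_init(rec_mutex_t *m) {
    pthread_mutex_init(&m->guard, NULL);
    pthread_cond_init(&m->released, NULL);
    m->depth = 0;
}

void rec_mutex_lock(rec_mutex_t *m) {
    pthread_t self = pthread_self();
    pthread_mutex_lock(&m->guard);
    if (m->depth > 0 && pthread_equal(m->owner, self)) {
        m->depth++;                       /* re-entry by the current owner */
    } else {
        while (m->depth > 0)              /* another thread owns the lock */
            pthread_cond_wait(&m->released, &m->guard);
        m->owner = self;
        m->depth = 1;
    }
    pthread_mutex_unlock(&m->guard);
}

/* For brevity this sketch does not check that the caller is the owner. */
void rec_mutex_unlock(rec_mutex_t *m) {
    pthread_mutex_lock(&m->guard);
    if (m->depth > 0 && --m->depth == 0)
        pthread_cond_signal(&m->released);  /* wake one waiting thread */
    pthread_mutex_unlock(&m->guard);
}

static rec_mutex_t demo_lock;
static int demo_value;
static int demo_seen;

static void *demo_worker(void *arg) {
    (void)arg;
    /* Waits in pthread_cond_wait if the main thread still owns the lock. */
    rec_mutex_lock(&demo_lock);
    demo_seen = demo_value;
    rec_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns the value the worker observed once it finally got the lock. */
int demo_handoff(void) {
    pthread_t t;
    rec_mutex_init(&demo_lock);
    demo_value = 0;
    demo_seen = -1;
    rec_mutex_lock(&demo_lock);
    rec_mutex_lock(&demo_lock);           /* recursive acquisition */
    int rc = pthread_create(&t, NULL, demo_worker, NULL);
    assert(rc == 0);
    demo_value = 1;                       /* written while holding the lock */
    rec_mutex_unlock(&demo_lock);
    rec_mutex_unlock(&demo_lock);         /* depth reaches 0, waiter is woken */
    pthread_join(t, NULL);
    return demo_seen;
}
```

Calling demo_handoff() from a main() shows the worker acquiring the lock only after the owner has released it as many times as it locked it. If the real lock works along these lines, the thing to look for would be which other thread holds it while the main thread is parked in SQLDisconnect; the stacks of the monitoring threads at the same moment should show that.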