Hello, I've searched a lot about this issue on the web, but did not find a solution. I have the following setup:
Debian GNU/Linux 3.1 with official Tomcat5 package (5.0.30), java-package-built Sun JVM: # java -version java version "1.5.0_05" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05) Java HotSpot(TM) Server VM (build 1.5.0_05-b05, mixed mode) Our main application uses Hibernate to persist objects and C3P0 to control the connection pool. Here's the configuration file: hibernate.connection.driver_class com.p6spy.engine.spy.P6SpyDriver hibernate.connection.url jdbc:oracle:thin:@dbhost:1521:sid hibernate.connection.username user hibernate.connection.password pass hibernate.transaction.factory_class = org.hibernate.transaction.JDBCTransactionFactory hibernate.use_outer_join true hibernate.show_sql false hibernate.c3p0.minPoolSize = 10 hibernate.c3p0.maxPoolSize = 200; hibernate.c3p0.min_size=10 hibernate.c3p0.max_size=200 hibernate.c3p0.timeout=5000 hibernate.c3p0.max_statements=0 hibernate.cglib.use_reflection_optimizer=true hibernate.dialect org.hibernate.dialect.OracleDialect I use the P6SpyDriver to be able to log the SQL queries that go through to the Oracle database, but I have the same problem with the OracleDriver as well. So here's the problem: After some time running one java process begins using 100% CPU; strace shows this: --- SIGSEGV (Segmentation fault) @ 0 (0) --- rt_sigreturn(0x55d32bda) = -442421359 futex(0x805c770, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) futex(0x81ff918, FUTEX_REQUEUE, 1, 2147483647, 0x805c770) = 1 futex(0x805c770, FUTEX_WAKE, 1) = 1 futex(0x805c7f0, FUTEX_WAIT, 2, NULL) = 0 futex(0x805c7f0, FUTEX_WAKE, 1) = 0 --- SIGSEGV (Segmentation fault) @ 0 (0) --- rt_sigreturn(0x500ddbda) = -442421359 futex(0x805c770, FUTEX_WAIT, 2, NULL) = 0 futex(0x81ff918, FUTEX_REQUEUE, 1, 2147483647, 0x805c770) = 1 futex(0x805c770, FUTEX_WAKE, 1) = 1 futex(0x805c7f0, FUTEX_WAIT, 2, NULL) = 0 futex(0x805c7f0, FUTEX_WAKE, 1) = 0 This may stay like that for the whole day, tomcat being responsive, or other processes may end up using 100% CPU as well and presenting the same behavior while on strace. Eventually, tomcat may become unresponsive, but I could not estabilish a pattern. I found similar SIGSEGV/futex behavior on a non-100%-using process, too: send(23, "\2c\0\0\6\0\0\0\0\0\21i\0\1\1\1\1\1\3^\0\2\200!\0\1\2\2\"\1\1\r\0\0\0\0\4\177\377\377\377\0\0\0\0\0\0\0\0\0select papelpesso0_.ID_ENTIDADE_DIRIGENTE as ID1_, papelpesso0_.DT_INICIO_MANDATO as DT2_29_, papelpesso0_.DT_FIM_MANDATO as DT3_29_, papelpesso0_.DT_NOMEACAO_DIRIGENTE as DT4_29_, papelpesso0_.NU_REMESSA_SIAFI as NU5_29_, papelpesso0_.DT_ENVIO_CREDOR_SIAFI as DT6_29_, papelpesso0_.DT_RETORNO_CREDOR_SIAFI as DT7_29_, papelpesso0_.ST_RESPONSAVEL as ST8_29_, papelpesso0_.CO_ENTIDADE as CO9_29_, papelpesso0_.CO_DIRIGENTE as CO10_29_, papelpesso0_.CO_CARGO as CO11_29_ from RL_ENTIDADE_DIRIGENTE papelpesso0_ where papelpesso0_.CO_DIRIGENTE=28071\1\1\0\0\0\0\0\0\1\1\0\0\0\0\0", 611, 0) = 611 recv(23, "\1\217\0\0\6\0\0\0", 8, 0) = 8 recv(23, "\0\0\20\31\0279 \273t6X\203\0\0xj\4\22\27\0220\4\31\0\0\0\0\0\0\1\201\1\v7\2\0\t\0\1\26\0\0\0\0\0\0\0\0\4\1\4\4ID1_\0\0\f\0\0\0\1\1\0\0\0\0\0\0\0\1\7\1\7\7DT2_29_\0\0\f\0\0\0\1\1\0\0\0\0\0\0\0\1\7\1\7\7DT3_29_\0\0\f\0\0\0\1\1\0\0\0\0\0\0\0\1\7\1\7\7DT4_29_\0\0\1\200\0\0\1\5\0\0\0\0\1\37\1\1\5\1\7\1\7\7NU5_29_\0\0\f\0\0\0\1\1\0\0\0\0\0\0\0\1\7\1\7\7DT6_29_\0\0\f\0\0\0\1\1\0\0\0\0\0\0\0\1\7\1\7\7DT7_29_\0\0`\200\0\0\1\1\0\0\0\0\1\37\1\1\1\1\7\1\7\7ST8_29_\0\0\2\0\f\0\1\26\0\0\0\0\0\0\0\0\7\1\7\7CO9_29_\0\0\2\0\n\0\1\26\0\0\0\0\0\0\0\0\10\1\10\10CO10_29_\0\0\2\0\t\0\1\26\0\0\0\0\0\0\0\0\10\1\10\10CO11_29_\0\0\1\7\7xj\4\23\2!\17\10\1\5\3+\367Y\0\1\1\0\0\0\0\4\0\0\0\0\1\1\2\1\332\3\0\0\0\0\0\2\177|\1\1\0\2\317/\1\5\0\0\0\0\1\1", 391, 0) = 391 send(23, "\0\21\0\0\6\0\0\0\0\0\3\5\0\1\1\1\n", 17, 0) = 17 recv(23, "\0F\0\0\6\0\0\0", 8, 0) = 8 recv(23, "\0\0\4\0\2\5{\0\0\1\1\0\3\0\1 \0\0\0\2\177|\1\1\0\2\317/\1\5\0\0\0\0\1\1\31ORA-01403: no data found\n", 62, 0) = 62 gettimeofday({1145421443, 130050}, NULL) = 0 gettimeofday({1145421443, 130213}, NULL) = 0 write(14, "1145421443130|17|37|statement|select papelpesso0_.ID_ENTIDADE_DIRIGENTE as ID1_, papelpesso0_.DT_INICIO_MANDATO as DT2_29_, papelpesso0_.DT_FIM_MANDATO as DT3_29_, papelpesso0_.DT_NOMEACAO_DIRIGENTE as DT4_29_, papelpesso0_.NU_REMESSA_SIAFI as NU5_29_, papelpesso0_.DT_ENVIO_CREDOR_SIAFI as DT6_29_, papelpesso0_.DT_RETORNO_CREDOR_SIAFI as DT7_29_, papelpesso0_.ST_RESPONSAVEL as ST8_29_, papelpesso0_.CO_ENTIDADE as CO9_29_, papelpesso0_.CO_DIRIGENTE as CO10_29_, papelpesso0_.CO_CARGO as CO11_29_ from RL_ENTIDADE_DIRIGENTE papelpesso0_ where papelpesso0_.CO_DIRIGENTE=28071|select papelpesso0_.ID_ENTIDADE_DIRIGENTE as ID1_, papelpesso0_.DT_INICIO_MANDATO as DT2_29_, papelpesso0_.DT_FIM_MANDATO as DT3_29_, papelpesso0_.DT_NOMEACAO_DIRIGENTE as DT4_29_, papelpesso0_.NU_REMESSA_SIAFI as NU5_29_, papelpesso0_.DT_ENVIO_CREDOR_SIAFI as DT6_29_, papelpesso0_.DT_RETORNO_CREDOR_SIAFI as DT7_29_, papelpesso0_.ST_RESPONSAVEL as ST8_29_, papelpesso0_.CO_ENTIDADE as CO9_29_, papelpesso0_.CO_DIRIGENTE as CO10_29_, papelpesso0_"..., 1123) = 1123 write(14, "\n", 1) = 1 gettimeofday({1145421443, 131610}, NULL) = 0 gettimeofday({1145421443, 131692}, NULL) = 0 --- SIGSEGV (Segmentation fault) @ 0 (0) --- rt_sigreturn(0x443e228c) = 0 futex(0x805c770, FUTEX_WAIT, 2, NULL) = 0 futex(0x8ab5810, FUTEX_REQUEUE, 1, 2147483647, 0x805c770) = 1 futex(0x805c770, FUTEX_WAKE, 1) = 1 futex(0x805c7f0, FUTEX_WAIT, 2, NULL) = 0 futex(0x805c7f0, FUTEX_WAKE, 1) = 0 At this moment Tomcat was unresponsive. Moments before there were some exceptions: org.hibernate.HibernateException: Found shared references to a collection at org.hibernate.engine.Collections.processReachableCollection ( Collections.java:130) [...] ERROR 19 Abr 2006 01:25:14,744 suasnob.component.HibernateSession -> error during commit org.hibernate.HibernateException: Flush during cascade is dangerous [...] WARN 19 Abr 2006 01:25:14,775 hibernate.util.JDBCExceptionReporter -> SQL Error: 0, SQLState: null WARN 19 Abr 2006 01:25:14,775 hibernate.util.JDBCExceptionReporter -> SQL Error: 0, SQLState: null ERROR 19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter -> You can't operate on a closed Connection!!! ERROR 19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter -> You can't operate on a closed Connection!!! WARN 19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter -> SQL Error: 0, SQLState: null WARN 19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter -> SQL Error: 0, SQLState: null ERROR 19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter -> You can't operate on a closed Connection!!! ERROR 19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter -> You can't operate on a closed Connection!!! org.hibernate.exception.GenericJDBCException: could not execute query [...] Caused by: java.sql.SQLException : You can't operate on a closed Connection!!! at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:104) at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:65) at com.mchange.v2.c3p0.impl.NewProxyConnection.prepareStatement ( NewProxyConnection.java:199) [...] Caused by: java.lang.NullPointerException at com.mchange.v2.c3p0.impl.NewProxyConnection.prepareStatement( NewProxyConnection.java:174) ... 108 more [...] ERROR 19 Abr 2006 01:25:20,408 suasnob.interceptor.HibernateInterceptor -> HibernateException in execute() org.hibernate.HibernateException: Flush during cascade is dangerous [...] Then Tomcat became unresponsive. Moments after this was spit: WARN 19 Abr 2006 01:46:23,018 hibernate.jdbc.ConnectionManager -> unclosed connection, forgot to call close() on your session? WARN 19 Abr 2006 01:46:23,018 hibernate.jdbc.ConnectionManager -> unclosed connection, forgot to call close() on your session? And then I restarted tomcat to get it back. I believe these might be related. We have lots of exceptions related to database connections, hibernate and c3p0 on the logs, such as: exception clearing maxRows/queryTimeout, You can't operate on a closed Connection!!!, You can't operate on a closed ResultSet!!!, SQL Error: 17010, SQLState: null. Also, it might be important to notice that we have some of these from time to time: WARN 19 Abr 2006 06:48:09,577 v2.async.ThreadPoolAsynchronousRunner -> [EMAIL PROTECTED] -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks! WARN 19 Abr 2006 06:48:09,578 v2.async.ThreadPoolAsynchronousRunner -> [EMAIL PROTECTED] -- APPARENT DEADLOCK!!! Complete Status: [num_managed_threads: 3, num_active: 3; activeTasks: [EMAIL PROTECTED] ( com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0), [EMAIL PROTECTED] ( com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1), [EMAIL PROTECTED] ( com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2 ); pendingTasks: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] , [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] ] WARN 19 Abr 2006 06:49:23,468 v2.async.ThreadPoolAsynchronousRunner -> Task [EMAIL PROTECTED] (in deadlocked PoolThread) failed to complete in maximum time 60000ms. Trying interrupt(). WARN 19 Abr 2006 06:49:23,468 v2.async.ThreadPoolAsynchronousRunner -> Task [EMAIL PROTECTED] (in deadlocked PoolThread) failed to complete in maximum time 60000ms. Trying interrupt(). WARN 19 Abr 2006 06:49:23,468 v2.async.ThreadPoolAsynchronousRunner -> Task [EMAIL PROTECTED] (in deadlocked PoolThread) failed to complete in maximum time 60000ms. Trying interrupt(). Being a beginner on Tomcat, I'm unable to easily identify what may be faulty here: if our application, our tomcat configuration, mainly hibernate's and c3p0's, or if it's something deeper on Java or the OS... Any hints on where to investigate, and how to solve this symptom? Thanks, -- Gustavo Noronha Silva Coordenação de Segurança e Sustentação Ministério do Desenvolvimento Social - Brasil