Java process segfaulting and using 100% CPU

Gustavo Noronha Wed, 19 Apr 2006 07:11:11 -0700

Hello,

I've searched a lot about this issue on the web, but did not find a
solution. I have the following setup:


Debian GNU/Linux 3.1 with official Tomcat5 package (5.0.30),
java-package-built Sun JVM:

# java -version
java version "1.5.0_05"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05)
Java HotSpot(TM) Server VM (build 1.5.0_05-b05, mixed mode)

Our main application uses Hibernate to persist objects and C3P0 to control
the connection pool. Here's the configuration file:

hibernate.connection.driver_class com.p6spy.engine.spy.P6SpyDriver
hibernate.connection.url jdbc:oracle:thin:@dbhost:1521:sid
hibernate.connection.username user
hibernate.connection.password pass
hibernate.transaction.factory_class =
org.hibernate.transaction.JDBCTransactionFactory
hibernate.use_outer_join true
hibernate.show_sql false
hibernate.c3p0.minPoolSize = 10
hibernate.c3p0.maxPoolSize = 200;
hibernate.c3p0.min_size=10
hibernate.c3p0.max_size=200
hibernate.c3p0.timeout=5000
hibernate.c3p0.max_statements=0
hibernate.cglib.use_reflection_optimizer=true
hibernate.dialect org.hibernate.dialect.OracleDialect

I use the P6SpyDriver to be able to log the SQL queries that go through to
the Oracle database, but I have the same problem with the OracleDriver as
well. So here's the problem:

After some time running one java process begins using 100% CPU; strace shows
this:

--- SIGSEGV (Segmentation fault) @ 0 (0) ---
rt_sigreturn(0x55d32bda)                = -442421359
futex(0x805c770, FUTEX_WAIT, 2, NULL)   = -1 EAGAIN (Resource temporarily
unavailable)
futex(0x81ff918, FUTEX_REQUEUE, 1, 2147483647, 0x805c770) = 1
futex(0x805c770, FUTEX_WAKE, 1)         = 1
futex(0x805c7f0, FUTEX_WAIT, 2, NULL)   = 0
futex(0x805c7f0, FUTEX_WAKE, 1)         = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
rt_sigreturn(0x500ddbda)                = -442421359
futex(0x805c770, FUTEX_WAIT, 2, NULL)   = 0
futex(0x81ff918, FUTEX_REQUEUE, 1, 2147483647, 0x805c770) = 1
futex(0x805c770, FUTEX_WAKE, 1)         = 1
futex(0x805c7f0, FUTEX_WAIT, 2, NULL)   = 0
futex(0x805c7f0, FUTEX_WAKE, 1)         = 0

This may stay like that for the whole day, tomcat being responsive, or other
processes may end up using 100% CPU as well and presenting the same behavior
while on strace. Eventually, tomcat may become unresponsive, but I could not
estabilish a pattern. I found similar SIGSEGV/futex behavior on a
non-100%-using process, too:

send(23,
"\2c\0\0\6\0\0\0\0\0\21i\0\1\1\1\1\1\3^\0\2\200!\0\1\2\2\"\1\1\r\0\0\0\0\4\177\377\377\377\0\0\0\0\0\0\0\0\0select
papelpesso0_.ID_ENTIDADE_DIRIGENTE as ID1_, papelpesso0_.DT_INICIO_MANDATO
as DT2_29_, papelpesso0_.DT_FIM_MANDATO as DT3_29_,
papelpesso0_.DT_NOMEACAO_DIRIGENTE as DT4_29_, papelpesso0_.NU_REMESSA_SIAFI
as NU5_29_, papelpesso0_.DT_ENVIO_CREDOR_SIAFI as DT6_29_,
papelpesso0_.DT_RETORNO_CREDOR_SIAFI as DT7_29_, papelpesso0_.ST_RESPONSAVEL
as ST8_29_, papelpesso0_.CO_ENTIDADE as CO9_29_, papelpesso0_.CO_DIRIGENTE
as CO10_29_, papelpesso0_.CO_CARGO as CO11_29_ from RL_ENTIDADE_DIRIGENTE
papelpesso0_ where
papelpesso0_.CO_DIRIGENTE=28071\1\1\0\0\0\0\0\0\1\1\0\0\0\0\0", 611, 0) =
611
recv(23, "\1\217\0\0\6\0\0\0", 8, 0)    = 8
recv(23, "\0\0\20\31\0279
\273t6X\203\0\0xj\4\22\27\0220\4\31\0\0\0\0\0\0\1\201\1\v7\2\0\t\0\1\26\0\0\0\0\0\0\0\0\4\1\4\4ID1_\0\0\f\0\0\0\1\1\0\0\0\0\0\0\0\1\7\1\7\7DT2_29_\0\0\f\0\0\0\1\1\0\0\0\0\0\0\0\1\7\1\7\7DT3_29_\0\0\f\0\0\0\1\1\0\0\0\0\0\0\0\1\7\1\7\7DT4_29_\0\0\1\200\0\0\1\5\0\0\0\0\1\37\1\1\5\1\7\1\7\7NU5_29_\0\0\f\0\0\0\1\1\0\0\0\0\0\0\0\1\7\1\7\7DT6_29_\0\0\f\0\0\0\1\1\0\0\0\0\0\0\0\1\7\1\7\7DT7_29_\0\0`\200\0\0\1\1\0\0\0\0\1\37\1\1\1\1\7\1\7\7ST8_29_\0\0\2\0\f\0\1\26\0\0\0\0\0\0\0\0\7\1\7\7CO9_29_\0\0\2\0\n\0\1\26\0\0\0\0\0\0\0\0\10\1\10\10CO10_29_\0\0\2\0\t\0\1\26\0\0\0\0\0\0\0\0\10\1\10\10CO11_29_\0\0\1\7\7xj\4\23\2!\17\10\1\5\3+\367Y\0\1\1\0\0\0\0\4\0\0\0\0\1\1\2\1\332\3\0\0\0\0\0\2\177|\1\1\0\2\317/\1\5\0\0\0\0\1\1",
391, 0) = 391
send(23, "\0\21\0\0\6\0\0\0\0\0\3\5\0\1\1\1\n", 17, 0) = 17
recv(23, "\0F\0\0\6\0\0\0", 8, 0)       = 8
recv(23, "\0\0\4\0\2\5{\0\0\1\1\0\3\0\1
\0\0\0\2\177|\1\1\0\2\317/\1\5\0\0\0\0\1\1\31ORA-01403: no data found\n",
62, 0) = 62
gettimeofday({1145421443, 130050}, NULL) = 0
gettimeofday({1145421443, 130213}, NULL) = 0
write(14, "1145421443130|17|37|statement|select
papelpesso0_.ID_ENTIDADE_DIRIGENTE as ID1_, papelpesso0_.DT_INICIO_MANDATO
as DT2_29_, papelpesso0_.DT_FIM_MANDATO as DT3_29_,
papelpesso0_.DT_NOMEACAO_DIRIGENTE as DT4_29_, papelpesso0_.NU_REMESSA_SIAFI
as NU5_29_, papelpesso0_.DT_ENVIO_CREDOR_SIAFI as DT6_29_,
papelpesso0_.DT_RETORNO_CREDOR_SIAFI as DT7_29_, papelpesso0_.ST_RESPONSAVEL
as ST8_29_,
papelpesso0_.CO_ENTIDADE as CO9_29_, papelpesso0_.CO_DIRIGENTE as CO10_29_,
papelpesso0_.CO_CARGO as CO11_29_ from RL_ENTIDADE_DIRIGENTE papelpesso0_
where papelpesso0_.CO_DIRIGENTE=28071|select
papelpesso0_.ID_ENTIDADE_DIRIGENTE as ID1_, papelpesso0_.DT_INICIO_MANDATO
as DT2_29_, papelpesso0_.DT_FIM_MANDATO as DT3_29_,
papelpesso0_.DT_NOMEACAO_DIRIGENTE as DT4_29_, papelpesso0_.NU_REMESSA_SIAFI
as NU5_29_, papelpesso0_.DT_ENVIO_CREDOR_SIAFI as DT6_29_,
papelpesso0_.DT_RETORNO_CREDOR_SIAFI as DT7_29_, papelpesso0_.ST_RESPONSAVEL
as ST8_29_, papelpesso0_.CO_ENTIDADE as CO9_29_, papelpesso0_.CO_DIRIGENTE
as CO10_29_, papelpesso0_"..., 1123) = 1123
write(14, "\n", 1)                      = 1
gettimeofday({1145421443, 131610}, NULL) = 0
gettimeofday({1145421443, 131692}, NULL) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
rt_sigreturn(0x443e228c)                = 0
futex(0x805c770, FUTEX_WAIT, 2, NULL)   = 0
futex(0x8ab5810, FUTEX_REQUEUE, 1, 2147483647, 0x805c770) = 1
futex(0x805c770, FUTEX_WAKE, 1)         = 1
futex(0x805c7f0, FUTEX_WAIT, 2, NULL)   = 0
futex(0x805c7f0, FUTEX_WAKE, 1)         = 0

At this moment Tomcat was unresponsive. Moments before there were some
exceptions:

org.hibernate.HibernateException: Found shared references to a collection
        at org.hibernate.engine.Collections.processReachableCollection (
Collections.java:130)
[...]
ERROR 19 Abr 2006 01:25:14,744 suasnob.component.HibernateSession  -> error
during commit
org.hibernate.HibernateException: Flush during cascade is dangerous
[...]
WARN  19 Abr 2006 01:25:14,775 hibernate.util.JDBCExceptionReporter  -> SQL
Error: 0, SQLState: null
WARN  19 Abr 2006 01:25:14,775 hibernate.util.JDBCExceptionReporter  -> SQL
Error: 0, SQLState: null
ERROR 19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter   -> You
can't operate on a closed Connection!!!
ERROR 19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter  -> You
can't operate on a closed Connection!!!
WARN  19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter   -> SQL
Error: 0, SQLState: null
WARN  19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter  -> SQL
Error: 0, SQLState: null
ERROR 19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter  -> You
can't operate on a closed Connection!!!
ERROR 19 Abr 2006 01:25:14,776 hibernate.util.JDBCExceptionReporter  -> You
can't operate on a closed Connection!!!
org.hibernate.exception.GenericJDBCException: could not execute query
[...]
Caused by: java.sql.SQLException : You can't operate on a closed
Connection!!!
        at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:104)
        at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:65)
        at com.mchange.v2.c3p0.impl.NewProxyConnection.prepareStatement (
NewProxyConnection.java:199)
[...]
Caused by: java.lang.NullPointerException
        at com.mchange.v2.c3p0.impl.NewProxyConnection.prepareStatement(
NewProxyConnection.java:174)
        ... 108 more
[...]
ERROR 19 Abr 2006 01:25:20,408 suasnob.interceptor.HibernateInterceptor  ->
HibernateException in execute()
org.hibernate.HibernateException: Flush during cascade is dangerous
[...]

Then Tomcat became unresponsive. Moments after this was spit:

WARN  19 Abr 2006 01:46:23,018 hibernate.jdbc.ConnectionManager  -> unclosed
connection, forgot to call close() on your session?
WARN  19 Abr 2006 01:46:23,018 hibernate.jdbc.ConnectionManager  -> unclosed
connection, forgot to call close() on your session?

And then I restarted tomcat to get it back. I believe these might be
related. We have lots of exceptions related to database connections,
hibernate and c3p0 on the logs, such as:

exception clearing maxRows/queryTimeout, You can't operate on a closed
Connection!!!, You can't operate on a closed ResultSet!!!, SQL Error: 17010,
SQLState: null.

Also, it might be important to notice that we have some of these from time
to time:

WARN  19 Abr 2006 06:48:09,577 v2.async.ThreadPoolAsynchronousRunner  ->
[EMAIL PROTECTED] --
APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending
tasks!
WARN  19 Abr 2006 06:48:09,578 v2.async.ThreadPoolAsynchronousRunner  ->
[EMAIL PROTECTED] --
APPARENT DEADLOCK!!! Complete Status: [num_managed_threads: 3, num_active:
3; activeTasks:
[EMAIL PROTECTED] (
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0),
[EMAIL PROTECTED] (
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1),
[EMAIL PROTECTED] (
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2 );
pendingTasks: [EMAIL PROTECTED],
[EMAIL PROTECTED],
[EMAIL PROTECTED],
[EMAIL PROTECTED] ,
[EMAIL PROTECTED],
[EMAIL PROTECTED],
[EMAIL PROTECTED],
[EMAIL PROTECTED] ]
WARN  19 Abr 2006 06:49:23,468 v2.async.ThreadPoolAsynchronousRunner  ->
Task [EMAIL PROTECTED] (in deadlocked
PoolThread) failed to complete in maximum time 60000ms. Trying interrupt().
WARN  19 Abr 2006 06:49:23,468 v2.async.ThreadPoolAsynchronousRunner  ->
Task [EMAIL PROTECTED] (in deadlocked
PoolThread)
failed to complete in maximum time 60000ms. Trying interrupt().
WARN  19 Abr 2006 06:49:23,468 v2.async.ThreadPoolAsynchronousRunner  ->
Task [EMAIL PROTECTED] (in deadlocked
PoolThread) failed to complete in maximum time 60000ms. Trying interrupt().

Being a beginner on Tomcat, I'm unable to easily identify what may be faulty
here: if our application, our tomcat configuration, mainly hibernate's and
c3p0's, or if it's something deeper on Java or the OS... Any hints on where
to investigate, and how to solve this symptom?

Thanks,

--
Gustavo Noronha Silva
Coordenação de Segurança e Sustentação
Ministério do Desenvolvimento Social - Brasil

Java process segfaulting and using 100% CPU

Reply via email to