If there is an OOM (OutOfMemoryError), it will be in the logs.
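
A minimal sketch of how one might check this from the test harness, assuming the Cassandra log is a plain-text file and the daemon was started as a child process (as with ProcessBuilder in the message below). The class name `OomCheck`, its method names, and the default `system.log` path are hypothetical, chosen only for illustration. The exit-status reading relies on the common Linux convention that a child killed by a signal is reported as 128 plus the signal number; treat that as a hint rather than a guarantee.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra or the DataStax driver.
public class OomCheck {

    // Return every log line that mentions OutOfMemoryError.
    public static List<String> findOomLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains("OutOfMemoryError")) {
                hits.add(line);
            }
        }
        return hits;
    }

    // Rough reading of a child's exit status (e.g. from Process.waitFor()).
    // Assumption: values above 128 follow the 128 + signal convention,
    // so 137 suggests SIGKILL (9) and 143 suggests SIGTERM (15).
    public static String describeExit(int code) {
        if (code == 0) {
            return "clean exit";
        }
        if (code > 128) {
            return "terminated by signal " + (code - 128);
        }
        return "exited with status " + code;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass the real log location as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "system.log");
        if (!Files.exists(log)) {
            System.err.println("Log file not found: " + log);
            return;
        }
        List<String> hits = findOomLines(Files.readAllLines(log, StandardCharsets.UTF_8));
        if (hits.isEmpty()) {
            System.out.println("No OutOfMemoryError found in " + log);
        } else {
            for (String hit : hits) {
                System.out.println(hit);
            }
        }
    }
}
```

In a test harness, the value returned by `Process.waitFor()` on the child could be passed to `describeExit` and printed next to the scan result. If the log shows no OOM but the status points at a signal, something outside the JVM may have stopped the process.
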
On Aug 5, 2014 8:17 PM, "Clint Kelly" <clint.ke...@gmail.com> wrote:

> Hi everyone,
>
> For some integration tests, we start up a CassandraDaemon in a
> separate process (using the Java 7 ProcessBuilder API).  All of my
> integration tests run beautifully on my laptop, but one of them fails
> on our Jenkins cluster.
>
> The failing integration test does around 10k writes to different rows
> and then 10k reads.  After running some number of reads, the job dies
> with this error:
>
> com.datastax.driver.core.exceptions.NoHostAvailableException: All
> host(s) tried for query failed (tried: /127.0.0.10:58209
> (com.datastax.driver.core.exceptions.DriverException: Timeout during
> read))
>
> This error appears to have occurred because the Cassandra process has
> stopped.  The logs for the Cassandra process show some warnings during
> batch writes (the batches are too big), no activity for a few minutes
> (I assume this is because all of the read operations were proceeding
> smoothly), and then look like the following:
>
> INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,903
> ThriftServer.java (line 141) Stop listening to thrift clients
>  INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,920 Server.java
> (line 182) Stop listening for CQL clients
>  INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,930
> Gossiper.java (line 1279) Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2014-08-05 19:14:53,930
> MessagingService.java (line 683) Waiting for messaging service to
> quiesce
>  INFO [ACCEPT-/127.0.0.10] 2014-08-05 19:14:53,931
> MessagingService.java (line 923) MessagingService has terminated the
> accept() thread
>
> Does anyone have any ideas about how to debug this?  Looking around on
> Google, I found some threads suggesting that this could occur from an
> OOM error
> (http://stackoverflow.com/questions/23755040/cassandra-exits-with-no-errors).
> Wouldn't such an error be logged, however?
>
> The test that fails is a test of our MapReduce Hadoop InputFormat and
> as such it does some pretty big queries across multiple rows (over a
> range of partitioning key tokens).  The default fetch size I believe
> is 5000 rows, and the values in the rows I am fetching are just simple
> strings, so I would not think the amount of data in a single read
> would be too big.
>
> FWIW I don't see any log messages about garbage collection for at
> least 3min before the process shuts down (and no GC messages after the
> test stops doing writes and starts doing reads).
>
> I'd greatly appreciate any help before my team kills me for breaking
> our Jenkins build so consistently!  :)
>
> Best regards,
> Clint
>
