OutOfMemoryError in ReadStage

Ian Rose Sun, 22 Mar 2015 19:49:58 -0700

Hi all -

I had a nasty streak of OOMs earlier today (several on one node, and a
single OOM on one other node).  I've downloaded a few of the hprof files
for local analysis.  In each case, there is a single ReadStage thread with
a huge (> 7.5GB) org.apache.cassandra.db.ArrayBackedSortedColumns
instance.  I'm trying to understand exactly what this means.


1) Does a ReadStage thread only process one query at a time?  If so, then a
reasonable conclusion (I think) would be that a single query produced a ton
of results.  If not (if ReadStage threads can work on multiple queries
concurrently) then this volume of data might have been produced by a
combination of queries.

2) My driver (gocql) does not appear to enable paging by default.  Am I
correct in assuming that enabling paging should "solve the problem" (more
precisely: avoid OOMs due to me fetching a ton of rows, assuming that is the
problem and not that I am fetching a small number of very large rows)?

3) Is there any way for me (either from the system.log or from the hprof
dumps) to tell what query was executing when the process OOMed?
If I dig down in the object hierarchy, I see: Thread -> MessageDeliveryTask
-> message -> payload, which has the right ksName and cfName.  But the
"key" property is a byte array - is there an easy way for me to map this
onto my column key (which has multiple CQL columns in a composite key)?

4) Alternatively, is it possible for me to see how many rows had been read
for that query so far?  That way I can at least validate that the problem
was "too many rows" and not "rows are too big".

Many thanks!
- Ian
