> It is odd that you are able to do 36000/sec _at all_ unless you are
> using CL.ZERO, which would quickly lead to OOM.

The problem with that hypothesis, as far as I can tell, is that the
heap information in the hotspot error log does not indicate that he's
close to maxing out his heap. And I don't believe the JVM ever throws
an OOM for GC efficiency reasons without the heap actually having
reached its max size first.

I don't disagree with what you said in general; it just seems to me
that something other than plain memory churn is going on here, based
on both the seeming lack of a filled heap and the hotspot log's claim
that the culprit was a stack overflow rather than heap exhaustion.
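
For what it's worth, one rough way to double-check the "heap is not
full" reading from inside a running JVM is to compare used heap
against max heap via the standard java.lang.management API. This is
only a sketch: the class name HeapHeadroom and the helper
usedFraction are made up for illustration, and the same numbers can
be read over JMX instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Cassandra or the JVM.
final class HeapHeadroom {
    // Fraction of the max heap in use; -1 if the JVM reports no max.
    static double usedFraction(long used, long max) {
        return max <= 0 ? -1.0 : (double) used / (double) max;
    }

    public static void main(String[] args) {
        MemoryUsage heap =
            ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() may return -1 when the maximum is undefined.
        System.out.printf("used=%d committed=%d max=%d fraction=%.2f%n",
            heap.getUsed(), heap.getCommitted(), heap.getMax(),
            usedFraction(heap.getUsed(), heap.getMax()));
    }
}
```

If the fraction stays well below 1.0 right up to the crash, that
would be consistent with the heap not being the limiting factor.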

One thing to try may be to run without concurrent GC (on the
hypothesis that there is some corruption issue going on). The problem
is that even if that fixes the problem it proves very little about the
root cause and is not necessarily useful in production anyway
(depending on heap size and latency requirements).
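
If someone does try this, it is worth confirming which collectors the
JVM actually ended up with after changing the startup flags (for
example after dropping something like -XX:+UseConcMarkSweepGC, which
I am only assuming is in use here). A minimal sketch using the
standard GarbageCollectorMXBean interface; the class name GcReport is
hypothetical:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the collectors active in this JVM.
final class GcReport {
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start.
            lines.add(gc.getName()
                + " count=" + gc.getCollectionCount()
                + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

The collector names differ between JVM versions and vendors, so the
output is only useful for comparing one run against another on the
same setup.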

Another thing to try is simply increasing the stack size, but again,
if this happens to work it's hiding the real problem rather than being
a real fix (on the premise that there is no legitimate significant
stack depth).
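
To get a feel for how much recursion depth a given stack size buys
(the JVM-wide default is normally set with -Xss), a small probe like
the one below can help. It is only a sketch: StackDepthProbe and
maxDepth are made-up names, the stackSize argument to the Thread
constructor is just a hint that the JVM may round or ignore, and the
depth reached depends on frame size and JIT behaviour.

```java
// Hypothetical probe, unrelated to the Cassandra code path.
final class StackDepthProbe {
    // Recurse until the stack is exhausted, counting frames.
    private static void recurse(int[] counter) {
        counter[0]++;
        recurse(counter);
    }

    // Recursion depth reached on a thread with the requested stack size.
    static int maxDepth(long stackSizeBytes) {
        final int[] counter = new int[1];
        Thread t = new Thread(null, () -> {
            try {
                recurse(counter);
            } catch (StackOverflowError expected) {
                // counter[0] holds the depth reached before overflow.
            }
        }, "stack-probe", stackSizeBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("512k stack: depth " + maxDepth(512L * 1024));
        System.out.println("4m stack:   depth " + maxDepth(4L * 1024 * 1024));
    }
}
```

If a modest increase makes the crash go away, that still says nothing
about why the stack got that deep in the first place.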

I'm not sure what the best course of action is here short of going
down the path of trying to investigate the problem at the JVM level.

I'm hoping someone will come along and point to a simple explanation
that we're missing :)

-- 
/ Peter Schuller
