Thanks for the help.
We have two drives in a basic configuration: the commitlog on one drive
and the data on the other.
And yes, the CL for writes is 3; however, the CL for reads is 1.
You are saying I am doing 36,000 inserts per second when I am inserting
600 rows. I thought that every _row_ goes to one node, so the work is
done per _row_, not per _column_. So my assumption is not true, and the
work is actually done at the column level? If I reduce the number of
columns, will I get a "substantial" improvement in performance?
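To make the distinction concrete, here is a minimal sketch of the arithmetic behind the per-column view. The 60-columns-per-row figure is an assumption (the actual column count is not stated in this thread); it is just the number that would make the quoted 36,000/sec come out of 600 rows/sec.

```python
# Illustrative only: if write work is per column, the effective insert
# rate is rows * columns, not rows.
# ASSUMPTION: 60 columns per row (not stated in the thread).
rows_per_sec = 600
columns_per_row = 60

column_inserts_per_sec = rows_per_sec * columns_per_row
print(column_inserts_per_sec)  # 36000
```

Under this view, halving the number of columns would halve the per-second column-insert workload even though the row rate stays at 600.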
Also, what do you mean by "distributing the client load across the
cluster"? I am doing the writes on Node1, the reads on Node2, and the
maintenance on Node3 (maintenance is disabled for now).
Do you think it's better if I do writes on all 3 nodes and reads on all
3 nodes as well?
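One common way to spread client load, instead of pinning writes to Node1 and reads to Node2, is to round-robin every operation across all nodes. This is a hypothetical sketch of that idea in Python; the node names are placeholders, and it shows only the node-selection logic, not an actual Cassandra client call.

```python
from itertools import cycle

# Placeholder node addresses for the three-node cluster in this thread.
nodes = ["node1", "node2", "node3"]
node_iterator = cycle(nodes)

def pick_node():
    """Return the next node in round-robin order, so reads and writes
    are spread evenly rather than pinned to a single node."""
    return next(node_iterator)

# Six consecutive operations land on six alternating targets:
targets = [pick_node() for _ in range(6)]
print(targets)  # ['node1', 'node2', 'node3', 'node1', 'node2', 'node3']
```

With this pattern no single node terminates all client traffic; each node still coordinates and replicates writes as usual.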
Thanks,
Alaa
On 9/26/2010 3:41 PM, Peter Schuller wrote:
It is odd that you are able to do 36000/sec _at all_ unless you are
using CL.ZERO, which would quickly lead to OOM.
The problem with the hypothesis, as far as I can tell, is that the
hotspot error log's heap information does not indicate that he's close
to maxing out his heap. And I don't believe the JVM ever goes for OOM
for GC-efficiency reasons without the heap actually having reached its
max size first.
I don't disagree with what you said in general; it just seems to me
that something other than plain memory churn is going on here, based on
both the apparent lack of a filled heap and the hotspot log's claim
that the culprit was a stack overflow rather than a heap overflow.
One thing to try may be to run without concurrent GC (on the
hypothesis that there is some corruption issue going on). The problem
is that even if that fixes the problem it proves very little about the
root cause and is not necessarily useful in production anyway
(depending on heap size and latency requirements).
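For reference, a sketch of what "running without concurrent GC" might look like as JVM options. Exactly where these go is an assumption (wherever your Cassandra startup script builds JVM_OPTS); the point is to drop the CMS flags and fall back to a stop-the-world collector for the test.

```shell
# Diagnostic sketch: disable the concurrent (CMS) collector.
# Remove flags like these from the existing JVM_OPTS:
#   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
# and instead select the parallel stop-the-world collector:
JVM_OPTS="$JVM_OPTS -XX:+UseParallelGC"
```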
Another thing to try is simply increasing the stack size, but again, if
this happens to work it's hiding the real problem rather than being a
real fix (on the premise that there is no legitimate reason for
significant stack depth).
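The stack-size experiment would look something like this; 256k is only an example value, chosen to be larger than whatever -Xss is currently set to in your startup options.

```shell
# Diagnostic sketch: raise the per-thread stack size.
# If the stack-overflow symptom disappears, that points at stack depth,
# but it does not explain *why* the stacks got that deep.
JVM_OPTS="$JVM_OPTS -Xss256k"
```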
I'm not sure what the best course of action is here short of going
down the path of trying to investigate the problem at the JVM level.
I'm hoping someone will come along and point to a simple explanation
that we're missing :)
--
Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 700
San Jose, CA 95110 USA
Tel: 408-283-5639 (or 408-280-7900 x5639)
fax: 408-938-6479
email: alaa.zuba...@pdf.com