David Rees wrote:
On Mon, Mar 31, 2008 at 12:49 PM, Rainer Jung <[EMAIL PROTECTED]> wrote:
First to make sure: counting objects in general only makes sense after a
full GC. Otherwise the heap dump will contain garbage too.
Yes, I made sure the objects I was looking at had a valid GC
reference. They really were getting stuck in the queue.
Just some basic info: the LinkObject objects can be either in a
FastQueue, or they are used in a FastAsyncSocketSender directly after
removing from the FastQueue and before actually sending.
<snip>
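For reference, the FastQueue/FastAsyncSocketSender pairing described above is what the queue-based asynchronous sender mode sets up in a Tomcat 5.5 <Cluster> configuration. A minimal sketch, with the class name and values given from memory and meant as illustration only:

  <!-- Inside <Cluster> in server.xml (Tomcat 5.5): replicationMode="fastasyncqueue"
       is the mode that pairs a FastQueue with a FastAsyncSocketSender per member -->
  <Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
          replicationMode="fastasyncqueue"
          ackTimeout="15000"/>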
Thank you for the detailed description on how the Queues work with the cluster.
Why you had that many LinkObjects is not clear. You could first check whether
the LinkObjects actually belong to a Queue or not (if not, they are already in
the Sender). Also have a look at your log files for errors or unexpected
cluster membership messages.
One problem I've intermittently had with clustering is that after a
Tomcat restart (we shut down one node and it immediately restarts,
generally within 30 seconds), the two nodes don't consistently sync
up. (The restarted node does not have the sessions from the other
node, but new sessions do get replicated over.) I have to think this
may be related to this issue.
I believe you have to wait at least 30 seconds before you bring up the
other node.
Especially if you are using mcastDropTime="30000" (which could be the
default?), your nodes won't even realize this one is gone, and when
you bring it back up within 30 seconds, to the other nodes it's as if
nothing ever changed.
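For anyone checking their own config: mcastDropTime sits on the Membership element inside the Cluster. A minimal Tomcat 5.5 sketch, where the address, port and frequency are only illustrative and the real default dropTime should be verified for your version:

  <!-- Inside <Cluster>: a member is declared dead only after mcastDropTime ms
       without heartbeats, so a node restarted faster than that never looks "gone" -->
  <Membership className="org.apache.catalina.cluster.mcast.McastService"
              mcastAddr="228.0.0.4"
              mcastPort="45564"
              mcastFrequency="500"
              mcastDropTime="30000"/>

So either wait longer than mcastDropTime before bringing the node back, or lower the value.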
As Rainer mentioned, if you are just starting to use clustering, switch to
TC6 to avoid the migration you will otherwise have to make. TC6 also handles
this scenario regardless of what you set your dropTime to.
Filip
I checked the Tomcat logs and didn't see any issues with members
dropping from the cluster until the JVM got close to running out of
memory and started performing a lot of full GCs. When examining the
dump, the vast majority of the heap (600+MB out of 1GB) was taken up
by byte arrays referenced by LinkObjects.
In general I would suggest not using the waitForAck feature. That's not
a strict rule, but if you do async replication and use session
stickiness for load balancing, then you usually put a strong focus on
replication not negatively influencing your webapp. Activating
waitForAck lets you detect replication problems more reliably, but it
also increases the overhead. Your mileage may vary.
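Concretely, waitForAck is an attribute of the Sender element; a hedged sketch of the async-replication-without-acks setup described above (Tomcat 5.5 syntax, values illustrative):

  <!-- Same Sender element as in the earlier sketch, with acknowledgements turned off:
       lower overhead, but replication problems surface less reliably -->
  <Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
          replicationMode="fastasyncqueue"
          waitForAck="false"/>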
So what would cause the FastQueue to accumulate ClusterData even when
the cluster is apparently running properly? Is there any failsafe
(besides setting a maximum queue size) to allow old data to be purged?
I mean, 600k ClusterData objects is a lot!
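(For reference, the cap I mean would be set on the sender/queue side, roughly as in the sketch below. I am not sure of the exact attribute name, so maxQueueLength is an assumption to be checked against the cluster Sender documentation for your Tomcat version.)

  <!-- Hypothetical sketch: maxQueueLength is assumed, not verified; the idea is to
       bound the FastQueue so unsent ClusterData cannot grow without limit -->
  <Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
          replicationMode="fastasyncqueue"
          waitForAck="false"
          maxQueueLength="10000"/>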
-Dave
---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------