Re: Application crash after Migrate to different ESX

Christopher Schultz Thu, 19 May 2011 14:16:31 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

הילה,

On 5/19/2011 4:06 PM, הילה wrote:
>> 1) You have tomcat 6.0.29 running on virtual machines running on win 2008
>> 2) You have a load balancer which calls a home made xslt transform, that
>> queries the database. If this fails 3 times in a row, the load balancer
>> treats the machine as dead

#2 here blows my mind: an LB runs an XSLT that connects to a DB? WTF?

> just want to add that sometimes it is dead, the keep alive get page 500 and
> stays like this (this is the behavior while disconnect and reconnect the
> network card when the server is out of the LB pool), and sometimes it's page
> 500, then ok state, then page 500 again, then ok - until I restart the
> tomcat and then it's settled (this is the behavior while disconnect and
> reconnect the network card / migrate to another esx , when the server is
> part of the LB pool).

Observing "success, fail, success, fail, success, fail" is usually what
I see when some of the connections in a connection pool are broken while
others are not. It's essentially random which connection you'll get from
the pool, so you may get a good one, then a bad one, then a good one, etc.

Depending on exactly how the connections were established, how they were
severed, and how much time elapsed during the interruption, some may
recover at the TCP/IP level (TCP retransmits across a short outage) while
others are permanently broken. The only solution is to flush those broken
connections from the pool.
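
To make that pattern concrete, here is a minimal Java sketch of a toy pool.
Everything in it is hypothetical (FakeConnection, ToyPool, the FIFO hand-out
order, and the testOnBorrow flag on this toy class); it only models the
behavior and is not DBCP or Tomcat pool code. Half of the idle connections
are marked broken, and each simulated request borrows one connection, uses
it, and returns it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StalePoolDemo {

    /** Hypothetical stand-in for a JDBC connection. */
    static final class FakeConnection {
        final boolean broken; // true models a socket severed by the interruption
        FakeConnection(boolean broken) { this.broken = broken; }
        /** Stand-in for running a cheap validation query. */
        boolean isValid() { return !broken; }
    }

    /** Toy FIFO pool; an illustration only, not a real pool implementation. */
    static final class ToyPool {
        private final Deque<FakeConnection> idle = new ArrayDeque<FakeConnection>();
        private final boolean testOnBorrow;
        private int discarded = 0;

        ToyPool(boolean testOnBorrow) { this.testOnBorrow = testOnBorrow; }

        void addIdle(FakeConnection c) { idle.addLast(c); }

        FakeConnection borrow() {
            FakeConnection c;
            while ((c = idle.pollFirst()) != null) {
                if (!testOnBorrow || c.isValid()) {
                    return c; // without validation, broken connections are lent too
                }
                discarded++; // flush the broken connection and try the next one
            }
            return new FakeConnection(false); // pool empty: open a fresh connection
        }

        void giveBack(FakeConnection c) { idle.addLast(c); }

        int discardedCount() { return discarded; }
    }

    /** Runs sequential requests that each borrow, use, and return a connection. */
    static String simulate(ToyPool pool, int requests) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < requests; i++) {
            FakeConnection c = pool.borrow();
            if (i > 0) out.append(' ');
            out.append(c.broken ? "500" : "ok");
            pool.giveBack(c);
        }
        return out.toString();
    }

    /** Builds a pool of four idle connections, two of them broken. */
    static ToyPool halfBrokenPool(boolean testOnBorrow) {
        ToyPool pool = new ToyPool(testOnBorrow);
        pool.addIdle(new FakeConnection(false));
        pool.addIdle(new FakeConnection(true));
        pool.addIdle(new FakeConnection(false));
        pool.addIdle(new FakeConnection(true));
        return pool;
    }

    public static void main(String[] args) {
        System.out.println(simulate(halfBrokenPool(false), 6));
        System.out.println(simulate(halfBrokenPool(true), 6));
    }
}
```

Without validation, the broken connections keep circulating, so results
alternate between ok and 500 until the pool is restarted. With validation at
borrow time, the two broken connections are discarded on first contact and
every later request gets a working one.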

I was asking about which pool you were using because there is not a
single "JDBC connection pool". There are two provided by Tomcat and you
neither said that you were using one of them, nor said which one you
were using. Not everyone on the list keeps their notes from your past
posts on their desks just in case you ask a followup later on. It's very
helpful when starting a new thread to re-explain your setup and give as
much detail as possible. I've never seen you post a <Resource> element,
for instance.

>> The problem is clearly that the pool doesn't test the connections when it
>> lends them. You say that you are unwilling or unable to alter this by
>> configuring test on borrow and using dbcp. The alternative seems to catch
>> the exception in your custom code, and get it to re-initialize the database
>> connection pool. You will effectively just re-implement the connection
>> pool,
>> but if you won't use the existing one doesn't seem much else to suggest
>>
> I haven't understood the part in purple. can you explain?

This list sends messages in plain text. There is no purple. Could you
reply and indicate what you mean using words instead of colors?

> and for the sake of trying, I will configure test on borrow and see what
> happens.

I think that would be a good idea.

> even if the server is out of the LB pool, it still can connect to DB and the
> keep alive can still show OK (or not, if the server/DB is not functioning).
> the difference is surfing to keepalive.xml when it's in the LB pool, and
> surfing to _keepalive.xml when it's not in the LB pool.

Let me ask a question about this "keep alive" thing. From the above
summary (which you didn't write but did agree with), your LB runs an
XSLT (presumably, it hits a URL on each server that runs an XSLT) that
contacts a DB. What does it do when it contacts that DB? Does it run a
simple query like "SELECT 1 FROM DUAL" just to see if the database is
reachable? That kind of makes sense, other than the fact that it's using
XSLT to do that. Whatever... you know your environment better than I do.

If the above is true, are you saying that the XSLT reports that the node
is bad or that the node is good? Or that it responds randomly after
migrating a machine between ESX instances?

Does that XSLT use the /exact same connection pool/ that the webapp
does? Not just the same configuration but the /exact same object in
memory/? If not, then your XSLT is lying to you, or at least you are
drawing the wrong conclusions from its reports.
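
For illustration, a health check that tells you something about the webapp
would borrow from the very same DataSource object the webapp uses. The
sketch below assumes such a shared DataSource is handed in; the class name
PoolHealthCheck and the probeQuery parameter are hypothetical, and how the
shared object is obtained (for example, a JNDI lookup of the
container-managed resource) depends on your setup and is not shown.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

/** Hypothetical health check that probes the webapp's own pool. */
final class PoolHealthCheck {
    private final DataSource pool;   // must be the same object the webapp borrows from
    private final String probeQuery; // a cheap query, chosen for your database

    PoolHealthCheck(DataSource pool, String probeQuery) {
        this.pool = pool;
        this.probeQuery = probeQuery;
    }

    /** Returns true only if a pooled connection can run the probe query. */
    boolean isHealthy() {
        Connection c = null;
        try {
            c = pool.getConnection();
            Statement s = c.createStatement();
            try {
                ResultSet rs = s.executeQuery(probeQuery);
                try {
                    return rs.next();
                } finally {
                    rs.close();
                }
            } finally {
                s.close();
            }
        } catch (SQLException e) {
            return false; // a broken pooled connection shows up here
        } finally {
            if (c != null) {
                try {
                    c.close(); // hands the connection back to the pool
                } catch (SQLException ignored) {
                    // nothing useful to do if returning the connection fails
                }
            }
        }
    }
}
```

A check that opens its own separate connection can report "OK" while the
webapp's pool is still handing out dead connections, which is why the
identity of the pool object matters here.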

Other than removing a broken server from the LB pool, what happens when
a server fails a "keep-alive" check (I would call this a "health check",
but you can call it whatever you'd like)? Does it restart the
server/container/anything? Or, does it page you and log an error? That
is, what is the standard operating procedure for a server falling out of
the LB pool?

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk3ViJAACgkQ9CaO5/Lv0PDQRQCfTkKb/RXfPOU/eVAA4ROlBF6D
nYoAn1GRNYH8Io6kIn12xd/fUE+QvuQG
=zsa+
-----END PGP SIGNATURE-----
