Hi everyone,

I was using the load balancer worker with Netscape, and I noticed that it
takes a long time to recover whenever all of the tomcats are shut down. I
am using version 3.2.2. However, I checked out the latest 3.3 code, and the
load balancer there is exactly the same as in 3.2.2. The patch I am
submitting is against the latest 3.3 code. I did not have a chance to look
at the 4.x code.

Reason for patch:
If a tomcat goes down, it is taken out of the "worker list" for 60
seconds before it is retried. If all the tomcats that are being load
balanced are restarted (one at a time, to upgrade the system for instance),
you need to wait until the tomcat that you just brought up gets back in the
list before you shut down the next one, or you may have website downtime.
The same problem arises if an intermittent network problem occurs between
Netscape and tomcat. The downtime would be at least 60 seconds.

To avoid that, I made a fix to the load balancer worker that is activated
only when all the tomcat workers have been taken out of the list due to
failure.

If all the load balanced tomcat workers are out of the list, the patched
load balancer goes through all of the workers again, once each, in reverse
order of their "last error time", until one that works is found. They are
retried even though 60 seconds have not passed yet. An error is returned to
the user if and only if all the workers have been tried once in this
particular request and all of them failed. These retries continue for every
request until at least one tomcat worker recovers. After the first tomcat
worker recovers, the rest are retried once every 60 seconds, as before.
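
For readers who do not want to open the patch, the behaviour described above
can be sketched roughly as follows. This is only an illustration in Python,
not the patch itself (the real load balancer is C code). The names Worker,
pick_worker and try_worker are made up for the example, the normal
load-based choice among healthy workers is left out, and "reverse order of
last error time" is assumed to mean "most recent error first"; flip the sort
if the patch orders them the other way.

```python
import time
from dataclasses import dataclass

RECOVER_WAIT_SECS = 60  # retry interval mentioned in the message


@dataclass
class Worker:
    # Hypothetical stand-in for one load balanced tomcat worker.
    name: str
    in_error: bool = False
    last_error_time: float = 0.0


def pick_worker(workers, try_worker, now=None):
    """Return the first worker that serves the request, or None.

    try_worker(worker) is a caller-supplied function (an assumption of
    this sketch) that returns True if the worker handled the request.
    """
    now = time.time() if now is None else now
    tried = set()

    def attempt(w):
        tried.add(w.name)
        if try_worker(w):
            w.in_error = False
            return True
        # Failure: take the worker out of the list and remember when.
        w.in_error = True
        w.last_error_time = now
        return False

    # Normal path: only workers that are healthy, or whose 60-second
    # penalty has expired, are eligible.
    for w in workers:
        eligible = (not w.in_error
                    or now - w.last_error_time >= RECOVER_WAIT_SECS)
        if eligible and attempt(w):
            return w

    # Fallback: every worker is now out of the list. Retry the ones not
    # yet tried in this request, ignoring the 60-second wait.
    # Assumed ordering: most recent error first.
    remaining = [w for w in workers if w.name not in tried]
    for w in sorted(remaining, key=lambda w: w.last_error_time,
                    reverse=True):
        if attempt(w):
            return w

    # All workers were tried once in this request and all failed.
    return None
```

Once one worker succeeds, it is back in the list, so later requests take the
normal path and the remaining failed workers wait out their 60 seconds as
before.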

I found this very useful in our environment.

Patch attached

Thanks,
Eugene Gluzberg

Attachment: lb.patch
Description: Binary data
