Hi everyone,
I was using the load balancer worker with Netscape and noticed that it takes a long time to recover whenever all of the Tomcats are shut down. I am using version 3.2.2, but I checked out the latest 3.3 code and its load balancer is exactly the same as in 3.2.2. The patch I am submitting is against the latest 3.3 code; I have not had a chance to look at the 4.x code.

Reason for the patch: if a Tomcat goes down, it is taken out of the "worker list" for 60 seconds before it is retried. If all of the load-balanced Tomcats are being restarted (one at a time, for instance, to upgrade the system), you have to wait until the Tomcat you just brought up is back in the list before you shut down the next one, or the website may go down. The same problem occurs if there is an intermittent network problem between Netscape and Tomcat: the downtime would be at least 60 seconds.

To avoid this, I made a fix to the load balancer worker that is activated only when all of the Tomcat workers have been taken out of the list due to failure. In that case, the patched load balancer goes through all of the workers once more, in reverse order of their "last error time", until one of them succeeds. The workers are retried even though the 60 seconds have not yet passed. An error is returned to the user if and only if every worker has been tried once for that particular request and all of them have failed. These retries continue on every request until at least one Tomcat worker recovers. After the first Tomcat worker recovers, the rest are retried once every 60 seconds, as before.

I found this very useful in our environment.

Patch attached.

Thanks,
Eugene Gluzberg
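Here is a minimal C sketch of the selection logic described above. It is not the attached patch and not mod_jk's real code: `lb_sub_worker`, `lb_service`, `try_worker`, `stub_service` and `MAX_WORKERS` are hypothetical names for illustration. It makes two simplifying assumptions: workers are tried in array order rather than by their load-balancing weight, and "reverse order of last error time" is read as "the worker that failed longest ago is retried first" (flip the comparator if the patch means the opposite).

```c
#include <stdlib.h>
#include <time.h>

#define RECOVER_WAIT_SECS 60.0 /* retry interval for a failed worker */
#define MAX_WORKERS 16         /* arbitrary limit for this sketch */

/* Hypothetical backend callback: returns 1 on success, 0 on failure. */
typedef int (*service_fn)(void *ctx);

/* Hypothetical stand-in for a load-balanced sub-worker (not mod_jk's struct). */
typedef struct {
    const char *name;
    int in_error_state;  /* 1 after a failed request */
    time_t error_time;   /* time of the most recent failure */
    service_fn service;
    void *ctx;
} lb_sub_worker;

/* Hypothetical stub backend for demonstration: `up` decides the result,
 * `calls` counts how many times it was tried. */
typedef struct { int up; int calls; } stub_backend;

static int stub_service(void *ctx)
{
    stub_backend *b = (stub_backend *)ctx;
    b->calls++;
    return b->up;
}

/* Send the request to one worker and update its error bookkeeping. */
static int try_worker(lb_sub_worker *w, time_t now)
{
    if (w->service(w->ctx)) {
        w->in_error_state = 0;
        return 1;
    }
    w->in_error_state = 1;
    w->error_time = now;
    return 0;
}

/* qsort comparator. Assumption: the oldest last error is retried first. */
static int by_error_time(const void *a, const void *b)
{
    const lb_sub_worker *wa = *(lb_sub_worker *const *)a;
    const lb_sub_worker *wb = *(lb_sub_worker *const *)b;
    double d = difftime(wa->error_time, wb->error_time);
    return (d > 0) - (d < 0);
}

/* Returns the index of the worker that served the request, or -1 if every
 * worker was tried once for this request and all of them failed. */
int lb_service(lb_sub_worker *workers, size_t n, time_t now)
{
    int tried[MAX_WORKERS] = {0};
    lb_sub_worker *retry[MAX_WORKERS];
    size_t i, nretry = 0;

    if (n == 0 || n > MAX_WORKERS)
        return -1;

    /* Normal path: healthy workers, plus failed ones whose 60 s have passed. */
    for (i = 0; i < n; i++) {
        lb_sub_worker *w = &workers[i];
        if (w->in_error_state &&
            difftime(now, w->error_time) < RECOVER_WAIT_SECS)
            continue; /* still out of the list */
        tried[i] = 1;
        if (try_worker(w, now))
            return (int)i;
    }

    /* Fallback: every worker is now in error state, so retry each one not yet
     * tried in this request, ignoring the 60 s wait. */
    for (i = 0; i < n; i++)
        if (!tried[i])
            retry[nretry++] = &workers[i];
    qsort(retry, nretry, sizeof retry[0], by_error_time);
    for (i = 0; i < nretry; i++)
        if (try_worker(retry[i], now))
            return (int)(retry[i] - workers);

    return -1;
}
```

With two workers that both failed within the last 60 seconds, `lb_service` skips both in the first loop and then retries them oldest-error-first, so a Tomcat that has just come back up serves the request immediately instead of after the remaining timeout. Once one worker is healthy again, the first loop serves requests and the failed workers go back to being retried only every 60 seconds.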
Attachment: lb.patch