Hi, I have included below the thread that started off-list by email, so the list can follow along.
My setup is the following: a load balancer (an Alteon) sits in front of several Apache servers, each hosted on a machine which also hosts a Tomcat. Let's call the Apache servers A1, A2 and A3 and the associated Tomcat servers T1, T2 and T3. I have been using Paul's patch, which I modified so that the lb_value field of fault tolerant workers is never changed to a value other than INF.

The basic setup is that Ai can talk to all Tj, but for requests not associated with a session, Ti is used unless it is unavailable; sessions belonging to Tk are still routed correctly. The load balancing worker definition is different for each of the three Ai: the lbfactor is set to 0 for the workers connecting to Tk for all k != i, and to 1.0 for the worker connecting to Ti (a minimal configuration sketch is included at the end of this mail). This gives us sticky sessions regardless of which Apache handles the request, which is a good thing since the Alteon cannot extract the ';jsessionid=.....' part from the URL in a way that allows dispatching the requests to the proper Ai (the cookie is dealt with correctly, though).

This works perfectly, except when we roll out a new release of our webapps. In that case it would be ideal to be able to make the load balancer ignore one Apache server, deploy the new version of the webapp on that server, then switch that server back on and the other two off, so the service interruption is as short as possible for the customers.

The immediate idea, if Ai/Ti is to be the first server to get the new webapp, is to stop Ti so that Ai will not be selected by the load balancer. This does not work: with Paul's patch Ti is the preferred server, BUT if Ti fails then another Tk will be selected by Ai, so the load balancer will never declare Ai failed and will keep sending it requests. (We did manage to make it behave that way by specifying a test URL which includes a jvmRoute pointing to Ti, but that uses a lot of SLB groups on the Alteon.)

Bernd's patch allows Ai to reject requests when Ti is stopped, so the load balancer quickly declares Ai inactive and stops sending it requests. This makes rolling out the new webapp very easy: deploy the new webapp, restart Ti, restart Ai, and as soon as the load balancer sees Ai again, shut down the other two Ak. Current sessions are still routed to the old webapp, and new sessions see the new version. When there are no more sessions on the old version, shut down the Tk (k != i) and deploy the new webapp on them.

My remark concerning the possible selection of recovering workers ahead of the local worker (the one with lb_value set to 0) is about the load balancer no longer being able to declare Ai inactive in that case (see the schematic sketch at the end of this mail).

I hope I have been clear enough and that everybody got the point; if not, I'd be glad to explain more thoroughly.

Mathias.

Paul Frieden wrote:
>
> Hello,
>
> I'm afraid that I am no longer subscribed to the devel list. I would be happy to add my advice for this issue, but I don't have time to keep up with the entire devel list. If there is anything I can do, please just mail me directly.
>
> I chose to use the value 0 for a worker because the code used the inverse of the value specified; the value 0 then resulted in essentially infinite preference. I used that approach purely because it was the smallest change possible, and the least likely to change the expected behavior for anybody else. The path of least astonishment and whatnot.
>
> I would be concerned about changing the current behavior now, because people probably want a drop-in replacement. If there is going to be a change in the algorithm and behavior, a different approach may be better.
>
> I would also like to make a note of how we were using this code. In our environment, we have an external dedicated load balancer and three web servers. The main problem that we ran into was with AOL users. AOL uses a proxy that randomizes the source IP of requests, which means you can no longer count on the source IP to tell the load balancer which server to send future requests to. We used this code to allow sessions that arrive on the wrong web server to be redirected to the Tomcat on the correct server. This neatly side-steps the whole issue of changing IPs, because Apache is able to make the decision based on the session ID.
>
> The reliability issue was a nice side effect for us, in that it caught a failed server more quickly than the load balancer did, and prevented the user from having a connection time out or seeing an error message.
>
> I hope this provides some insight into why I changed the code that I did, and why that behavior worked well for us.
>
> Paul
>
> [EMAIL PROTECTED] wrote:
> >
> > Hi Mathias,
> >
> > I think it would be better to discuss this on tomcat-dev.
> >
> > The 'error' worker will not be chosen unless the timeout expires. When the timeout expires, we'll indeed select it (in preference to the default) - this is easy to fix if it creates problems, but I don't see why it would be a problem.
> >
> > If it is working, the next request will be served normally by the default. If not, it'll go back to error state.
> >
> > In jk2 I removed that - error workers are no longer selected. But for jk1 I would rather leave the old behavior intact.
> >
> > Note that the reason for choosing 0 (in jk2) as the default is that I want to switch from floats to ints; I'm not convinced floats are good for performance (or needed).
> >
> > Again - I'm just learning and trying; if you have any ideas I would be happy to hear them, patches are more than welcome.
> >
> > Costin
> >
> > On Sat, 4 May 2002, Mathias Herberts wrote:
> >
> > > Hi, I just joined the Tomcat-dev list and saw your patch to jk_lb_worker.c (making it version 1.9).
> > >
> > > If I understand your patch correctly, it offers the same behavior as Paul's patch but with the opposite semantics for an lbfactor of 0.0 in the worker's definition, i.e. a value of 0.0 now means ALWAYS USE THIS WORKER FOR REQUESTS WITH NO SESSIONS instead of NEVER USE THIS WORKER FOR REQUESTS WITH NO SESSIONS. This seems fine to me.
> > >
> > > What disturbs me is what happens when one worker is in error state and not yet recovering. In get_most_suitable_worker, such a worker will be selected whatever its lb_value, meaning a recovering worker will have priority over one with an lb_value of 0.0, and this seems to break the behavior we had achieved with your patch.
> > >
> > > Did I miss something, or is this really a problem?
> > >
> > > Mathias.
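For reference, here is a minimal workers.properties sketch of the kind of configuration described above, as seen from A1 (so t1 is the local worker). Host names and ports are placeholders, and the meaning of lbfactor 0 depends on the patch in use: with Paul's patch as I modified it, a worker with lbfactor 0 is never chosen for requests that carry no session, while the jk_lb_worker.c 1.9 change reverses the semantics of 0.0.

    # workers.properties on the machine hosting A1/T1 (hosts/ports are placeholders)
    worker.list=loadbalancer

    # local Tomcat: preferred for requests without a session
    worker.t1.type=ajp13
    worker.t1.host=localhost
    worker.t1.port=8009
    worker.t1.lbfactor=1

    # remote Tomcats: only used for requests whose session belongs to them
    worker.t2.type=ajp13
    worker.t2.host=t2.example.com
    worker.t2.port=8009
    worker.t2.lbfactor=0

    worker.t3.type=ajp13
    worker.t3.host=t3.example.com
    worker.t3.port=8009
    worker.t3.lbfactor=0

    worker.loadbalancer.type=lb
    worker.loadbalancer.balanced_workers=t1,t2,t3

The worker names t1, t2 and t3 are assumed to match the jvmRoute of each Tomcat, so that the ';jsessionid=...' suffix or the session cookie can be mapped back to the right worker.

The recovery issue I mentioned can be summarized by the following schematic sketch. It is not the actual jk_lb_worker.c code, just an illustration of the selection order being discussed, with made-up names and an illustrative recovery wait: a worker in error state whose recovery wait has elapsed is returned before any lb_value comparison, so it can win over the local worker even when that one has an lb_value of 0.

    /* Schematic only -- NOT the real get_most_suitable_worker(). */
    #include <stddef.h>
    #include <time.h>

    #define WAIT_BEFORE_RECOVER 60   /* seconds, illustrative value */

    struct lb_sub_worker {
        int    in_error_state;       /* the worker failed earlier           */
        time_t error_time;           /* when it failed                      */
        double lb_value;             /* 0 marks the preferred local worker  */
    };

    struct lb_sub_worker *pick_worker(struct lb_sub_worker *w, size_t n)
    {
        struct lb_sub_worker *best = NULL;
        time_t now = time(NULL);
        size_t i;

        for (i = 0; i < n; i++) {
            if (w[i].in_error_state) {
                /* The recovery check comes first: an error worker past its
                 * wait time is returned immediately, regardless of lb_value. */
                if (now - w[i].error_time > WAIT_BEFORE_RECOVER)
                    return &w[i];
            } else if (best == NULL || w[i].lb_value < best->lb_value) {
                best = &w[i];
            }
        }
        return best;
    }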