Hi Mathias,

I think we understand your use case; it is not uncommon. In fact, as I mentioned a few times, it is the 'main' use case for Apache (multi-process) when using the JNI worker. In this case Apache acts as a 'natural' load balancer, with requests going to various processes (more or less randomly). As in your case, requests without a session should always go to the worker that is in the same process.
The main reason for using '0' for the "local" worker is that in jk2 I want to switch from float to int - there is no reason (AFAIK) to do all the float computation; even a short int will be enough for the purpose of implementing a round-robin with weights.

BTW, one extension I'm trying to make is support for multiple local workers - I'm still thinking about how to do that. This will cover the case of a few big boxes, each with several Tomcat instances (if you have many GB of RAM and many processors, it is sometimes better to run more VMs instead of a single large process). In this case you still want some remote Tomcats for failover, but most of the load should go to the local workers.

For jk2 I already fixed the selection of the 'recovering' worker: after the timeout, the worker will go through normal selection instead of being automatically chosen. For jk1 - I'm waiting for patches :-) I wouldn't do a big change - the current fix seemed like a good one. I agree that changing the meaning of 0 may be confusing (is it documented? my workers.properties says it should never be used). We can fix that by using an additional flag and not using special values.

Another special note - jk2 will also support 'graceful shutdown', which means your case (replacing a webapp) will be handled in a different way. You should be able to add/remove workers without restarting Apache (and, I hope, mostly automated).

Let me know what you think - with patches if possible :-)

Costin

> The setup I use is the following: a load balancer (Alteon) is in front of several Apache servers, each hosted on a machine which also hosts a Tomcat. Let's call those Apache servers A1, A2 and A3 and the associated Tomcat servers T1, T2 and T3.
>
> I have been using Paul's patch, which I modified so the lb_value field of fault-tolerant workers would not be changed to a value other than INF.
>
> The basic setup is that Ai can talk to all Tj, but for requests not associated with a session, Ti will be used unless it is unavailable. Sessions belonging to Tk will be correctly routed. The load balancing worker definition is different for all three Ai: the lbfactor is set to 0 for the workers connecting to Tk for all k != i and set to 1.0 for the worker connecting to Ti.
>
> This setup allows sticky sessions independently of the Apache handling the request, which is a good thing since the Alteon cannot extract the ';jsessionid=.....' part from the URL in a way which allows the dispatching of the requests to the proper Ai (the cookie is dealt with correctly, though).
>
> This works perfectly except when we roll out a new release of our webapps. In this case it would be ideal to be able to make the load balancer ignore one Apache server, deploy the new version of the webapp on this server, and switch this server back on and the other two off, so the service interruption would be as short as possible for the customers. The immediate idea, if Ai/Ti is to be the first server to have the new webapp, is to stop Ti so Ai will not be selected by the load balancer. This does not work: with Paul's patch Ti is indeed the preferred server, BUT if Ti fails then another Tk will be selected by Ai, therefore the load balancer will never declare Ai failed (even though we managed to make it behave like this by specifying a test URL which includes a jvmroute to Ti, but this uses lots of SLB groups on the Alteon) and it will continue to send requests to it.
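If I read your setup correctly, the workers.properties on A1 would look roughly like the sketch below. The worker names, hosts and ports are made up, and keep in mind that the exact meaning of lbfactor 0 depends on which patch is applied - Paul's patch and the jk 1.9 patch give it opposite meanings, as you point out in your original mail quoted further down.

# workers.properties on A1 - illustrative only
worker.list=loadbalancer

# local Tomcat T1, preferred for requests without a session
worker.tomcat1.type=ajp13
worker.tomcat1.host=localhost
worker.tomcat1.port=8009
worker.tomcat1.lbfactor=1.0

# remote Tomcats T2 and T3, only used for sessions they own
worker.tomcat2.type=ajp13
worker.tomcat2.host=host2
worker.tomcat2.port=8009
worker.tomcat2.lbfactor=0

worker.tomcat3.type=ajp13
worker.tomcat3.host=host3
worker.tomcat3.port=8009
worker.tomcat3.lbfactor=0

worker.loadbalancer.type=lb
worker.loadbalancer.balanced_workers=tomcat1,tomcat2,tomcat3

The worker names have to match the jvmRoute of the corresponding Tomcat instances, so that requests carrying a session id are routed back to the instance that created the session.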
>
> Bernd's patch allows Ai to reject requests if Ti is stopped; the load balancer will therefore quickly declare Ai inactive and will stop sending it requests, thus allowing the new webapp to be rolled out very easily: just set up the new webapp, restart Ti, restart Ai, and as soon as the load balancer sees Ai, shut down the other two Ak. The current sessions will still be routed to the old webapp, and the new sessions will see the new version. When there are no more sessions on the old version, shut down Tk (k != i) and deploy the new webapp.
>
> My remark concerning the possible selection of recovering workers prior to the local worker (one with lb_value set to 0) deals with the load balancer not being able, in this case, to declare Ai inactive.
>
> I hope I have been clear enough and that everybody got the point; if not, I'd be glad to explain more thoroughly.
>
> Mathias.
>
> Paul Frieden wrote:
> >
> > Hello,
> >
> > I'm afraid that I am no longer subscribed to the devel list. I would be happy to add my advice for this issue, but I don't have time to keep up with the entire devel list. If there is anything I can do, please just mail me directly.
> >
> > I chose to use the value 0 for a worker because it used the inverse of the value specified. The value 0 then resulted in essentially infinite preference. I used that approach purely because it was the smallest change possible, and the least likely to change the expected behavior for anybody else. The path of least astonishment and whatnot. I would be concerned about changing the current behavior now, because people probably want a drop-in replacement. If there is going to be a change in the algorithm and behavior, a different approach may be better.
> >
> > I would also like to make a note of how we were using this code. In our environment, we have an external dedicated load balancer and three web servers. The main problem that we ran into was with AOL users. AOL uses a proxy that randomizes the source IP of requests. That means that you can no longer count on the source IP to tell the load balancer which server to send future requests to. We used this code to allow sessions that arrive on the wrong web server to be redirected to the Tomcat on the correct server. This neatly side-steps the whole issue of changing IPs, because Apache is able to make the decision based on the session ID.
> >
> > The reliability issue was a nice side effect for us in that it caught a failed server more quickly than the load balancer did, and prevented the user from having a connection time out or seeing an error message.
> >
> > I hope this provides some insight into why I changed the code that I did, and why that behavior worked well for us.
> >
> > Paul
> >
> > [EMAIL PROTECTED] wrote:
> > >
> > > Hi Mathias,
> > >
> > > I think it would be better to discuss this on tomcat-dev.
> > >
> > > The 'error' worker will not be chosen unless the timeout expires. When the timeout expires, we'll indeed select it (in preference to the default) - this is easy to fix if it creates problems, but I don't see why it would be a problem.
> > >
> > > If it is working, the next request will be served normally by the default. If not, it'll go back to error state.
> > >
> > > In jk2 I removed that - error workers are no longer selected. But for jk1 I would rather leave the old behavior intact.
> > >
> > > Note that the reason for choosing 0 (in jk2) as the default is that I want to switch from floats to ints; I'm not convinced floats are good for performance (or needed).
> > >
> > > Again - I'm just learning and trying; if you have any ideas I would be happy to hear them, patches are more than welcome.
> > >
> > > Costin
> > >
> > > On Sat, 4 May 2002, Mathias Herberts wrote:
> > >
> > > > Hi, I just joined the tomcat-dev list and saw your patch to jk_lb_worker.c (making it version 1.9).
> > > >
> > > > If I understand your patch correctly, it offers the same behavior as Paul's patch but with an opposite semantic for an lbfactor of 0.0 in the worker's definition, i.e. a value of 0.0 now means ALWAYS USE THIS WORKER FOR REQUESTS WITH NO SESSIONS instead of NEVER USE THIS WORKER FOR REQUESTS WITH NO SESSIONS. This seems fine to me.
> > > >
> > > > What disturbs me is what happens when one worker is in error state and not yet recovering. In get_most_suitable_worker, such a worker will be selected whatever its lb_value, meaning a recovering worker will have priority over one with a lb_value of 0.0, and this seems to break the behavior we had achieved with your patch.
> > > >
> > > > Did I miss something, or is this really a problem?
> > > >
> > > > Mathias.
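To make the selection logic we are discussing concrete, here is a much-simplified sketch. It is not the actual jk_lb_worker.c code, only an illustration of two points from the thread above: the weighted round-robin driven by the inverse of lbfactor (Paul's "inverse of the value specified"), and the fact that a worker whose error timeout has expired is retried ahead of everything else, which is what keeps Ai answering even when the preferred local Ti is down.

/*
 * Simplified sketch, NOT the real jk_lb_worker.c. The lbfactor == 0
 * special case below uses the "always preferred" reading of 0; the
 * other patch discussed in this thread gives 0 the opposite meaning.
 */
#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define RECOVER_WAIT 60  /* seconds before a failed worker is retried */

struct lb_worker {
    const char *name;
    double      lbfactor;    /* weight from workers.properties         */
    double      lb_value;    /* running cost, smaller = more preferred */
    int         in_error;    /* 1 if the last request to it failed     */
    time_t      error_time;  /* when it went into error state          */
};

static struct lb_worker *pick_worker(struct lb_worker *w, size_t n, time_t now)
{
    struct lb_worker *best = NULL;
    size_t i;

    /* A worker whose error timeout has expired is retried before
     * anything else, whatever its lb_value - the behaviour Mathias
     * is concerned about.                                            */
    for (i = 0; i < n; i++) {
        if (w[i].in_error && now - w[i].error_time >= RECOVER_WAIT)
            return &w[i];
    }

    /* Normal selection among healthy workers. */
    for (i = 0; i < n; i++) {
        if (w[i].in_error)
            continue;
        if (w[i].lbfactor == 0.0)
            return &w[i];                        /* "local" worker      */
        if (best == NULL || w[i].lb_value < best->lb_value)
            best = &w[i];
    }
    if (best != NULL)
        best->lb_value += 1.0 / best->lbfactor;  /* weighted round-robin */
    return best;
}

int main(void)
{
    time_t now = time(NULL);
    struct lb_worker workers[] = {
        { "tomcat1", 0.0, 0.0, 0, 0 },   /* local, "always preferred"   */
        { "tomcat2", 1.0, 0.0, 1, 0 },   /* failed some time ago        */
        { "tomcat3", 1.0, 0.0, 0, 0 },
    };

    workers[1].error_time = now - 2 * RECOVER_WAIT;  /* timeout expired */

    struct lb_worker *pick = pick_worker(workers, 3, now);

    /* Prints "tomcat2": the recovering worker wins over the lbfactor=0
     * local worker, which is exactly the case raised above.            */
    printf("selected: %s\n", pick ? pick->name : "(none)");
    return 0;
}

The jk2 change mentioned earlier - letting a recovering worker go through normal selection instead of being picked automatically - amounts to dropping the shortcut in the first loop.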