Attached is a patch for the lb worker for mod_jk. Basically, it changes the selection behavior slightly and adds some error and debug logging where we would find it useful.

The code uses two variables, lb_factor and lb_value. The internal lb_factor is the numerical inverse (1/x) of what is entered into the workers.properties file, so it becomes very large if the lb_factor in workers.properties is 0. lb_value always starts at 0. The decision of which worker is used is based on lb_value: the worker with the lowest lb_value gets selected to service a request, and after a worker is used, its lb_value is incremented by its lb_factor. Unfortunately, this causes the balancer to service at least one request on each worker before the lb_factor actually has any effect. That one request will often lead to an entire session being served off of a different worker.

This behavior isn't really a problem, but in a scenario where you have an external load balancer, it is preferable to try to honor its decisions except where there is an error. Such an error can happen with providers that use IP-randomizing proxies, such as AOL. It's also nice to be more deterministic in the normal case.

This patch seeds lb_value with the internal lb_factor (that is, the inverse of the value in workers.properties). This changes the behavior by causing lb_value to start out larger for servers with lower weights than for servers with higher weights. If the lb_factor in workers.properties is 0, the seeded lb_value becomes very large, so that worker should only be selected when all the regular workers are unavailable or because of a session route. I added error logging for when the worker specified by the session route is unavailable, and debug logging for selecting a worker by session route and for which worker is selected.

This hasn't been tested much, but it's almost a trivial change. This is against 3.2.1, but it should apply cleanly to later versions as well. Feedback is welcome.

Paul
--- /tmp/jakarta-tomcat-3.2.1-src/src/native/jk/jk_lb_worker.c	Tue Dec 12 16:51:56 2000
+++ ../jk/jk_lb_worker.c	Mon May  7 16:23:20 2001
@@ -244,7 +244,8 @@
 }
 
 static worker_record_t *get_most_suitable_worker(lb_worker_t *p,
-                                                 jk_ws_service_t *s)
+                                                 jk_ws_service_t *s,
+                                                 jk_logger_t *l)
 {
     worker_record_t *rc = NULL;
     double lb_min = 0.0;
@@ -255,8 +256,14 @@
         for(i = 0 ; i < p->num_of_workers ; i++) {
             if(0 == strcmp(session_route, p->lb_workers[i].name)) {
                 if(p->lb_workers[i].in_error_state) {
-                    break;
+                    jk_log(l, JK_LOG_ERROR,
+                           "In get_most_suitable_worker, requested worker (%s) unavailable, redirecting session\n",
+                           p->lb_workers[i].name);
+                    break;
                 } else {
+                    jk_log(l, JK_LOG_DEBUG,
+                           "In get_most_suitable_worker, selected %s because of session_route\n",
+                           p->lb_workers[i].name);
                     return &(p->lb_workers[i]);
                 }
             }
@@ -282,13 +289,13 @@
             lb_min = p->lb_workers[i].lb_value;
             rc = &(p->lb_workers[i]);
         }
-    }
+    }
 
     if(rc) {
         rc->lb_value += rc->lb_factor;
+        jk_log(l, JK_LOG_DEBUG, "In get_most_suitable_worker, selected %s\n", rc->name);
     }
-
     return rc;
 }
@@ -309,7 +316,7 @@
     while(1) {
-        worker_record_t *rec = get_most_suitable_worker(p->worker, s);
+        worker_record_t *rec = get_most_suitable_worker(p->worker, s, l);
         int rc;
 
         if(rec) {
@@ -347,7 +354,7 @@
              * Error is not recoverable - break with an error.
              */
             jk_log(l, JK_LOG_ERROR,
-                   "In jk_endpoint_t::service, none recoverable error...\n");
+                   "In jk_endpoint_t::service, non recoverable error...\n");
             break;
         }
@@ -426,7 +433,7 @@
             p->lb_workers[i].lb_factor = jk_get_lb_factor(props, worker_names[i]);
             p->lb_workers[i].lb_factor = 1/p->lb_workers[i].lb_factor;
-            p->lb_workers[i].lb_value  = 0.0;
+            p->lb_workers[i].lb_value  = p->lb_workers[i].lb_factor;
             p->lb_workers[i].in_error_state = JK_FALSE;
             p->lb_workers[i].in_recovering  = JK_FALSE;
             if(!wc_create_worker(p->lb_workers[i].name,