On 01/12/2011 03:36 PM, Rainer Jung wrote:
>
> That was meant as an improvement. "Recovery" is only used when a
> worker has been in error state for long enough (by default 60 seconds)
> and we want to find out whether it is still in error or not. mod_jk
> has no active probing with test URLs, so once the 60 seconds have
> passed it marks the worker/instance as recoverable. Then it waits for
> the next request which is eligible to be handled by that instance
> (either carrying old session information pointing to that instance or
> being freely balanceable), and sends it there, marking the worker as
> being probed. If it succeeds, fine, we can set the instance to OK. If
> not, the instance goes back to error and the request will be sent to
> some other OK instance.
>
> Most of the time an instance will be in error for a longer time, so
> probing in parallel is not a good idea, e.g. if the error state leads
> to delays in request handling. OTOH if the worker is fine now, it
> should usually respond very quickly, so the time window in which
> requests could have been handled by the worker but weren't is very
> small. This seems fine to me.
>
> All this will be nicer once we add a probing feature to the
> maintenance thread, which will probe the erroneous workers via a
> background thread and recover the worker as soon as the probing
> succeeds.
>
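
To check my reading of the sequence described above, here is a rough Python sketch of the state handling. It is only an illustration, not the mod_jk code: the `State` names, the `Worker` class and the `send` callback are all made up, and only the 60-second default comes from the description.

```python
from enum import Enum


class State(Enum):
    OK = "ok"
    ERROR = "error"
    RECOVER = "recover"  # wait time has passed, waiting for a trial request
    PROBE = "probe"      # a trial request is currently in flight


RECOVER_WAIT = 60  # seconds; the default mentioned above


class Worker:
    """Hypothetical stand-in for one balanced worker/instance."""

    def __init__(self, name):
        self.name = name
        self.state = State.OK
        self.error_time = None

    def fail(self, now):
        # Put the worker (back) into error and restart the wait.
        self.state = State.ERROR
        self.error_time = now

    def maintain(self, now):
        # No active probing: just flag the worker as recoverable
        # once it has been in error for long enough.
        if self.state is State.ERROR and now - self.error_time >= RECOVER_WAIT:
            self.state = State.RECOVER

    def handle(self, now, send):
        """Send one eligible request; send() returns True on success."""
        if self.state is State.RECOVER:
            self.state = State.PROBE  # this request is the probe
        if send():
            self.state = State.OK
            return True
        self.fail(now)  # caller would now fail over to another OK worker
        return False
```

The point of the sketch is that the probe is an ordinary request: nothing is sent to the worker until a real, eligible request arrives after the wait.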
Ahh, having the maintenance thread do a periodic probe would be awesome.

I see what you mean about parallel probing delaying request handling, but what would you think about modifying the loop so that, after going through all the JK_WORKER_USABLE() workers, it retries the PROBING workers? At that point none of the usable workers are up, so parallel probing wouldn't be a bad idea, would it?

Thanks,
Andy
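
P.S. To make the loop change concrete, here is a rough Python sketch of the two-pass idea. None of this is the real mod_jk code: `dispatch`, the dict layout, the state strings and the contents of `USABLE` are assumptions for illustration, with `USABLE` standing in for whatever JK_WORKER_USABLE() actually accepts.

```python
# Assumption: states a JK_WORKER_USABLE()-style check would let through.
USABLE = {"ok", "recover"}


def dispatch(workers, send):
    """Try usable workers first, then fall back to workers being probed.

    workers: list of dicts such as {"name": "node1", "state": "ok"}
    send:    callable(worker) -> bool, True if the request succeeded
    Returns the name of the worker that served the request, or None.
    """
    # Pass 1: current behaviour, usable workers only.
    for w in workers:
        if w["state"] in USABLE:
            if w["state"] == "recover":
                w["state"] = "probe"  # this request becomes the probe
            if send(w):
                w["state"] = "ok"
                return w["name"]
            w["state"] = "error"

    # Pass 2 (the proposed change): nothing usable worked, so also try
    # workers that some other request is already probing.
    for w in workers:
        if w["state"] == "probe":
            if send(w):
                w["state"] = "ok"
                return w["name"]
            # On failure, leave the state to the request that owns the probe.
    return None
```

With this shape, a worker in the probing state is only hit in parallel when the request would otherwise fail outright, so the extra delay is only paid when there is no better option.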