Glenn
Hans Schmid wrote:
Sorry Glenn,
by looking deeper into the mod_jk.log when changing worker names, I realized that I was actually restarting Apache with the same worker.properties every time.
There was a link earlier in the configuration chain, which made my switching useless :(
We should definitely reduce our linking !!!
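A chain of links like the one described above can be made visible before restarting Apache. The following is a minimal sketch, assuming the configuration is reached through ordinary filesystem symlinks; the function name `resolve_chain` and the file names in the usage note are hypothetical and not part of mod_jk.

```python
import os


def resolve_chain(path):
    """Return every hop of a symlink chain, starting at path and ending at the real file."""
    hops = [path]
    while os.path.islink(path):
        target = os.readlink(path)
        # A relative link target is interpreted relative to the link's own directory.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in hops:
            # Stop on a link cycle instead of looping forever.
            break
        hops.append(path)
    return hops
```

Calling `resolve_chain("/some/path/worker.properties")` (a hypothetical path) lists each intermediate link, so a link earlier in the chain that still points at the old file shows up immediately.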
Thanks very much.
p.s. If anybody else is interested in our LB/failover setup I am glad to provide some info.
Best regards, Hans
-----Original Message-----
From: Hans Schmid [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 16 July 2003 15:15
To: Tomcat Developers List
Subject: AW: jk 1.2.4 LB bug?
Thanks for your reply, comments see inline
-----Original Message-----
From: Glenn Nielsen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 16 July 2003 12:26
To: Tomcat Developers List
Subject: Re: jk 1.2.4 LB bug?
mod_jk will print out information about the lb config if you set the JkLogLevel to debug.
done
I would suggest setting up a test system where you can test the below with JkLogLevel debug configured. Then grep the log for lines which have jk_lb_worker.c in them.
OK
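The grep step suggested above can also be scripted. This is a rough sketch, assuming the debug log holds one entry per line in the format quoted in this thread; the helper names `lb_lines` and `balanced_order` are made up for illustration.

```python
import re

# Matches the debug message the lb worker prints for each balanced worker.
BALANCED = re.compile(r"Balanced worker (\d+) has name (\S+)")


def lb_lines(log_text):
    """Keep only the log lines emitted from jk_lb_worker.c (the grep step)."""
    return [line for line in log_text.splitlines() if "jk_lb_worker.c" in line]


def balanced_order(log_text):
    """Return worker names by reported index; a later report for an index overrides an earlier one."""
    order = {}
    for line in lb_lines(log_text):
        match = BALANCED.search(line)
        if match:
            order[int(match.group(1))] = match.group(2)
    return [order[index] for index in sorted(order)]
```

Running `balanced_order` on the log text captured before and after a restart gives two lists that can be compared directly.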
This will tell you several things.
1. Whether the worker.properties are getting reread when you do an apache restart. (They should be)
Yes they were reread:
Initial:
[Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy
[Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
[Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
[Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (605)]: In jk_endpoint_t::ajp_close_endpoint
[Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (612)]: In jk_endpoint_t::ajp_close_endpoint, closed sd = 12
[Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003] [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
[Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy
[Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy
[Wed Jul 16 14:11:14 2003] [jk_worker.c (199)]: close_workers got 6 workers to destroy
[Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
[Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
[Wed Jul 16 14:11:14 2003] [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
[Wed Jul 16 14:11:14 2003] [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
[Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003] [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
... closing other not related worker
[Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (172)]: Into jk_uri_worker_map_t::uri_worker_map_alloc
[Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (375)]: Into jk_uri_worker_map_t::uri_worker_map_open
[Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (396)]: jk_uri_worker_map_t::uri_worker_map_open, rule map size is 12
[Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (321)]: Into jk_uri_worker_map_t::uri_worker_map_open, match rule /einsurance/=lb-einsurance was added
[Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (345)]: Into jk_uri_worker_map_t::uri_worker_map_open, exact rule /einsurance=lb-einsurance was added
... 5 other workers (including other lb-workers and normal workers) added
[Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (408)]: Into jk_uri_worker_map_t::uri_worker_map_open, there are 12 rules
[Wed Jul 16 14:11:16 2003] [jk_uri_worker_map.c (422)]: jk_uri_worker_map_t::uri_worker_map_open, done
[Wed Jul 16 14:11:16 2003] [jk_worker.c (88)]: Into wc_open
[Wed Jul 16 14:11:16 2003] [jk_worker.c (222)]: Into build_worker_map, creating 6 workers
[Wed Jul 16 14:11:16 2003] [jk_worker.c (228)]: build_worker_map, creating worker lb-einsurance
[Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance lb-einsurance of lb
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (586)]: Into lb_worker_factory
[Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init lb-einsurance
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (420)]: Into jk_worker_t::validate
[Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-01 of ajp13
[Wed Jul 16 14:11:16 2003] [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
[Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-01
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-01 contact is tomcathost-ei:11009
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1397)]: Into jk_worker_t::init
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
[Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-02 of ajp13
[Wed Jul 16 14:11:16 2003] [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
[Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-02
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-02 contact is tomcathost-ei:11019
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1397)]: Into jk_worker_t::init
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
[Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003] [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003] [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-sb of ajp13
[Wed Jul 16 14:11:16 2003] [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
[Wed Jul 16 14:11:16 2003] [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-sb
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-sb contact is tomcathost-ei-sb:11015
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1397)]: Into jk_worker_t::init
[Wed Jul 16 14:11:16 2003] [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
[Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (502)]: in_local_worker_mode: true
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (505)]: local_worker_only: false
[Wed Jul 16 14:11:16 2003] [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003] [jk_worker.c (238)]: build_worker_map, removing old lb-einsurance worker
This last line looks suspicious to me.
2. What the lb worker thinks the config is.
Initial:
[Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (586)]: Into lb_worker_factory
[Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (420)]: Into jk_worker_t::validate
[Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
[Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
[Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
[Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (502)]: in_local_worker_mode: true
[Wed Jul 16 14:04:44 2003] [jk_lb_worker.c (505)]: local_worker_only: false
but after the switching and graceful restart exactly the same (which is the error) !!!!!!!!
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (420)]: Into jk_worker_t::validate
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (502)]: in_local_worker_mode: true
[Wed Jul 16 14:11:16 2003] [jk_lb_worker.c (505)]: local_worker_only: false
This explains the observed (wrong) fail-over behavior (the order should be ajp13-02, ajp13-sb, ajp13-01).
Original workers.properties:
worker.ajp13-01.lbfactor=1
worker.ajp13-01.local_worker=1
worker.ajp13-02.lbfactor=1
worker.ajp13-02.local_worker=0
worker.ajp13-sb.lbfactor=0
worker.ajp13-sb.local_worker=1
local_worker_only=0 for the lb-worker

Changed to the following before the graceful restart (by linking a different worker.properties):
worker.ajp13-01.lbfactor=1
worker.ajp13-01.local_worker=0
worker.ajp13-02.lbfactor=1
worker.ajp13-02.local_worker=1
worker.ajp13-sb.lbfactor=0
worker.ajp13-sb.local_worker=1
local_worker_only=0 for the lb-worker
So it *seems* there might be something wrong with the reinitialisation of the worker order ?
If you need further information, I can mail you the complete logs offline.
Thanks for looking into this, Hans
Then post those log lines here.
Thanks,
Glenn
Hans Schmid wrote:
Hi,
I noticed the following with mod_jk 1.2.4, Apache 1.3.26 and Tomcat 3.3.1a on Solaris 8 JDK 1.4.1_03.
Maybe a LB bug (Loadfactors do not recover after startup of new tomcat/graceful Apache restart).
Let me explain my scenario first:
I want to gracefully upgrade our webapp without losing sessions + have a fail-over scenario. Therefore we have sticky sessions enabled.
Setup: 1 tomcat 01 running on Server A, 0 tomcat 02 running on Server A, 1 tomcat SB running on Server B
01 tomcat on Server A runs the application, SB tomcat on server B is standby(fallback), 02 tomcat is shutdown on Server A at the moment.
All three Tomcats are in the same lb_worker:
worker.list=lb-worker

worker.ajp13-01.port=11009
worker.ajp13-01.host=A
worker.ajp13-01.type=ajp13
worker.ajp13-01.lbfactor=1
worker.ajp13-01.local_worker=1

worker.ajp13-02.port=11019
worker.ajp13-02.host=A
worker.ajp13-02.type=ajp13
worker.ajp13-02.lbfactor=1
worker.ajp13-02.local_worker=0

worker.ajp13-sb.port=11015
worker.ajp13-sb.host=B
worker.ajp13-sb.type=ajp13
worker.ajp13-sb.lbfactor=0
worker.ajp13-sb.local_worker=1

worker.lb-worker.type=lb
worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb
worker.lb-worker.local_worker_only=0
The worker List order should now be:
  1. worker.ajp13-01 lbfactor=1,local_worker=1 TC 01
  2. worker.ajp13-sb lbfactor=0,local_worker=1 TC SB
  3. worker.ajp13-02 lbfactor=1,local_worker=0 TC 02 (not running)
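The expected order can be derived from the properties text with a small script. This is only a sketch of the rule as it is described in this thread (workers with local_worker=1 first, otherwise in balanced_workers order); it is inferred from the mail and the debug log, not taken from the mod_jk source. The helper names, and the choice to treat a missing local_worker as 0, are assumptions.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def expected_order(props, lb_name):
    """Local workers first, then the rest, each group in balanced_workers order."""
    listed = props["worker." + lb_name + ".balanced_workers"]
    names = [name.strip() for name in listed.split(",") if name.strip()]

    def is_local(name):
        # Assumption: a worker without a local_worker entry counts as not local.
        return props.get("worker." + name + ".local_worker", "0") == "1"

    local = [name for name in names if is_local(name)]
    foreign = [name for name in names if not is_local(name)]
    return local + foreign
```

For the configuration above, `expected_order(parse_properties(text), "lb-worker")` yields ajp13-01, ajp13-sb, ajp13-02, which agrees with the "Balanced worker" lines in the debug log quoted earlier in the thread.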
Now all requests go to worker.ajp13-01 (TC 01), none to worker.ajp13-sb (TC SB, lbfactor=0), none to worker.ajp13-02 (TC 02, not running).
If Server A crashes (TC 01), all new requests go to Server B (TC SB, worker.ajp13-sb), since this is then the only running Tomcat. FINE. This is our fail-over solution (running sessions are lost, but OK).
Now the webapp update Scenario:
1.) Shut down TC SB on Server B, update the webapp, start TC SB and test via a separate HTTPConnector port without Apache.
2.) This does not affect anything on production, since lbfactor=0 on TC SB -> no sessions arrive on TC SB.
3.) When the test was successful, our standby Tomcat SB is updated.
4.) Now upgrade the webapp on Server A TC 02, which is currently not running.
5.) Start up TC 02 on Server A with the new version of the webapp, immediately exchange the worker.properties with a different version and gracefully restart Apache:
worker.list=lb-worker

worker.ajp13-01.port=11009
worker.ajp13-01.host=A
worker.ajp13-01.type=ajp13
worker.ajp13-01.lbfactor=1
worker.ajp13-01.local_worker=0 <---- put old webapp on TC 01 to the foreign worker list

worker.ajp13-02.port=11019
worker.ajp13-02.host=A
worker.ajp13-02.type=ajp13
worker.ajp13-02.lbfactor=1
worker.ajp13-02.local_worker=1 <---- put new webapp on TC 02 in front of the local worker list

worker.ajp13-sb.port=11015
worker.ajp13-sb.host=B
worker.ajp13-sb.type=ajp13
worker.ajp13-sb.lbfactor=0
worker.ajp13-sb.local_worker=1

worker.lb-worker.type=lb
worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb
worker.lb-worker.local_worker_only=0
Just the two lines marked above with <---- swap (local_worker values of TC 01 and TC 02)
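Whether a swapped file really differs from the previous one in just those two lines can be checked mechanically before the restart. A minimal sketch, assuming both versions are available as plain key=value text without the `<----` annotations; `parse_properties` and `changed_properties` are hypothetical helper names.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def changed_properties(old_text, new_text):
    """Map each key whose value differs to an (old, new) pair; None marks a missing key."""
    old = parse_properties(old_text)
    new = parse_properties(new_text)
    keys = sorted(set(old) | set(new))
    return {key: (old.get(key), new.get(key)) for key in keys if old.get(key) != new.get(key)}
```

An empty result means the two texts are effectively the same configuration, which would be the situation where the switch has no effect.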
6.) Now all 3 Tomcats are running. All existing sessions still go to TC 01 (sticky sessions; we do not lose running sessions).
7.) What I expect: TC 02 takes a while to start up. The worker List order should now be:
  1. worker.ajp13-02 lbfactor=1,local_worker=1 TC 02
  2. worker.ajp13-sb lbfactor=0,local_worker=1 TC SB
  3. worker.ajp13-01 lbfactor=1,local_worker=0 TC 01 (old webapp)
Since TC 02 needs 3 minutes to start up (filling caches etc.) it is not immediately available. During this time new sessions arrive at TC SB, since it is the next in the worker list. OK, fine, this works. Since these sessions are sticky as well, all users connecting during this time stay on TC SB during their whole session life. FINE
8.) As soon as TC 02 is up and running (finished all load-on-startup servlet initialisation stuff) I would expect that TC 02 gets all new Sessions (Number 1 in the worker List).
This is not the case! All new Sessions still arrive at TC SB.
9.) After a while (one hour) we shutdown TC 01. Since no new sessions arrived there since our graceful restart of Apache, all old Sessions should have expired.
10.) Even now (only 2 Tomcats running, TC 02 and TC SB) and even after a graceful restart, new Sessions arrive at TC SB.
Conclusion: Now, do I misunderstand the supposed behaviour of the lbfactor and local_worker flags? I think that the behaviour in 8.) is wrong. 10.) is strange too.
Thanks for any suggestion if I am completely wrong here or further looking into this.
Hans
-----Original Message-----
From: Glenn Nielsen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 9 July 2003 15:56
To: Tomcat Developers List
Subject: Re: jk 1.2.25 release ?
I was hoping to get it released this week.
But I just noticed that under Apache 2 mod_jk piped logs there are two instances of the piped log program running for the same log file. I want to track this down.
I also just implemented load balancing this morning on a production server. I noticed that when none of the workers for the load balancer were available an HTTP status code of 200 was being logged in mod_jk.log when request logging was enabled. So I want to look into this also.
Hopefully now that I have load balancing in place with 2 tomcat servers instead of 1 the Missouri Lottery web site I administer will scale to handle the big spike in load tonight for the $240 PowerBall jackpot. :-)
Regards,
Glenn
Henri Gomez wrote:
Any date ?
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
-- ---------------------------------------------------------------------- Glenn Nielsen [EMAIL PROTECTED] | /* Spelin donut madder | MOREnet System Programming | * if iz ina coment. | Missouri Research and Education Network | */ | ----------------------------------------------------------------------