Re: Solved AW: jk 1.2.4 LB bug?

Glenn Nielsen Wed, 16 Jul 2003 08:52:42 -0700

Glad I could help!

Glenn

Hans Schmid wrote:

Sorry Glenn,

Looking deeper into mod_jk.log while changing worker names, I realized
that I was actually restarting Apache with the same worker.properties
every time.

There was a link earlier in the configuration chain, which made my switching
useless :(

We should definitely reduce our linking !!!
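
For anyone who hits the same trap, here is a minimal sketch of a check that could be run before restarting Apache. It assumes the properties file is reached through symbolic links; the function names and file paths are hypothetical, not part of mod_jk.

```python
import os


def resolved(path):
    """Return the file that `path` really points at, following every
    symlink, including links on parent directories."""
    return os.path.realpath(path)


def switch_took_effect(configured_path, intended_file):
    """True if the path Apache is configured to read resolves to the
    file we meant to activate (both arguments are hypothetical paths)."""
    return resolved(configured_path) == resolved(intended_file)
```

If `switch_took_effect` returns False after relinking, some link earlier in the chain still points at the old file.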

Thanks very much.

p.s. If anybody else is interested in our LB/failover setup I am glad to
provide some info.

Best regards,
Hans

-----Original Message-----
From: Hans Schmid [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 16 July 2003 15:15
To: Tomcat Developers List
Subject: AW: jk 1.2.4 LB bug?


Thanks for your reply,
see my comments inline.

-----Original Message-----
From: Glenn Nielsen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 16 July 2003 12:26
To: Tomcat Developers List
Subject: Re: jk 1.2.4 LB bug?


mod_jk will print out information about the lb config if you set
the JkLogLevel to debug.

done

I would suggest setting up a test system where you can test
the below with JkLogLevel debug configured.  Then grep the
log for lines which have jk_lb_worker.c in them.
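
The grep step above can also be sketched in Python, which may be handy if the filtered lines are to be processed further. This assumes the log format shown in this thread, where each entry carries a "[source_file.c (line)]:" tag; the function name is hypothetical.

```python
def lines_from_source(log_lines, source_file="jk_lb_worker.c"):
    """Keep only the log lines emitted by the given mod_jk source file."""
    # The leading "[" keeps jk_worker.c from matching jk_lb_worker.c.
    marker = "[" + source_file + " ("
    return [line for line in log_lines if marker in line]
```

Typical use would be `lines_from_source(open("mod_jk.log"))`, with the log path adjusted to the local setup.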

OK

This will tell you several things.

1. Whether the worker.properties are getting reread when you
  do an apache restart.  (They should be)


Yes they were reread:
Initial:
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers to destroy
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
[Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (605)]: In jk_endpoint_t::ajp_close_endpoint
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (612)]: In jk_endpoint_t::ajp_close_endpoint, closed sd = 12
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (118)]: Into wc_close
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers to destroy
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers to destroy
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (199)]: close_workers got 6 workers to destroy
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1461)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
[Wed Jul 16 14:11:14 2003]  [jk_worker.c (206)]: close_workers will destroy worker lb-einsurance
[Wed Jul 16 14:11:14 2003]  [jk_ajp_common.c (1468)]: Into jk_worker_t::destroy up to 1 endpoint to close
[Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into jk_worker_t::destroy
[Wed Jul 16 14:11:14 2003]  [jk_lb_worker.c (561)]: Into jk_worker_t::destroy

... closing other, unrelated workers

[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (172)]: Into jk_uri_worker_map_t::uri_worker_map_alloc
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (375)]: Into jk_uri_worker_map_t::uri_worker_map_open
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (396)]: jk_uri_worker_map_t::uri_worker_map_open, rule map size is 12
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (321)]: Into jk_uri_worker_map_t::uri_worker_map_open, match rule /einsurance/=lb-einsurance was added
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (345)]: Into jk_uri_worker_map_t::uri_worker_map_open, exact rule /einsurance=lb-einsurance was added

... 5 other workers (including other lb-workers and normal workers) added
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (408)]: Into jk_uri_worker_map_t::uri_worker_map_open, there are 12 rules
[Wed Jul 16 14:11:16 2003]  [jk_uri_worker_map.c (422)]: jk_uri_worker_map_t::uri_worker_map_open, done
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (88)]: Into wc_open
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (222)]: Into build_worker_map, creating 6 workers
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (228)]: build_worker_map, creating worker lb-einsurance
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to create instance lb-einsurance of lb
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (586)]: Into lb_worker_factory
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to validate and init lb-einsurance
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (420)]: Into jk_worker_t::validate
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-01 of ajp13
[Wed Jul 16 14:11:16 2003]  [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-01
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-01 contact is tomcathost-ei:11009
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1397)]: Into jk_worker_t::init
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-02 of ajp13
[Wed Jul 16 14:11:16 2003]  [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-02
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-02 contact is tomcathost-ei:11019
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1397)]: Into jk_worker_t::init
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (148)]: Into wc_create_worker
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (162)]: wc_create_worker, about to create instance ajp13-sb of ajp13
[Wed Jul 16 14:11:16 2003]  [jk_ajp13_worker.c (108)]: Into ajp13_worker_factory
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (171)]: wc_create_worker, about to validate and init ajp13-sb
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1343)]: Into jk_worker_t::validate
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1364)]: In jk_worker_t::validate for worker ajp13-sb contact is tomcathost-ei-sb:11015
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1397)]: Into jk_worker_t::init
[Wed Jul 16 14:11:16 2003]  [jk_ajp_common.c (1421)]: In jk_worker_t::init, setting socket timeout to 0
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (502)]: in_local_worker_mode: true
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (505)]: local_worker_only: false
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (187)]: wc_create_worker, done
[Wed Jul 16 14:11:16 2003]  [jk_worker.c (238)]: build_worker_map, removing old lb-einsurance worker

This last line looks suspicious to me.

2. What the lb worker thinks the config is.


initial:
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (586)]: Into lb_worker_factory
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (420)]: Into jk_worker_t::validate
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (502)]: in_local_worker_mode: true
[Wed Jul 16 14:04:44 2003]  [jk_lb_worker.c (505)]: local_worker_only: false

But after the switch and graceful restart it is exactly the same
(which is the error)!

[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (420)]: Into jk_worker_t::validate
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 1 has name ajp13-sb
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 2 has name ajp13-02
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (502)]: in_local_worker_mode: true
[Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (505)]: local_worker_only: false

This explains the observed (wrong) failover behavior (the order should
be ajp13-02, ajp13-sb, ajp13-01).
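
The comparison of the two "Balanced worker" listings can be automated roughly as follows. This is only a sketch: it assumes each log entry sits on a single line, as in the real mod_jk.log, and it groups entries by timestamp as a simple stand-in for "one restart", which would break if two restarts fell within the same second. The function name is hypothetical.

```python
import re

# Matches entries such as:
# [Wed Jul 16 14:11:16 2003]  [jk_lb_worker.c (498)]: Balanced worker 0 has name ajp13-01
BALANCED = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+\[jk_lb_worker\.c \(\d+\)\]: "
    r"Balanced worker (?P<index>\d+) has name (?P<name>\S+)"
)


def worker_order_by_timestamp(log_lines):
    """Map each timestamp to the worker names, sorted by balanced-worker index."""
    orders = {}
    for line in log_lines:
        match = BALANCED.match(line)
        if match:
            by_index = orders.setdefault(match.group("stamp"), {})
            by_index[int(match.group("index"))] = match.group("name")
    return {
        stamp: [names[i] for i in sorted(names)]
        for stamp, names in orders.items()
    }
```

If every timestamp maps to the same list even though the properties file was supposedly switched in between, the new configuration was never read.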


original workers.properties:
worker.ajp13-01.lbfactor=1
worker.ajp13-01.local_worker=1

worker.ajp13-02.lbfactor=1
worker.ajp13-02.local_worker=0

worker.ajp13-sb.lbfactor=0
worker.ajp13-sb.local_worker=1

local_worker_only=0 for the lb-worker

Changed to the following before the graceful restart (by linking a
different worker.properties):

worker.ajp13-01.lbfactor=1
worker.ajp13-01.local_worker=0

worker.ajp13-02.lbfactor=1
worker.ajp13-02.local_worker=1

worker.ajp13-sb.lbfactor=0
worker.ajp13-sb.local_worker=1

local_worker_only=0 for the lb-worker

So it *seems* there might be something wrong with the reinitialisation of
the worker order?

If you need further information, I can mail you the complete logs offline.

Thanks for looking into this,
Hans

Then post those log lines here.

Thanks,

Glenn

Hans Schmid wrote:

Hi,
I noticed the following with mod_jk 1.2.4, Apache 1.3.26 and
Tomcat 3.3.1a on Solaris 8, JDK 1.4.1_03.
Maybe an LB bug (load factors do not recover after starting a new
Tomcat and a graceful Apache restart).
Let me explain my scenario first:

I want to gracefully upgrade our webapp without losing sessions, and
also have a failover scenario.
Therefore we have sticky sessions enabled.

Setup:
1 tomcat 01 running on Server A,
0 tomcat 02 running on Server A,
1 tomcat SB running on Server B

01 tomcat on Server A runs the application, SB tomcat on server B is
standby(fallback),
02 tomcat is shutdown on Server A at the moment.

All three Tomcats are in the same lb_worker:

worker.list=lb-worker

worker.ajp13-01.port=11009
worker.ajp13-01.host=A
worker.ajp13-01.type=ajp13
worker.ajp13-01.lbfactor=1
worker.ajp13-01.local_worker=1

worker.ajp13-02.port=11019
worker.ajp13-02.host=A
worker.ajp13-02.type=ajp13
worker.ajp13-02.lbfactor=1
worker.ajp13-02.local_worker=0

worker.ajp13-sb.port=11015
worker.ajp13-sb.host=B
worker.ajp13-sb.type=ajp13
worker.ajp13-sb.lbfactor=0
worker.ajp13-sb.local_worker=1

worker.lb-worker.type=lb
worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb
worker.lb-worker.local_worker_only=0


The worker list order should now be:
1. worker.ajp13-01 lbfactor=1,local_worker=1  TC 01
2. worker.ajp13-sb lbfactor=0,local_worker=1  TC SB
3. worker.ajp13-02 lbfactor=1,local_worker=0  TC 02  (not running)
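
The ordering expected here can be written down as a small sketch: workers flagged local_worker=1 come first, and otherwise the balanced_workers order is kept. This models the expectation stated in this message (and matches the "Balanced worker" order in the logs), but it is not taken from the mod_jk source; the function name is hypothetical.

```python
def expected_order(balanced_workers, local_worker):
    """Return workers with local_worker=1 first, keeping configured order otherwise.

    balanced_workers: names in the order given in balanced_workers=...
    local_worker:     dict mapping worker name to its local_worker flag (0 or 1)
    """
    # sorted() is stable, so workers with the same flag keep their relative order.
    return sorted(balanced_workers, key=lambda name: 0 if local_worker[name] else 1)
```

With balanced_workers=ajp13-01, ajp13-02, ajp13-sb and the flags above this yields ajp13-01, ajp13-sb, ajp13-02; swapping the flags of ajp13-01 and ajp13-02 yields ajp13-02, ajp13-sb, ajp13-01.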

Now all requests go to worker.ajp13-01 (TC 01), none to worker.ajp13-sb
(TC SB, lbfactor=0), and none to worker.ajp13-02 (TC 02, not running).

If Server A (TC 01) crashes, all new requests go to Server B (TC SB,
worker.ajp13-sb), since it is then the only running Tomcat. Fine.
This is our failover solution (running sessions are lost, but that is OK).

Now the webapp update scenario:

1.) Shut down TC SB on Server B, update the webapp, start TC SB and
test via a separate HTTPConnector port without Apache.
2.) This does not affect anything in production, since lbfactor=0 on
TC SB
-> no sessions arrive on TC SB.
3.) When the test is successful, our standby Tomcat SB is updated.
4.) Now upgrade the webapp on Server A TC 02, which is currently not
running.
5.) Start up TC 02 on Server A with the new version of the webapp,
immediately exchange the worker.properties with a different version and
gracefully restart Apache:

worker.list=lb-worker

worker.ajp13-01.port=11009
worker.ajp13-01.host=A
worker.ajp13-01.type=ajp13
worker.ajp13-01.lbfactor=1
worker.ajp13-01.local_worker=0     <---- put old webapp on TC 01 to the foreign worker list

worker.ajp13-02.port=11019
worker.ajp13-02.host=A
worker.ajp13-02.type=ajp13
worker.ajp13-02.lbfactor=1
worker.ajp13-02.local_worker=1     <---- put new webapp on TC 02 in front of the local worker list

worker.ajp13-sb.port=11015
worker.ajp13-sb.host=B
worker.ajp13-sb.type=ajp13
worker.ajp13-sb.lbfactor=0
worker.ajp13-sb.local_worker=1

worker.lb-worker.type=lb
worker.lb-worker.balanced_workers=ajp13-01, ajp13-02, ajp13-sb
worker.lb-worker.local_worker_only=0

Just the two lines marked above with <---- swap
(local_worker values of TC 01 and TC 02)
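
A quick way to confirm that two versions of the file really differ only in those two lines is to compare them key by key. The sketch below assumes plain key=value lines without the "<----" annotations, which are only explanatory here and would not appear in the real files; the function names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def changed_keys(old_text, new_text):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    old, new = parse_properties(old_text), parse_properties(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

An empty result from `changed_keys` would mean both names lead to the same content, which is what a stale link produces.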

6.) Now all 3 Tomcats are running. All existing sessions still go to
TC 01 (sticky sessions; we do not lose running sessions).
7.) What I expect:
TC 02 takes a while to start up.
The worker list order should now be:
1. worker.ajp13-02 lbfactor=1,local_worker=1  TC 02
2. worker.ajp13-sb lbfactor=0,local_worker=1  TC SB
3. worker.ajp13-01 lbfactor=1,local_worker=0  TC 01  (old webapp)

Since TC 02 needs 3 minutes to start up (filling caches etc.), it is
not immediately available.
During this time new sessions arrive at TC SB, since it is the next in
the worker list. OK, fine, this works.
Since these sessions are sticky as well, all users connecting during
this time stay on TC SB during their whole session life. Fine.
8.) As soon as TC 02 is up and running (finished all load-on-startup
servlet initialisation stuff), I would expect that TC 02 gets all new
sessions (number 1 in the worker list).

This is not the case! All new sessions still arrive at TC SB.
9.) After a while (one hour) we shut down TC 01. Since no new sessions
have arrived there since our graceful restart of Apache, all old
sessions should have expired.
10.) Even now (only 2 Tomcats running, TC 02 and TC SB), and even after
a graceful restart, new sessions arrive at TC SB.


Conclusion:
Do I misunderstand the intended behaviour of lbfactor and the
local_worker flag?
I think that the behaviour in 8.) is wrong. 10.) is strange too.

Thanks for any suggestions if I am completely wrong here,
or for looking further into this.

Hans

-----Original Message-----
From: Glenn Nielsen [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 9 July 2003 15:56
To: Tomcat Developers List
Subject: Re: jk 1.2.25 release ?

I was hoping to get it released this week.

But I just noticed that under Apache 2 mod_jk piped logs there
are two instances of the piped log program running for the same
log file.  I want to track this down.

I also just implemented load balancing this morning on a production
server.  I noticed that when none of the workers for the load balancer
were available an HTTP status code of 200 was being logged in mod_jk.log
when request logging was enabled. So I want to look into this also.

Hopefully now that I have load balancing in place with 2 tomcat servers
instead of 1 the Missouri Lottery web site I administer will scale to
handle the big spike in load tonight for the $240 PowerBall jackpot. :-)

Regards,

Glenn

Henri Gomez wrote:

Any date ?


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--
----------------------------------------------------------------------
Glenn Nielsen             [EMAIL PROTECTED] | /* Spelin donut madder    |
MOREnet System Programming               |  * if iz ina coment.      |
Missouri Research and Education Network  |  */                       |
----------------------------------------------------------------------

