Hi all,

We are currently observing a really bizarre problem on a customer system.
Our software runs a number of microservices on individual Tomcats, which we
front with an Apache HTTPD (2.4.x) reverse proxy using mod_jk to route the
requests by context. There is one exception, though: one of the microservices,
which we added to the stack at a later point in time, uses WebSockets, which
are not supported through the AJP protocol, so we use mod_proxy_balancer
for it.
We put the ProxyPass etc. rules for mod_proxy_balancer in front of the
directives related to mod_jk, and we have been mostly fine with this approach
for a few years now. We have two sets of balancer specifications for
mod_proxy_balancer and their associated rules, one for regular HTTP traffic,
the other for WebSocket traffic ("ws:" and "wss:").
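To make the layout concrete, here is a stripped-down sketch of the relevant
part of our HTTPD configuration (mod_proxy_wstunnel is loaded for the ws:
members). Worker names, host names, and the /z/ws path are made up for this
mail; the real configuration differs, but the ordering and structure are the
same:

  # mod_proxy_balancer part, placed before the mod_jk directives
  <Proxy "balancer://z_http">
      BalancerMember "http://tomcat1:8080"
      BalancerMember "http://tomcat2:8080"
  </Proxy>
  <Proxy "balancer://z_ws">
      BalancerMember "ws://tomcat1:8080"
      BalancerMember "ws://tomcat2:8080"
  </Proxy>

  # WebSocket traffic first (more specific path), then regular HTTP
  ProxyPass        "/z/ws" "balancer://z_ws/z/ws"
  ProxyPassReverse "/z/ws" "balancer://z_ws/z/ws"
  ProxyPass        "/z"    "balancer://z_http/z"
  ProxyPassReverse "/z"    "balancer://z_http/z"

  # mod_jk part
  JkMount /a/* worker_a
  JkMount /b/* worker_b
  JkMount /c/* worker_c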

Let's name the microservices that are handled by mod_jk A, B, and C, and let's
name the one handled by mod_proxy_balancer Z. Let's further assume that their
request contexts are /a, /b, /c, and /z, respectively.

Now about the current customer problem: the customer started experiencing very
erratic system behaviour. In particular, requests that were meant for one of
the microservices A-C handled by mod_jk would randomly get 404 responses.
Usually, this situation would persist for an affected user for a few seconds,
and reloading wouldn't resolve it. At the same time, other users accessing the
very same microservice didn't have a problem. Pretty much all users were
affected from time to time.

We did several troubleshooting sessions that turned up nothing. At some point,
we started to monitor all traffic between HTTPD and the Tomcats with tcpdump,
and here we found the bizarre thing:
when we ran tcpdump and filtered it to only show traffic between HTTPD and
microservice Z (handled by mod_proxy_balancer), we sometimes saw requests
that, judging by the request URL (/a, /b, /c), were clearly meant for one of
the OTHER microservices (A-C), yet showed up in the traffic to microservice Z.
Naturally, microservice Z has no idea what to do with these requests and
responds with 404.

What else might be relevant:
- our microservices are stateless, so we can scale horizontally if we want. On 
that particular system, we have at least two instances of each microservice 
(A-C and Z)
- the installation is spread across multiple nodes
- the nodes run on Linux
- Docker is not used ;-)
- we have never seen this problem on any other system
- we haven't seen this problem on the customer's test system, but usage 
patterns there are different
- the requests with 404 responses wouldn't show up in the HTTPD's access log 
(where "normal" 404 requests DO show).
- the customer had recently updated from a version of our product that uses 
Apache 2.4.34 to one using 2.4.41
- disabling the microservice Z (= no more balancer workers for 
mod_proxy_balancer) would resolve the problem
- putting the rules for mod_proxy_balancer after those of mod_jk (and adding an 
exclusion for /z there, because one of the other microservices is actually 
listening on the root context) would NOT change a thing; see the sketch after 
this list for what that looked like
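
For reference, the alternative ordering we tried looked roughly like this
(again with made-up worker names); the /z exclusion is needed because one of
the mod_jk services is mounted on the root context:

  # mod_jk first; one of the other services is mounted on the root context
  JkMount /* worker_root
  JkMount /a/* worker_a
  JkMount /b/* worker_b
  JkMount /c/* worker_c
  # keep /z away from mod_jk so mod_proxy_balancer can handle it
  JkUnMount /z/* worker_root

  # mod_proxy_balancer rules for /z after the mod_jk directives
  ProxyPass        "/z/ws" "balancer://z_ws/z/ws"
  ProxyPassReverse "/z/ws" "balancer://z_ws/z/ws"
  ProxyPass        "/z"    "balancer://z_http/z"
  ProxyPassReverse "/z"    "balancer://z_http/z"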

From experience, we are pretty sure that the problem is somewhere on our side. 
;-)

- One thought was that maybe a bug in microservice Z, triggered only by this 
customer's use of our product, causes the erratic behaviour of 
HTTPD/mod_proxy_balancer. Maybe something we do wrong is messing up the 
connection keepalive between Apache and Tomcat, causing requests to go the 
wrong way? (A diagnostic sketch for this follows after the list.)
- Or maybe it is related to the Apache version update (2.4.34 to 2.4.41)? But 
why are other installations with the same version not affected?
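
In case it matters for the keepalive theory: our current idea for narrowing it
down would be to temporarily disable backend connection reuse on Z's balancer
members (disablereuse is a standard mod_proxy worker parameter) and see whether
the misrouted requests disappear. Host names are again made up:

  # Diagnostic only: open a fresh backend connection for every request to Z,
  # so that no pooled/kept-alive connection can be reused across requests
  <Proxy "balancer://z_http">
      BalancerMember "http://tomcat1:8080" disablereuse=On
      BalancerMember "http://tomcat2:8080" disablereuse=On
  </Proxy>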

Any ideas where we should start looking?

Regards

J



