I have spent some more time on this today and I am still unable to recreate these results on either Linux or Windows using the Snake WebSocket example that ships with Tomcat.

To make progress I need a reproducible test case, ideally using the Tomcat examples from the latest release of any currently supported version. Failing that, the source code for the bare minimum web application required to reproduce the issue.
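
For the avoidance of doubt, something on the order of a single annotated echo endpoint deployed in an otherwise empty web application would be ideal. A minimal sketch of what I have in mind (class and path names are purely illustrative):

    import jakarta.websocket.OnMessage;
    import jakarta.websocket.Session;
    import jakarta.websocket.server.ServerEndpoint;

    // Bare-bones echo endpoint; nothing else needs to be in the web application.
    @ServerEndpoint("/echo")
    public class EchoEndpoint {

        @OnMessage
        public void onMessage(Session session, String message) throws java.io.IOException {
            // Send the received text straight back to the client.
            session.getBasicRemote().sendText(message);
        }
    }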

At this point, it is still not clear if this is a Tomcat or an application issue.

Mark


On 06/12/2023 07:34, Jakub Remenec wrote:
Hi,

I've experienced the same issue as described, on Apache Tomcat 10.1.13.
After downgrading to 10.1.5 it started working correctly. I also inspected
a heap dump of the application with the memory problems and found that
there were many org.apache.tomcat.websocket.WsSession instances in the
OUTPUT_CLOSED state. When I tried locally, I found that when I open a few
WebSocket connections from Chrome and then switch the tab to Offline mode,
the WsSessions remain in the OUTPUT_CLOSED state, while connections opened
afterwards are in the OPEN state. In the heap dump from production I saw
around 4600 WsSessions, but only 40 were open; the rest were in the
OUTPUT_CLOSED state. The WsSessions are reachable through the
org.apache.coyote.AbstractProtocol -> waitingProcessors set, and in the
heap dump it was clearly visible that 49% of the heap was retained by that
set. When Tomcat was downgraded to 10.1.5, the WsSessions were cleared
after going to Offline mode.
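
(For anyone who wants to watch this at runtime rather than only in a heap dump, a rough sketch along the lines below reads the size of that set via reflection. The field name comes from the heap dump observation above; how you obtain the Connector depends on your setup, for example a TomcatConnectorCustomizer in Spring Boot, and is left out here.)

    import java.lang.reflect.Field;
    import java.util.Set;

    import org.apache.catalina.connector.Connector;

    // Rough diagnostic sketch, not production code: reports the current size of the
    // private org.apache.coyote.AbstractProtocol#waitingProcessors set for a Connector.
    public final class WaitingProcessorsProbe {

        public static int waitingProcessorCount(Connector connector) {
            Object handler = connector.getProtocolHandler();
            // Walk up the handler's class hierarchy (e.g. Http11NioProtocol -> ... ->
            // AbstractProtocol) looking for the waitingProcessors field.
            for (Class<?> c = handler.getClass(); c != null; c = c.getSuperclass()) {
                try {
                    Field f = c.getDeclaredField("waitingProcessors");
                    f.setAccessible(true);
                    return ((Set<?>) f.get(handler)).size();
                } catch (NoSuchFieldException e) {
                    // Not declared on this class; keep walking up.
                } catch (ReflectiveOperationException e) {
                    return -1;
                }
            }
            return -1;
        }
    }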

Additional info: I've set the session timeout to 10 minutes. The app uses
Java 17 with the Spring Boot 3.1.x stack. It does not use any external
STOMP broker relay.
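
(Assuming "session timeout" here refers to the WebSocket session idle timeout, one standard way to set a 10 minute idle timeout via the Jakarta WebSocket API is sketched below; how it is actually configured in a Spring STOMP application may differ.)

    import jakarta.websocket.OnOpen;
    import jakarta.websocket.Session;
    import jakarta.websocket.server.ServerEndpoint;

    // Illustrative sketch only: set a 10 minute idle timeout on each WebSocket
    // session via the standard Jakarta API.
    @ServerEndpoint("/example")
    public class TimeoutExampleEndpoint {

        @OnOpen
        public void onOpen(Session session) {
            // 10 minutes, expressed in milliseconds.
            session.setMaxIdleTimeout(10 * 60 * 1000L);
        }
    }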

Regards,
Jakub.

On 2023/08/20 22:44:46 Mark Thomas wrote:
On 20/08/2023 05:21, Mark Thomas wrote:
On 18/08/2023 11:28, Rubén Pérez wrote:

<snip/>

I started experiencing exactly the same issue when updating from Spring
6.0.7 to 6.0.9, which updated Tomcat from 10.1.5 to 10.1.8. The memory
leak is very clearly visible in my monitoring tools. A heap dump further
reveals many times more entries in the waitingProcessors map than real
active connections, and we end up with around 8 GB of retained memory
full of those entries.

I believe I have found a way to reproduce the issue locally. Open a
WebSocket session from a client in Chrome, go to dev tools and switch the
tab to Offline mode, wait more than 50 seconds, then switch it back to No
throttling. Sometimes I get an error back to the client like:

a["ERROR\nmessage:AMQ229014\\c Did not receive data from
/192.168.0.1\\c12720
within the 50000ms connection TTL. The connection will now be
closed.\ncontent-length:0\n\n\u0000"]

And other times I instead get something like c[1002, ""] from Artemis,
followed by an "Invalid frame header" error from Chrome (WebSockets view
in dev tools).

Only in the latter case does it appear to leak entries in that map. It
may just be a coincidence, but that is what I have observed at least
twice.

After the error appeared, I waited long enough for the FE to reconnect
the session, and then I simply quit Chrome.

Thanks for the steps to reproduce. That is helpful. I'll let you know
how I get on.

Unfortunately, I didn't get very far. Based on the log messages it looks
very much like those are application generated rather than Tomcat
generated.

At this point I am wondering if this is an application or a Tomcat
issue. I'm going to need a sample application (ideally as cut down as
possible) that demonstrates the issue to make progress on this.

Another option is debugging this yourself to figure out what has
changed. I can provide some pointers if this is of interest. Given that
you can repeat the issue reasonably reliably, tracking down the commit
that triggered the change isn't too hard.

Again, after forcibly downgrading Tomcat 10.1.8 to 10.1.5 while
preserving the same Spring version, the issue is gone (confirmed in
production). In fact, I have never managed to get an "Invalid frame
header" error in Chrome again with Tomcat 10.1.5 (in about 10 attempts);
before, I got it in 2 out of 4 attempts.

Could you do some further testing and see if you can narrow down exactly
which version (10.1.6, 10.1.7 or 10.1.8) the issue first appears in?

It would also be helpful to confirm if the issue is still present in
10.1.12.

Answers to the above would still be helpful.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



