On 16/04/2020 18:00, Gordon Sim wrote:
On 16/04/2020 12:24 pm, Toralf Lund wrote:
Hi,
This is a problem that has been discussed before, but I kind of
reached a dead end in trying to figure out what was going on then, so
I'll try again:
On one particular site where one of our QPid C++ messaging
applications is used, qpid-stat on the server reports multiple
connections where there is supposed to be just one - the application
has exactly one Connection object with one Session. Example output
from "qpid-stat -c"; names are changed and entries for other
processes are removed:
Connections
connection cproc cpid mech
auth connected idle msgIn msgOut
=============================================================================================================================================
qpid.10.0.1.215:5672-10.0.2.61:37488 the-application 171170
ANONYMOUS anonymous 5d 21h 43m 13s 5d 16h 34m 7s 778k 1.61m
qpid.10.0.1.215:5672-10.0.2.61:37492 the-application 171170
ANONYMOUS anonymous 5d 16h 34m 20s 5d 13h 45m 27s 368k 940k
qpid.10.0.1.215:5672-10.0.2.61:37496 the-application 171170
ANONYMOUS anonymous 5d 13h 45m 35s 5d 8h 36m 7s 637k 1.53m
qpid.10.0.1.215:5672-10.0.2.61:37498 the-application 171170
ANONYMOUS anonymous 5d 8h 36m 14s 5d 6h 24m 47s 269k 794k
qpid.10.0.1.215:5672-10.0.2.61:37502 the-application 171170
ANONYMOUS anonymous 5d 6h 24m 57s 5d 4h 25m 17s 248k 761k
qpid.10.0.1.215:5672-10.0.2.61:37504 the-application 171170
ANONYMOUS anonymous 5d 4h 25m 22s 4d 22h 55m 47s 678k 1.70m
qpid.10.0.1.215:5672-10.0.2.61:37508 the-application 171170
ANONYMOUS anonymous 4d 22h 56m 4s 4d 18h 50m 7s 514k 1.42m
qpid.10.0.1.215:5672-10.0.2.61:37510 the-application 171170
ANONYMOUS anonymous 4d 18h 50m 16s 4d 14h 39m 17s 525k 1.43m
qpid.10.0.1.215:5672-10.0.2.61:37516 the-application 171170
ANONYMOUS anonymous 4d 11h 57m 23s 4d 2h 52m 47s 2.61m 1.51m
qpid.10.0.1.215:5672-10.0.2.61:37530 the-application 171170
ANONYMOUS anonymous 3d 21h 20m 43s 3d 11h 41m 7s 2.90m 1.52m
qpid.10.0.1.215:5672-10.0.2.61:37542 the-application 171170
ANONYMOUS anonymous 3d 6h 55m 24s 2d 22h 29m 27s 2.40m 1.46m
qpid.10.0.1.215:5672-10.0.2.61:37550 the-application 171170
ANONYMOUS anonymous 2d 16h 42m 4s 2d 8h 7m 37s 2.41m 1.51m
qpid.10.0.1.215:5672-10.0.2.61:37562 the-application 171170
ANONYMOUS anonymous 2d 1h 13m 47s 1d 14h 28m 7s 3.01m 1.81m
qpid.10.0.1.215:5672-10.0.2.61:37570 the-application 171170
ANONYMOUS anonymous 1d 10h 42m 8s 1d 1h 54m 47s 2.45m 1.53m
qpid.10.0.1.215:5672-10.0.2.61:37586 the-application 171170
ANONYMOUS anonymous 9h 8m 51s 0s 2.62m 1.49m
What do you see for qpid-stat -s? i.e. how many sessions are there?
qpid-stat doesn't seem to have an option "-s" in the version used.
It turns out that addition of new connections coincide with log
message like
2020-04-15 09:10:25 [Client] info Trying to connect to cirm...
2020-04-15 09:10:25 [System] info Connecting: 10.0.1.215:5672
2020-04-15 09:10:25 [Client] info Connection
[10.0.2.61:37576-cirm:5672] connected to tcp:cirm:5672
2020-04-15 09:10:25 [Client] info Connected to cirm
2020-04-15 09:10:25 [Client] error session-busy: Session detached by
peer
This is from "info+" logging by the QPid library; date and time is
added by a log handler in the application. This sequence of events is
triggered by Sender::send() or Receiver::receive(), i.e. we're
looking at an automatic reconnect.
Are there errors showing disconnection that triggers the reconnect?
Any indication of what exactly the error was that triggered it? Are
there any heartbeat related errors on either client or server for
example?
On the client side, the only qpid log messages I can find are the ones
shown above. The server log has a number of entries like
Channel exception: session-busy: Session already attached:
anonymous.959cc402-7ba5-4865-830a-f850a57356f7
which are typically followed by
Connection ... timed out: closing
The session-busy error happens when the session name is already in use
on the server side (this is an unpleasant part of amqp 0-10 imo). That
*can* sometimes happen on a legitimate reconnect, if the client
manages to reconnect *before* the server has processed the loss of the
connection.
So again, I'm wondering what's going on. I mean, if the session is
"detached", why is a connection still listed on the server side?
The session being detached will not in itself affect the connection.
OK.
I have to admit the whole "session" concept is a bit fuzzy to me in this
context.
Also, surely there can't be communication issues at this stage, as
then the entire reconnect operation would have failed? Or am I
missing something?
And more importantly: Is there anything I can do to make the "stale"
connections go away? My strategy for recovering from the error is simply
qpid::Messaging::Session session;
qpid::Messaging::Connection connection;
....
session.close();
session=connection.createSession();
But evidently, that's not enough.
The connections disappear if I restart the application, so I suspect
connection.close() ; connection.open(); would help, but if that's the
case, why? How is what it does different from closing the one and
only session?
Closing the last session will not automatically close the connection.
So again, what exactly does Connection::close(), or alternatively the
Connection destructor, or something else that happens on exit, do to
make the connections go away? It seems quite clear that *all*
connections are removed, not just the one from the last reconnect.
However creating a new session on the same connection should not then
cause a new connection to be opened.
So other than the original open(), there is nowhere in your
application that a new connection would be created other the reconnect
logic?
I'm quite sure there isn't.
Also, I don't ever see extra connections unless there is a reconnect
message as indicated.
Are you using the built in reconnect or something custom?
I'm using the built-in reconnect.
In some past versions, the automatic option was disabled, and reconnects
were taken care of "by hand" via
if(!connection.isOpen()) {
Util::debug(1, "Open AMQP connection...");
try {
connection.open();
We had exactly the same problem then.
- Toralf
Also, if I do have to close the connection sometimes, how do I know
exactly when? Obviously, it's possible to set up everything from
scratch on every session or communication error, but is that supposed
to be necessary? I've been assuming it isn't, and the the whole
application is kind of structured around that.
(I would do more tests, if it wasn't for the fact that I only see the
problem on a production system, which also found at a remote location
with a very slow link to the rest of the world...)
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org