On 16/04/2020 18:00, Gordon Sim wrote:
On 16/04/2020 12:24 pm, Toralf Lund wrote:
Hi,

This is a problem that has been discussed before, but I kind of reached a dead end in trying to figure out what was going on then, so I'll try again:

On one particular site where one of our QPid C++ messaging applications is used, qpid-stat on the server reports multiple connections where there is supposed to be just one - the application has exactly one Connection object with one Session. Example output from "qpid-stat -c"; names are changed and entries for other processes are removed:

Connections
   connection                                   cproc cpid mech auth       connected        idle msgIn  msgOut =============================================================================================================================================    qpid.10.0.1.215:5672-10.0.2.61:37488   the-application 171170 ANONYMOUS  anonymous  5d 21h 43m 13s   5d 16h 34m 7s 778k  1.61m    qpid.10.0.1.215:5672-10.0.2.61:37492   the-application 171170 ANONYMOUS  anonymous  5d 16h 34m 20s   5d 13h 45m 27s 368k 940k    qpid.10.0.1.215:5672-10.0.2.61:37496   the-application 171170 ANONYMOUS  anonymous  5d 13h 45m 35s   5d 8h 36m 7s 637k  1.53m    qpid.10.0.1.215:5672-10.0.2.61:37498   the-application 171170 ANONYMOUS  anonymous  5d 8h 36m 14s    5d 6h 24m 47s 269k   794k    qpid.10.0.1.215:5672-10.0.2.61:37502   the-application 171170 ANONYMOUS  anonymous  5d 6h 24m 57s    5d 4h 25m 17s 248k   761k    qpid.10.0.1.215:5672-10.0.2.61:37504   the-application 171170 ANONYMOUS  anonymous  5d 4h 25m 22s    4d 22h 55m 47s 678k 1.70m    qpid.10.0.1.215:5672-10.0.2.61:37508   the-application 171170 ANONYMOUS  anonymous  4d 22h 56m 4s    4d 18h 50m 7s 514k  1.42m    qpid.10.0.1.215:5672-10.0.2.61:37510   the-application 171170 ANONYMOUS  anonymous  4d 18h 50m 16s   4d 14h 39m 17s 525k 1.43m    qpid.10.0.1.215:5672-10.0.2.61:37516   the-application 171170 ANONYMOUS  anonymous  4d 11h 57m 23s   4d 2h 52m 47s 2.61m 1.51m    qpid.10.0.1.215:5672-10.0.2.61:37530   the-application 171170 ANONYMOUS  anonymous  3d 21h 20m 43s   3d 11h 41m 7s 2.90m 1.52m    qpid.10.0.1.215:5672-10.0.2.61:37542   the-application 171170 ANONYMOUS  anonymous  3d 6h 55m 24s    2d 22h 29m 27s 2.40m 1.46m    qpid.10.0.1.215:5672-10.0.2.61:37550   the-application 171170 ANONYMOUS  anonymous  2d 16h 42m 4s    2d 8h 7m 37s 2.41m  1.51m    qpid.10.0.1.215:5672-10.0.2.61:37562   the-application 171170 ANONYMOUS  anonymous  2d 1h 13m 47s    1d 14h 28m 7s 3.01m 1.81m    qpid.10.0.1.215:5672-10.0.2.61:37570   the-application 171170 ANONYMOUS  anonymous  1d 10h 42m 8s    1d 1h 54m 47s 2.45m 1.53m    qpid.10.0.1.215:5672-10.0.2.61:37586   the-application 171170 ANONYMOUS  anonymous  9h 8m 51s        0s 2.62m  1.49m

What do you see for qpid-stat -s? i.e. how many sessions are there?
qpid-stat doesn't seem to have an option "-s" in the version used.

It turns out that addition of new connections coincide with log message like

2020-04-15 09:10:25 [Client] info Trying to connect to cirm...
2020-04-15 09:10:25 [System] info Connecting: 10.0.1.215:5672
2020-04-15 09:10:25 [Client] info Connection [10.0.2.61:37576-cirm:5672] connected to tcp:cirm:5672
2020-04-15 09:10:25 [Client] info Connected to cirm
2020-04-15 09:10:25 [Client] error session-busy: Session detached by peer

This is from "info+" logging by the QPid library; date and time is added by a log handler in the application. This sequence of events is triggered by Sender::send() or Receiver::receive(), i.e. we're looking at an automatic reconnect.

Are there errors showing disconnection that triggers the reconnect? Any indication of what exactly the error was that triggered it? Are there any heartbeat related errors on either client or server for example?

On the client side, the only qpid log messages I can find are the ones shown above. The server log has a number of entries like

Channel exception: session-busy: Session already attached: anonymous.959cc402-7ba5-4865-830a-f850a57356f7

which are typically followed by

Connection ... timed out: closing


The session-busy error happens when the session name is already in use on the server side (this is an unpleasant part of amqp 0-10 imo). That *can* sometimes happen on a legitimate reconnect, if the client manages to reconnect *before* the server has processed the loss of the connection.

So again, I'm wondering what's going on. I mean, if the session is "detached", why is a connection still listed on the server side?

The session being detached will not in itself affect the connection.

OK.

I have to admit the whole "session" concept is a bit fuzzy to me in this context.


Also, surely there can't be communication issues at this stage, as then the entire reconnect operation would have failed? Or am I missing something?

And more importantly: Is there anything I can do to make the "stale" connections go away? My strategy for recovering from the error is simply

qpid::Messaging::Session session;
qpid::Messaging::Connection connection;

....

session.close();
session=connection.createSession();

But evidently, that's not enough.

The connections disappear if I restart the application, so I suspect connection.close() ; connection.open(); would help, but if that's the case, why? How is what it does different from closing the one and only session?

Closing the last session will not automatically close the connection.

So again, what exactly does Connection::close(), or alternatively the Connection destructor, or something else that happens on exit, do to make the connections go away? It seems quite clear that *all* connections are removed, not just the one from the last reconnect.

However creating a new session on the same connection should not then cause a new connection to be opened.

So other than the original open(), there is nowhere in your application that a new connection would be created other the reconnect logic?

I'm quite sure there isn't.

Also, I don't ever see extra connections unless there is a reconnect message as indicated.

Are you using the built in reconnect or something custom?

I'm using the built-in reconnect.

In some past versions, the automatic option was disabled, and reconnects were taken care of "by hand" via

if(!connection.isOpen()) {
    Util::debug(1, "Open AMQP connection...");

    try {
      connection.open();

We had exactly the same problem then.

- Toralf


Also, if I do have to close the connection sometimes, how do I know exactly when? Obviously, it's possible to set up everything from scratch on every session or communication error, but is that supposed to be necessary? I've been assuming it isn't, and the the whole application is kind of structured around that.

(I would do more tests, if it wasn't for the fact that I only see the problem on a production system, which also found at a remote location with a very slow link to the rest of the world...)


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org

Reply via email to