I think the fact that you're getting AMQ219014 explains most, if not all,
of the issues. The "packet 71" that is timing out here is the packet used
to send a message from the core client to the broker. As far as I can
tell, this is your "lost" message, although it's not really lost: you got
a clear exception indicating that the send failed. The message will not be
resent automatically, for the same basic reasons I explained on your other
thread [1].
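
If you need the message to survive this kind of failure you'll have to
retry the send in your application code once the exception surfaces.
Something like this rough sketch with the plain JMS API (the retry count
and backoff are illustrative, and note that if the broker actually did
receive the original send, a blind retry can produce a duplicate):

    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageProducer;

    // Sketch only: retry a failed blocking send a few times before giving up.
    public final class SendRetrySketch {
        private SendRetrySketch() {}

        public static void sendWithRetry(MessageProducer producer, Message message)
                throws JMSException, InterruptedException {
            final int maxAttempts = 3;       // illustrative
            final long backoffMs = 1_000L;   // illustrative
            for (int attempt = 1; ; attempt++) {
                try {
                    producer.send(message);
                    return;
                } catch (JMSException e) {
                    // e.g. AMQ219014: the send timed out and may or may not
                    // have reached the broker
                    if (attempt >= maxAttempts) {
                        throw e;
                    }
                    Thread.sleep(backoffMs); // give failover/reconnect time to finish
                }
            }
        }
    }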


Justin

[1] https://lists.apache.org/thread/6q7zoczljslbr26v845fycz4lfdgs3nk

On Sun, Mar 19, 2023 at 4:03 PM John Lilley
<john.lil...@redpointglobal.com.invalid> wrote:

> Greetings!
>
>
>
> Our app makes heavy use of the RPC pattern via the Artemis JMS client
> API.  Our RPC client sends a request to a named service queue and waits
> for a reply on a reply-to queue.  This mostly works great, but under load
> in our Kubernetes environment something goes wrong and a message is lost.
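>
> For context, each call is essentially the standard JMS request/reply
> pattern, along these lines (a simplified sketch, not our actual code;
> the class and method names are illustrative):
>
>     import javax.jms.Destination;
>     import javax.jms.JMSException;
>     import javax.jms.Message;
>     import javax.jms.MessageProducer;
>     import javax.jms.Queue;
>     import javax.jms.Session;
>
>     // Sketch of the sending half of one RPC call.  The reply-to queue is
>     // long-lived and shared; replies are matched by JMSCorrelationID in a
>     // listener elsewhere.
>     public final class RpcSendSketch {
>         public static void sendRequest(Session session, Queue serviceQueue,
>                 Destination replyTo, Message request, String callId)
>                 throws JMSException {
>             request.setJMSReplyTo(replyTo);
>             request.setJMSCorrelationID(callId); // reply is routed by this ID
>             MessageProducer producer = session.createProducer(serviceQueue);
>             try {
>                 producer.send(request);
>             } finally {
>                 producer.close();
>             }
>         }
>     }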
>
>
>
> I am beginning to suspect that this is due to using a single temporary
> reply-to queue for a long lifetime.  Is it OK to use a temporary queue for
> a long time?  If there is a connection error (like the one I think we see
> below), should I be creating a new reply queue?
>
>
>
> Apologies in advance, because I feel like we keep nibbling around the
> edges of the same symptoms and never quite get enough information to
> completely resolve them. Hopefully this time we’ve logged enough detail
> and uncovered something new that will open the door to an answer.
>
>
>
> We recently logged an issue that seems to come down to:
>
>    - There is an error logged at low levels of the AMQ client.
>    - These errors seem to mostly self-recover, but not before a 30-second
>    timeout.
>    - Our RPC client sends a request message to a named service queue.
>    - Our RPC service receives the request and sends a reply message to
>    the (temporary) reply-to queue.
>    - The ActiveMQ broker log shows the reply SEND but not the DELIVER.
>    - On examination of the AMQ console, the reply-to queue exists and has
>    one message, but no consumers.
>
>
>
> On the face of it this seems simple.  We must be closing the reply
> consumer or session.  But we see evidence that something else is going
> wrong at a lower level.
>
>
>
> I keep scouring our code for where we might be closing a session or
> consumer, and I just can’t find anything, because:
>
>    - We create Connections statically and never close them.
>    - We use a singleton reply-to queue and MessageListener, and receive
>    all replies through this one consumer.
>    - Our singleton MessageListener creates and owns its Session,
>    Consumer, and Queue, and never closes them (see the sketch below).
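>
> In outline, the listener setup looks like this (a simplified sketch, not
> our actual code):
>
>     import java.util.Map;
>     import java.util.concurrent.CompletableFuture;
>     import java.util.concurrent.ConcurrentHashMap;
>     import javax.jms.Connection;
>     import javax.jms.JMSException;
>     import javax.jms.Message;
>     import javax.jms.MessageConsumer;
>     import javax.jms.MessageListener;
>     import javax.jms.Session;
>     import javax.jms.TemporaryQueue;
>
>     // Singleton reply listener: one temporary queue and one consumer for
>     // all replies.  The session, queue, and consumer are created once and
>     // never closed.
>     public class ReplyListenerSketch implements MessageListener {
>         private final Session session;
>         private final TemporaryQueue replyQueue;
>         private final MessageConsumer consumer;
>         private final Map<String, CompletableFuture<Message>> pending =
>                 new ConcurrentHashMap<>();
>
>         public ReplyListenerSketch(Connection connection) throws JMSException {
>             session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
>             replyQueue = session.createTemporaryQueue();
>             consumer = session.createConsumer(replyQueue);
>             consumer.setMessageListener(this);
>         }
>
>         // Called by the sending side before a request goes out (hypothetical).
>         public CompletableFuture<Message> expectReply(String callId) {
>             CompletableFuture<Message> f = new CompletableFuture<>();
>             pending.put(callId, f);
>             return f;
>         }
>
>         @Override
>         public void onMessage(Message reply) {
>             try {
>                 CompletableFuture<Message> call =
>                         pending.remove(reply.getJMSCorrelationID());
>                 if (call != null) {
>                     call.complete(reply);
>                 }
>             } catch (JMSException e) {
>                 // sketch: log and drop
>             }
>         }
>     }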
>
>
>
> In the log of our RPC client:
>
> 2023-03-18T03:02:27.290 [Thread-1 (ActiveMQ-client-global-threads)]
> client.connectionFailureDetected:293 [] WARN - AMQ212037: Connection
> failure to dm-activemq-svc/10.0.202.122:61616 has been detected:
> AMQ219014: Timed out after waiting 30000 ms for response when sending
> packet 71 [code=CONNECTION_TIMEDOUT]
>
> 2023-03-18T03:02:27.294 [Thread-1 (ActiveMQ-client-global-threads)]
> ClientSessionFactoryImpl.failoverOrReconnect:555 [] DEBUG - Failure
> captured on connectionID=c8929f70, performing failover or reconnection now
> ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT
> message=AMQ219014: Timed out after waiting 30000 ms for response when
> sending packet 71]
>
> …
>
> 2023-03-18T03:02:27.466 [Thread-1901]
> JmsStaticConnectionPool.onException:78 [] ERROR - Receive error occurred.
> javax.jms.JMSException:
> ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT
> message=AMQ219014: Timed out after waiting 30000 ms for response when
> sending packet 71]
>
> You can see more detail at:
> https://drive.google.com/file/d/1-ThbTMQAPAJwNUI4HIlmFKXf-yB9p59E/view?usp=share_link
>
> I’ve got our code set to log a thread stack dump when this happens, so
> there is a lot of detail there.
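>
> (The dump itself is roughly along these lines; this fragment is
> illustrative, not our exact code, and assumes an slf4j-style logger named
> log:)
>
>     // Illustrative only: log every thread's stack when the handler fires.
>     Thread.getAllStackTraces().forEach((thread, frames) -> {
>         log.info("Thread {}:", thread.getName());
>         for (StackTraceElement frame : frames) {
>             log.info("    at {}", frame);
>         }
>     });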
>
>
>
> The JmsStaticConnectionPool.onException() method is our registered
> exception handler.  We don’t try to recover in this method; we just log
> the error.  *Should we be doing something here?*  I’ve never been able to
> understand under what circumstances we need to actively recover from errors.
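>
> For reference, the handler body is essentially just this (sketch of our
> existing method; the logger name is illustrative):
>
>     @Override
>     public void onException(JMSException e) {
>         log.error("Receive error occurred.", e); // log only; no recovery
>     }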
>
>
>
> Shortly after that, we attempt another RPC call, which receives no reply:
>
> 2023-03-18T03:02:28.424 [HpaK8S] JmsRpcClientChannel.prepareCall:81 []
> INFO - REQUEST OUT: { "header":
> {"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"***REDACTED***"},
> "body":{"message_type":"void"} }
>
> 2023-03-18T03:02:28.427 [HpaK8S]
> AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1
> credits on rpdm.v1.hpa.metrics
>
> 2023-03-18T03:02:28.427 [HpaK8S]
> AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 679
> credits on address rpdm.v1.hpa.metrics
>
>
>
> You can see this is received in our service log and a reply is sent:
>
> 2023-03-18T03:02:30.292 [Thread-68 (ActiveMQ-client-global-threads)]
> JmsServerBase.logRequest:107 [] INFO - REQUEST IN: { "header":
> {"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"***REDACTED***","chain":[]},
> "body":"***ERROR NO MESSAGE BODY***" }
>
> 2023-03-18T03:02:30.292 [Thread-68 (ActiveMQ-client-global-threads)]
> JmsServerBase.logResponse:114 [om8xv1gdjvsb] INFO - RESPONSE OUT: {
> "header":
> {"version":1,"type":"expected_pod_counts_response","id":"om8xv1gdjvsb","http_code":200},
> "body":{"counts":[{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":2048,"jvm_memory_mb":250,"count":1},{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":6144,"jvm_memory_mb":2000,"count":0},{"message_type":"expected_pod_count_response","pod_type":"project_edit","memory_mb":500,"jvm_memory_mb":500,"count":1},{"message_type":"expected_pod_count_response","pod_type":"automation_edit","memory_mb":500,"jvm_memory_mb":500,"count":0},{"message_type":"expected_pod_count_response","pod_type":"automation_execution","memory_mb":500,"jvm_memory_mb":250,"count":0}],"message_type":"expected_pod_counts_response"}
> }
>
>
>
> In the AMQ broker log the request SEND and DELIVER:
>
> 2023-03-17T21:02:28-06:00          2023-03-18 03:02:28,427 INFO
> [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: { "header":
> {"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"Bwuth0M2TKk1Znm+iD7+Pew2hp4Klq5+eTjGuWEjfn4="},
> "body":{"message_type":"void"} }
>
> 2023-03-17T21:02:30-06:00          2023-03-18 03:02:30,183 INFO
> [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] DELIVER: { "header":
> {"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"Bwuth0M2TKk1Znm+iD7+Pew2hp4Klq5+eTjGuWEjfn4="},
> "body":{"message_type":"void"} }
>
>
>
> But the response only SEND, no DELIVER:
>
> 2023-03-17T21:02:30-06:00          2023-03-18 03:02:30,305 INFO
> [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: { "header":
> {"version":1,"type":"expected_pod_counts_response","id":"om8xv1gdjvsb","http_code":200},
> "body":{"counts":[{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":2048,"jvm_memory_mb":250,"count":1},{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":6144,"jvm_memory_mb":2000,"count":0},{"message_type":"expected_pod_count_response","pod_type":"project_edit","memory_mb":500,"jvm_memory_mb":500,"count":1},{"message_type":"expected_pod_count_response","pod_type":"automation_edit","memory_mb":500,"jvm_memory_mb":500,"count":0},{"message_type":"expected_pod_count_response","pod_type":"automation_execution","memory_mb":500,"jvm_memory_mb":250,"count":0}],"message_type":"expected_pod_counts_response"}
> }
>
>
>
> This is consistent with the reply-to consumer being closed, but I cannot
> determine how or when it is closed.  Also, in the AMQ console we see that
> the consumer count is zero and there is one message, so it certainly
> *acts* like the reply-to consumer was closed.
>
>
>
> I should mention that we see a lot of these in our RPC client log, but
> since there are so many I assumed they were harmless:
>
> 2023-03-18T03:02:36.094 [Thread-2] ClientSessionImpl.close:1333 [] DEBUG -
> Calling close on session ClientSessionImpl
> [name=509b0e84-c539-11ed-911b-86945e7d6721, username=null, closed=false,
> factory =
> org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl@50f3a373,
> metaData=()]@d95cd6d
>
> 2023-03-18T03:02:36.095 [Thread-2] ClientSessionImpl.doCleanup:2080 []
> DEBUG - calling cleanup on ClientSessionImpl
> [name=509b0e84-c539-11ed-911b-86945e7d6721, username=null, closed=false,
> factory =
> org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl@50f3a373,
> metaData=()]@d95cd6d
>
>
>
> I looked back 30 seconds in the RPC client log, and didn’t see anything
> odd, but there is this unusual flurry:
>
> 2023-03-18T03:01:57.255 [HpaMetrics-Background]
> AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1
> credits on activemq.management
>
> 2023-03-18T03:01:57.255 [HpaMetrics-Background]
> AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 413
> credits on address activemq.management
>
> 2023-03-18T03:01:57.256 [HpaMetrics-Background]
> AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1
> credits on activemq.management
>
> 2023-03-18T03:01:57.256 [HpaMetrics-Background]
> AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 415
> credits on address activemq.management
>
> 2023-03-18T03:01:57.257 [HpaMetrics-Background]
> AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1
> credits on activemq.management
>
> 2023-03-18T03:01:57.257 [HpaMetrics-Background]
> AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 389
> credits on address activemq.management
>
> 2023-03-18T03:01:57.257 [HpaMetrics-Background]
> AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1
> credits on activemq.management
>
> 2023-03-18T03:01:57.257 [HpaMetrics-Background]
> AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 395
> credits on address activemq.management
>
> 2023-03-18T03:01:57.258 [HpaMetrics-Background]
> AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1
> credits on activemq.management
>
> 2023-03-18T03:01:57.258 [HpaMetrics-Background]
> AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 401
> credits on address activemq.management
>
>
>
> Meanwhile, we have a custom logger installed in the AMQ broker, and in
> its output everything looks fine until it is not.
>
>
>
> John Lilley
>
> Data Management Chief Architect, Redpoint Global Inc.
>
> 888 Worcester Street, Suite 200 Wellesley, MA 02482
>
> *M: *+1 7209385761 | john.lil...@redpointglobal.com
