Greetings! Our app makes heavy use of the RPC pattern over the Artemis JMS client API. Our RPC client sends a request to a named service queue and waits for a reply on a reply-to queue. This mostly works great, but under load, in our Kubernetes environment, something goes wrong and a message is lost.
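To make the shape of this concrete, here is a stripped-down sketch of the pattern (hypothetical names, not our actual code; the real client adds pooling, error handling, and logging, and the correlation-by-JMSCorrelationID part is just for illustration):

// Simplified sketch of the RPC pattern (hypothetical names; cleanup of timed-out entries omitted).
import javax.jms.*;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

public class RpcClientSketch implements MessageListener {
    private final Session senderSession;      // used only by the calling thread
    private final Session replySession;       // owned by the reply listener
    private final TemporaryQueue replyQueue;  // created once, reused for every call
    private final MessageProducer producer;   // anonymous producer, can send to any service queue
    private final Map<String, CompletableFuture<Message>> pending = new ConcurrentHashMap<>();

    public RpcClientSketch(Connection connection) throws JMSException {
        senderSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        replySession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        replyQueue = replySession.createTemporaryQueue();
        replySession.createConsumer(replyQueue).setMessageListener(this); // the one reply consumer
        producer = senderSession.createProducer(null);
        connection.start();
    }

    public synchronized Message call(String serviceQueue, String body, long timeoutMs) throws Exception {
        TextMessage request = senderSession.createTextMessage(body);
        String id = UUID.randomUUID().toString();
        request.setJMSCorrelationID(id);
        request.setJMSReplyTo(replyQueue);    // the service sends its reply to this temporary queue
        CompletableFuture<Message> reply = new CompletableFuture<>();
        pending.put(id, reply);
        producer.send(senderSession.createQueue(serviceQueue), request);
        return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void onMessage(Message reply) {
        try {
            CompletableFuture<Message> waiter = pending.remove(reply.getJMSCorrelationID());
            if (waiter != null) waiter.complete(reply);
        } catch (JMSException e) {
            // log and drop
        }
    }
}

The important point is that the temporary reply queue and its consumer are created once, live for the lifetime of the process, and receive the replies for every call.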
I am beginning to suspect that this is due to using a single reply-to temporary queue for a long lifetime. Is it OK to use a temporary queue for a long time? If there is a connection error (like the one I think we see below), should I be creating a new reply queue?

Apologies in advance, because I feel like we keep nibbling around the edges of the same symptoms and never quite get enough information to completely resolve it. Hopefully this time we've logged enough detail and uncovered something new that will open the door to an answer. We recently logged an issue that seems to come down to this:

* An error is logged at low levels of the AMQ client.
* These errors seem to mostly self-recover, but not before a 30-second timeout.
* Our RPC client sends a request message to a named service queue.
* Our RPC service receives the request and sends a reply message to the (temporary) reply-to queue.
* The ActiveMQ broker log shows the reply SEND but not the DELIVER.
* On examination of the AMQ console, the reply-to queue exists and has one message, but no consumers.

On the face of it this seems simple: we must be closing the reply consumer or session. But we see evidence that something else is going wrong at a lower level. I keep scouring our code for where we might be closing a session or consumer, and I just can't find anything, because:

* We create Connections statically and never close them.
* We use a singleton reply-to queue and MessageListener, and receive all replies through this one consumer.
* Our singleton MessageListener creates and owns its Session, Consumer, and Queue, and never closes them.

In the log of our RPC client:

2023-03-18T03:02:27.290 [Thread-1 (ActiveMQ-client-global-threads)] client.connectionFailureDetected:293 [] WARN - AMQ212037: Connection failure to dm-activemq-svc/10.0.202.122:61616 has been detected: AMQ219014: Timed out after waiting 30000 ms for response when sending packet 71 [code=CONNECTION_TIMEDOUT]
2023-03-18T03:02:27.294 [Thread-1 (ActiveMQ-client-global-threads)] ClientSessionFactoryImpl.failoverOrReconnect:555 [] DEBUG - Failure captured on connectionID=c8929f70, performing failover or reconnection now ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ219014: Timed out after waiting 30000 ms for response when sending packet 71]
…
2023-03-18T03:02:27.466 [Thread-1901] JmsStaticConnectionPool.onException:78 [] ERROR - Receive error occurred. javax.jms.JMSException: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ219014: Timed out after waiting 30000 ms for response when sending packet 71]

You can see more detail at:
https://drive.google.com/file/d/1-ThbTMQAPAJwNUI4HIlmFKXf-yB9p59E/view?usp=share_link

I've got our code set to log a thread stack dump when this happens, so there is a lot of detail there. JmsStaticConnectionPool.onException() is our registered exception handler. We don't try to recover or do anything else in this method; we just log the error. Should we be doing something here? I've never been able to understand under what circumstances we need to actively recover from errors.
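For example, is the expectation that the exception listener should tear down and rebuild the reply-to session, consumer, and temporary queue, roughly along these lines? (This is purely a hypothetical sketch of what we might do, not what our code does today; the names are made up.)

// Hypothetical recovery sketch (not our current code): on a connection failure,
// discard the old reply session/consumer/temporary queue and build fresh ones.
import javax.jms.*;

public class ReplyChannel implements ExceptionListener {
    private final Connection connection;
    private final MessageListener replyListener;
    private volatile Session session;
    private volatile TemporaryQueue replyQueue;

    public ReplyChannel(Connection connection, MessageListener replyListener) throws JMSException {
        this.connection = connection;
        this.replyListener = replyListener;
        connection.setExceptionListener(this);
        rebuild();
    }

    // Destination that outgoing requests should use as their JMSReplyTo.
    public TemporaryQueue currentReplyQueue() {
        return replyQueue;
    }

    @Override
    public void onException(JMSException e) {
        // Today we only log here. The question: should we instead (or also) do something like this?
        try {
            rebuild();
        } catch (JMSException rebuildFailure) {
            // log; retry later
        }
    }

    private synchronized void rebuild() throws JMSException {
        if (session != null) {
            try { session.close(); } catch (JMSException ignored) { }
        }
        session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        replyQueue = session.createTemporaryQueue(); // new queue: replies addressed to the old one are missed
        session.createConsumer(replyQueue).setMessageListener(replyListener);
    }
}

Or is the Artemis client's own failover/reconnect (the failoverOrReconnect line above) supposed to make that kind of manual recovery unnecessary?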
Shortly after that, we attempt another RPC call, which receives no reply:

2023-03-18T03:02:28.424 [HpaK8S] JmsRpcClientChannel.prepareCall:81 [] INFO - REQUEST OUT: { "header": {"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"***REDACTED***"}, "body":{"message_type":"void"} }
2023-03-18T03:02:28.427 [HpaK8S] AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 credits on rpdm.v1.hpa.metrics
2023-03-18T03:02:28.427 [HpaK8S] AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 679 credits on address rpdm.v1.hpa.metrics

You can see that the request is received in our service log and a reply is sent:

2023-03-18T03:02:30.292 [Thread-68 (ActiveMQ-client-global-threads)] JmsServerBase.logRequest:107 [] INFO - REQUEST IN: { "header": {"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"***REDACTED***","chain":[]}, "body":"***ERROR NO MESSAGE BODY***" }
2023-03-18T03:02:30.292 [Thread-68 (ActiveMQ-client-global-threads)] JmsServerBase.logResponse:114 [om8xv1gdjvsb] INFO - RESPONSE OUT: { "header": {"version":1,"type":"expected_pod_counts_response","id":"om8xv1gdjvsb","http_code":200}, "body":{"counts":[{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":2048,"jvm_memory_mb":250,"count":1},{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":6144,"jvm_memory_mb":2000,"count":0},{"message_type":"expected_pod_count_response","pod_type":"project_edit","memory_mb":500,"jvm_memory_mb":500,"count":1},{"message_type":"expected_pod_count_response","pod_type":"automation_edit","memory_mb":500,"jvm_memory_mb":500,"count":0},{"message_type":"expected_pod_count_response","pod_type":"automation_execution","memory_mb":500,"jvm_memory_mb":250,"count":0}],"message_type":"expected_pod_counts_response"} }

In the AMQ broker log, the request shows both SEND and DELIVER:

2023-03-17T21:02:28-06:00 2023-03-18 03:02:28,427 INFO [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: { "header": {"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"Bwuth0M2TKk1Znm+iD7+Pew2hp4Klq5+eTjGuWEjfn4="}, "body":{"message_type":"void"} }
2023-03-17T21:02:30-06:00 2023-03-18 03:02:30,183 INFO [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] DELIVER: { "header": {"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"Bwuth0M2TKk1Znm+iD7+Pew2hp4Klq5+eTjGuWEjfn4="}, "body":{"message_type":"void"} }

But the response shows only SEND, no DELIVER:

2023-03-17T21:02:30-06:00 2023-03-18 03:02:30,305 INFO [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: { "header": {"version":1,"type":"expected_pod_counts_response","id":"om8xv1gdjvsb","http_code":200}, "body":{"counts":[{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":2048,"jvm_memory_mb":250,"count":1},{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":6144,"jvm_memory_mb":2000,"count":0},{"message_type":"expected_pod_count_response","pod_type":"project_edit","memory_mb":500,"jvm_memory_mb":500,"count":1},{"message_type":"expected_pod_count_response","pod_type":"automation_edit","memory_mb":500,"jvm_memory_mb":500,"count":0},{"message_type":"expected_pod_count_response","pod_type":"automation_execution","memory_mb":500,"jvm_memory_mb":250,"count":0}],"message_type":"expected_pod_counts_response"} }

This is consistent with the reply-to consumer being closed, but I cannot see how or when it is closed. Also, in the AMQ console we see that the consumer count is zero and there is one message, so it certainly acts as if the reply-to consumer was closed.

I should mention that we see a lot of these in our RPC client log, but since there are so many I assumed they were harmless:

2023-03-18T03:02:36.094 [Thread-2] ClientSessionImpl.close:1333 [] DEBUG - Calling close on session ClientSessionImpl [name=509b0e84-c539-11ed-911b-86945e7d6721, username=null, closed=false, factory = org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl@50f3a373, metaData=()]@d95cd6d
2023-03-18T03:02:36.095 [Thread-2] ClientSessionImpl.doCleanup:2080 [] DEBUG - calling cleanup on ClientSessionImpl [name=509b0e84-c539-11ed-911b-86945e7d6721, username=null, closed=false, factory = org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl@50f3a373, metaData=()]@d95cd6d

I looked back 30 seconds in the RPC client log and didn't see anything odd, but there is this unusual flurry:

2023-03-18T03:01:57.255 [HpaMetrics-Background] AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 credits on activemq.management
2023-03-18T03:01:57.255 [HpaMetrics-Background] AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 413 credits on address activemq.management
2023-03-18T03:01:57.256 [HpaMetrics-Background] AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 credits on activemq.management
2023-03-18T03:01:57.256 [HpaMetrics-Background] AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 415 credits on address activemq.management
2023-03-18T03:01:57.257 [HpaMetrics-Background] AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 credits on activemq.management
2023-03-18T03:01:57.257 [HpaMetrics-Background] AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 389 credits on address activemq.management
2023-03-18T03:01:57.257 [HpaMetrics-Background] AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 credits on activemq.management
2023-03-18T03:01:57.257 [HpaMetrics-Background] AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 395 credits on address activemq.management
2023-03-18T03:01:57.258 [HpaMetrics-Background] AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 credits on activemq.management
2023-03-18T03:01:57.258 [HpaMetrics-Background] AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 401 credits on address activemq.management
Meanwhile, we have a custom logger installed in the AMQ broker, and everything looks fine until it is not.

John Lilley
Redpoint Global Inc.