Greetings!

Our app is heavy on the RPC pattern using the Artemis JMS client API.  Our RPC 
client sends a request to a named service queue, and waits for a reply on a 
reply-to queue.  This mostly works great, but under load in our Kubernetes 
environment, something goes wrong and a reply message is lost.
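
For context, the client side of a call looks roughly like this.  This is a 
simplified sketch with illustrative names, not our actual code:

import javax.jms.*;
import java.util.UUID;

// Simplified sketch of our request path (illustrative names, not production code).
// Every call sets JMSReplyTo to one long-lived temporary queue, and the reply is
// matched back to the waiting caller by JMSCorrelationID.
public class RpcRequestSketch {
    private final Session session;            // producer-side session, separate from the reply listener's
    private final TemporaryQueue replyQueue;  // the singleton reply-to queue
    private final MessageProducer producer;   // anonymous producer, reused for all calls

    public RpcRequestSketch(Connection connection, TemporaryQueue replyQueue) throws JMSException {
        this.session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        this.replyQueue = replyQueue;
        this.producer = session.createProducer(null);   // destination is supplied per send
    }

    public Message call(String serviceQueueName, Message request, long timeoutMs) throws JMSException {
        String correlationId = UUID.randomUUID().toString();
        request.setJMSCorrelationID(correlationId);
        request.setJMSReplyTo(replyQueue);
        producer.send(session.createQueue(serviceQueueName), request);
        // The reply arrives on our shared MessageListener (sketched further down)
        // and is handed to this caller by correlation ID, or we time out.
        return awaitReply(correlationId, timeoutMs);
    }

    private Message awaitReply(String correlationId, long timeoutMs) {
        return null;   // placeholder: the real code blocks on a future keyed by correlationId
    }
}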

I am beginning to suspect that this is due to using a single temporary reply-to 
queue for a long lifetime.  Is it OK to use a temporary queue for a long time?  
If there is a connection error (like the one I think we see below), should I be 
creating a new reply queue?

Apologies in advance, because I feel like we keep nibbling around the edges of 
the same symptoms and never quite get enough information to completely resolve 
them. Hopefully this time we’ve logged enough detail and uncovered something 
new that will open the door to an answer.

We recently logged an issue that seems to come down to:

  *   There is an error logged at low levels of the AMQ client
  *   These errors seem to mostly self-recover but not before a 30-second 
timeout.
  *   Our RPC client sends a request message to a named service queue
  *   Our RPC service receives the request and sends a reply message to the 
(temporary) reply-to queue
  *   ActiveMQ broker log shows the reply SEND but not the DELIVER
  *   On examination of the AMQ console, the reply-to queue exists, has one 
message, but no consumers

On the face of it this seems simple.  We must be closing the reply consumer or 
session.  But we see evidence that something else is going wrong at a lower 
level.

I keep scouring our code for where we might be closing a session or consumer, 
and I just can’t find anything, because:

  *   We create Connections statically and never close them
  *   We use a singleton reply-to queue and MessageListener, and receive all 
replies through this one consumer
  *   Our singleton MessageListener creates and owns its Session, Consumer, and 
Queue, and never closes them (roughly as sketched below)
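
Concretely, the reply side is set up roughly like this (again a simplified 
sketch with illustrative names, not our actual code):

import javax.jms.*;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of our singleton reply listener (illustrative names).
// One Session / TemporaryQueue / MessageConsumer lives for the life of the process,
// and replies are routed to waiting callers by JMSCorrelationID.
public class ReplyListenerSketch implements MessageListener {
    private final Session session;
    private final TemporaryQueue replyQueue;
    private final MessageConsumer consumer;
    private final Map<String, CompletableFuture<Message>> pending = new ConcurrentHashMap<>();

    public ReplyListenerSketch(Connection connection) throws JMSException {
        // Assumes the Connection is created statically elsewhere and already started.
        session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        replyQueue = session.createTemporaryQueue();
        consumer = session.createConsumer(replyQueue);
        consumer.setMessageListener(this);   // nothing here is ever closed explicitly
    }

    public TemporaryQueue getReplyQueue() { return replyQueue; }

    // Called by the request path before sending, so the reply has somewhere to land.
    public CompletableFuture<Message> register(String correlationId) {
        CompletableFuture<Message> future = new CompletableFuture<>();
        pending.put(correlationId, future);
        return future;
    }

    @Override
    public void onMessage(Message reply) {
        try {
            CompletableFuture<Message> future = pending.remove(reply.getJMSCorrelationID());
            if (future != null) {
                future.complete(reply);
            }
        } catch (JMSException e) {
            // log and drop; the waiting caller will time out
        }
    }
}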

In the log of our RPC client:
2023-03-18T03:02:27.290 [Thread-1 (ActiveMQ-client-global-threads)] 
client.connectionFailureDetected:293 [] WARN - AMQ212037: Connection failure to 
dm-activemq-svc/10.0.202.122:61616 has been detected: AMQ219014: Timed out 
after waiting 30000 ms for response when sending packet 71 
[code=CONNECTION_TIMEDOUT]
2023-03-18T03:02:27.294 [Thread-1 (ActiveMQ-client-global-threads)] 
ClientSessionFactoryImpl.failoverOrReconnect:555 [] DEBUG - Failure captured on 
connectionID=c8929f70, performing failover or reconnection now 
ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT 
message=AMQ219014: Timed out after waiting 30000 ms for response when sending 
packet 71]
…
2023-03-18T03:02:27.466 [Thread-1901] JmsStaticConnectionPool.onException:78 [] 
ERROR - Receive error occurred. javax.jms.JMSException: 
ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT 
message=AMQ219014: Timed out after waiting 30000 ms for response when sending 
packet 71]

You can see more detail at: 
https://drive.google.com/file/d/1-ThbTMQAPAJwNUI4HIlmFKXf-yB9p59E/view?usp=share_link

I’ve got our code set to log a thread stack dump when this happens, so there is 
a lot of detail there.

The JmsStaticConnectionPool.onException() is our registered exception handler.  
We don’t try to recover or do anything else in this method; we just log the 
error.  Should we be doing something here?  I’ve never been able to understand 
under what circumstances we need to actively recover from errors.
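
To make the question concrete, the handler today amounts to the following.  The 
commented-out part is purely illustrative of what I’m wondering about, not 
something we currently do:

import javax.jms.*;

// Sketch of our current exception handling (illustrative names).
public class ConnectionExceptionHandlerSketch implements ExceptionListener {
    @Override
    public void onException(JMSException e) {
        // Today we only log; we never tear anything down or rebuild it.
        log("Receive error occurred.", e);

        // Open question: on a failure like the CONNECTION_TIMEDOUT above, should we
        // discard the old reply session/consumer and start over with a fresh temporary
        // reply-to queue for subsequent calls?  Something like:
        //
        //   replyListener.closeQuietly();                          // best-effort close of the old objects
        //   replyListener = new ReplyListenerSketch(connection);   // new session, temp queue, consumer
    }

    private void log(String message, Exception e) {
        // our logging
    }
}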

Shortly after that, we attempt another RPC call, which receives no reply:
2023-03-18T03:02:28.424 [HpaK8S] JmsRpcClientChannel.prepareCall:81 [] INFO - 
REQUEST OUT: { "header": 
{"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"***REDACTED***"},
 "body":{"message_type":"void"} }
2023-03-18T03:02:28.427 [HpaK8S] AbstractProducerCreditsImpl.checkCredits:182 
[] DEBUG - not asking for -1 credits on rpdm.v1.hpa.metrics
2023-03-18T03:02:28.427 [HpaK8S] AbstractProducerCreditsImpl.afterAcquired:90 
[] DEBUG - AfterAcquire 679 credits on address rpdm.v1.hpa.metrics

You can see in our service log that the request is received and a reply is sent:
2023-03-18T03:02:30.292 [Thread-68 (ActiveMQ-client-global-threads)] 
JmsServerBase.logRequest:107 [] INFO - REQUEST IN: { "header": 
{"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"***REDACTED***","chain":[]},
 "body":"***ERROR NO MESSAGE BODY***" }
2023-03-18T03:02:30.292 [Thread-68 (ActiveMQ-client-global-threads)] 
JmsServerBase.logResponse:114 [om8xv1gdjvsb] INFO - RESPONSE OUT: { "header": 
{"version":1,"type":"expected_pod_counts_response","id":"om8xv1gdjvsb","http_code":200},
 
"body":{"counts":[{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":2048,"jvm_memory_mb":250,"count":1},{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":6144,"jvm_memory_mb":2000,"count":0},{"message_type":"expected_pod_count_response","pod_type":"project_edit","memory_mb":500,"jvm_memory_mb":500,"count":1},{"message_type":"expected_pod_count_response","pod_type":"automation_edit","memory_mb":500,"jvm_memory_mb":500,"count":0},{"message_type":"expected_pod_count_response","pod_type":"automation_execution","memory_mb":500,"jvm_memory_mb":250,"count":0}],"message_type":"expected_pod_counts_response"}
 }

In the AMQ broker log, the request shows both the SEND and the DELIVER:
2023-03-17T21:02:28-06:00          2023-03-18 03:02:28,427 INFO  
[net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: { "header": 
{"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"Bwuth0M2TKk1Znm+iD7+Pew2hp4Klq5+eTjGuWEjfn4="},
 "body":{"message_type":"void"} }
2023-03-17T21:02:30-06:00          2023-03-18 03:02:30,183 INFO  
[net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] DELIVER: { "header": 
{"version":1,"type":"void","id":"om8xv1gdjvsb","api":"hpa_metrics","method":"get_expected_pod_counts","authorization":"Bwuth0M2TKk1Znm+iD7+Pew2hp4Klq5+eTjGuWEjfn4="},
 "body":{"message_type":"void"} }

But the response shows only the SEND, with no DELIVER:
2023-03-17T21:02:30-06:00          2023-03-18 03:02:30,305 INFO  
[net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: { "header": 
{"version":1,"type":"expected_pod_counts_response","id":"om8xv1gdjvsb","http_code":200},
 
"body":{"counts":[{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":2048,"jvm_memory_mb":250,"count":1},{"message_type":"expected_pod_count_response","pod_type":"project_execution","memory_mb":6144,"jvm_memory_mb":2000,"count":0},{"message_type":"expected_pod_count_response","pod_type":"project_edit","memory_mb":500,"jvm_memory_mb":500,"count":1},{"message_type":"expected_pod_count_response","pod_type":"automation_edit","memory_mb":500,"jvm_memory_mb":500,"count":0},{"message_type":"expected_pod_count_response","pod_type":"automation_execution","memory_mb":500,"jvm_memory_mb":250,"count":0}],"message_type":"expected_pod_counts_response"}
 }

This is consistent with the reply-to consumer being closed, but I cannot see how 
or when it is closed.  Also, in the AMQ console we see that the consumer count 
is zero and there is one message, so it certainly acts like the reply-to 
consumer was closed.

I should mention that we see a lot of these in our RPC client log, but since 
there are so many I assumed they were harmless:
2023-03-18T03:02:36.094 [Thread-2] ClientSessionImpl.close:1333 [] DEBUG - 
Calling close on session ClientSessionImpl 
[name=509b0e84-c539-11ed-911b-86945e7d6721, username=null, closed=false, 
factory = org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl@50f3a373, 
metaData=()]@d95cd6d
2023-03-18T03:02:36.095 [Thread-2] ClientSessionImpl.doCleanup:2080 [] DEBUG - 
calling cleanup on ClientSessionImpl 
[name=509b0e84-c539-11ed-911b-86945e7d6721, username=null, closed=false, 
factory = org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl@50f3a373, 
metaData=()]@d95cd6d

I looked back 30 seconds in the RPC client log, and didn’t see anything odd, 
but there is this unusual flurry:
2023-03-18T03:01:57.255 [HpaMetrics-Background] 
AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 
credits on activemq.management
2023-03-18T03:01:57.255 [HpaMetrics-Background] 
AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 413 
credits on address activemq.management
2023-03-18T03:01:57.256 [HpaMetrics-Background] 
AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 
credits on activemq.management
2023-03-18T03:01:57.256 [HpaMetrics-Background] 
AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 415 
credits on address activemq.management
2023-03-18T03:01:57.257 [HpaMetrics-Background] 
AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 
credits on activemq.management
2023-03-18T03:01:57.257 [HpaMetrics-Background] 
AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 389 
credits on address activemq.management
2023-03-18T03:01:57.257 [HpaMetrics-Background] 
AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 
credits on activemq.management
2023-03-18T03:01:57.257 [HpaMetrics-Background] 
AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 395 
credits on address activemq.management
2023-03-18T03:01:57.258 [HpaMetrics-Background] 
AbstractProducerCreditsImpl.checkCredits:182 [] DEBUG - not asking for -1 
credits on activemq.management
2023-03-18T03:01:57.258 [HpaMetrics-Background] 
AbstractProducerCreditsImpl.afterAcquired:90 [] DEBUG - AfterAcquire 401 
credits on address activemq.management

Meanwhile, we have a custom logger installed in the AMQ broker, and everything 
looks fine until it is not:




John Lilley

Data Management Chief Architect, Redpoint Global Inc.

888 Worcester Street, Suite 200 Wellesley, MA 02482

M: +1 7209385761 | john.lil...@redpointglobal.com
