> Is it expected that we’d hit this error during failover...

It's not necessarily surprising that you'd hit that error, although generally when a broker fails the connection is terminated completely rather than the current operation timing out. The timeout is ultimately ambiguous: you can't reliably conclude from a timeout like this that the broker has failed. It could be the result of a network issue or a broker slow-down for some reason (e.g. a long GC pause). The broker may have received what you sent but simply failed to send a response back within the timeout, or it may not have received anything. You can retry the operation, but if you're sending a message, the retry may result in a duplicate. That's why we have duplicate detection [1].

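To make the retry-plus-duplicate-detection idea concrete, here is a minimal sketch in plain Java. The `RetryingSender` class and the `SendAttempt` callback are hypothetical names used only for illustration; they are not part of the Artemis client. The point it shows is that the duplicate ID is generated once and reused for every attempt, so the broker can discard a copy it already received. In a real JMS client the callback would, as I understand the duplicate-detection documentation [1], set the ID as the `_AMQ_DUPL_ID` string property on the message before calling send(); treat that detail as an assumption to check against the docs for your version.

```java
import java.util.UUID;

final class RetryingSender {

    /** Hypothetical callback: performs one send attempt using the given duplicate ID. */
    interface SendAttempt {
        // In a real client this might look like (assumption, not verified here):
        //   message.setStringProperty("_AMQ_DUPL_ID", duplicateId);
        //   producer.send(message);
        void send(String duplicateId) throws Exception;
    }

    /**
     * Runs the attempt up to maxAttempts times and returns how many attempts were used.
     * One duplicate ID is created up front and reused for every attempt.
     * For brevity this retries on any Exception; real code should retry only
     * on errors it has decided are safe to retry.
     */
    static int sendWithRetry(SendAttempt attempt, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        String duplicateId = UUID.randomUUID().toString();
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.send(duplicateId);
                return i; // success on attempt i
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // all attempts failed; surface the final error
    }
}
```

Generating a fresh ID per attempt would defeat the purpose, since the broker would then see each retry as a new message.
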
> ...and that it takes this long to manifest?

The default callTimeout is 30,000 milliseconds so your observation fits.

> Is there a way to reduce the timeout value, and is that recommended?

You can pass "callTimeout=X" on your connection URL, e.g.:

  tcp://host:61616?callTimeout=10000

The general recommendation is to use the default (for simplicity's sake) and adjust as necessary for your use-case. Lowering the timeout means you will detect such issues sooner, but also that you will be more susceptible to timeouts in the event of a broker slow-down. It's a balancing act.

> Are there yet more codes we should retry?

I can't think of any additional codes.


Justin

[1] https://activemq.apache.org/components/artemis/documentation/latest/duplicate-detection.html#duplicate-message-detection

On Fri, Jan 19, 2024 at 11:37 AM John Lilley <john.lil...@redpointglobal.com.invalid> wrote:

> Greetings,
>
> Lino already posted this, but I think it got buried in the larger discussion of HA configuration.
>
> When a failover happens, we occasionally hit errors like
>
> 2024-01-18T22:46:13.436 [http-nio-9910-exec-7] RpcExceptionMapper.toResponse:79 [] INFO - Error in RPC response RpcException: httpCode=500, errorMessage=error sending message: AMQ219014: Timed out after waiting 30000 ms for response when sending packet 71
>
> The exception is thrown from org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.send(). See Lino’s previous post for the entire stack trace.
>
> I’ve seen this kind of thing before and added logic to retry the send() call for “AMQ219016 Connection failure detected. Unblocking a blocking call that will never get a response”.
>
> I don’t have a problem retrying this code as well (in fact I’ve added the range AMQ219011 - AMQ219016 to the retry logic), but the 30 second delay is quite long, and it starts to trigger our own RPC timeouts by the time all of the reconnect is performed.
>
> So my questions are:
>
> - Is it expected that we’d hit this error during failover and that it takes this long to manifest?
> - Is there a way to reduce the timeout value, and is that recommended?
> - Are there yet more codes we should retry?
>
> Thanks
>
> john
>
> John Lilley
> Data Management Chief Architect, Redpoint Global Inc.
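
For the retry range mentioned in the quoted message (AMQ219011 - AMQ219016), a check along these lines is one way to decide whether an exception is worth retrying. This is only a sketch: `AmqRetryCodes` and its methods are hypothetical helpers, not Artemis APIs, and it assumes the code appears in the exception message as "AMQ" followed by six digits, as in the log line quoted above. The range bounds are taken from the quoted message, not from any official list of retryable codes.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmqRetryCodes {

    // Assumed message format: "AMQ" followed by six digits, e.g. "AMQ219014".
    private static final Pattern AMQ_CODE = Pattern.compile("AMQ(\\d{6})");

    // Range from the quoted message; adjust to whatever your application decides to retry.
    static final int FIRST_RETRYABLE = 219011;
    static final int LAST_RETRYABLE = 219016;

    // Guard against unusually deep (or cyclic) cause chains.
    private static final int MAX_CAUSE_DEPTH = 20;

    /** Returns the first AMQ code found in the message, if any. */
    static OptionalInt extractCode(String message) {
        if (message == null) {
            return OptionalInt.empty();
        }
        Matcher m = AMQ_CODE.matcher(message);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    /** True if the throwable or any of its causes carries a code in the retry range. */
    static boolean isRetryable(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < MAX_CAUSE_DEPTH; c = c.getCause(), depth++) {
            OptionalInt code = extractCode(c.getMessage());
            if (code.isPresent()
                    && code.getAsInt() >= FIRST_RETRYABLE
                    && code.getAsInt() <= LAST_RETRYABLE) {
                return true;
            }
        }
        return false;
    }
}
```

Walking the cause chain matters because the application in the quoted log wraps the client error in its own RpcException, so the AMQ code may only appear in a nested message.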