> Asking for clarification on what could be done to “change the application itself” as you mentioned. I’m kind of assuming that getting a whole new Connection/Session/Producer set would be the way to go, but perhaps you have other suggestions.

Your assumption is correct. I don't have any other suggestions.
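For what it's worth, a minimal sketch of that approach might look like the following. All names here are illustrative, not taken from your code; the point is simply that closing the Connection tears down everything created from it, so you rebuild the whole set rather than retrying on objects bound to a possibly-dead connection:

import javax.jms.*;

// Sketch only: throw away and rebuild the whole Connection/Session/Producer
// set after an ambiguous failure such as AMQ219014.
public class JmsObjects {
    private final ConnectionFactory factory; // however you obtain it
    private Connection connection;
    private Session session;
    private MessageProducer producer;

    public JmsObjects(ConnectionFactory factory) {
        this.factory = factory;
    }

    public synchronized void rebuild() throws JMSException {
        try {
            if (connection != null) {
                connection.close(); // also closes sessions and producers created from it
            }
        } catch (JMSException ignored) {
            // the old connection may already be dead; we are replacing it anyway
        }
        connection = factory.createConnection();
        session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        producer = session.createProducer(null); // anonymous producer; pass the destination to send()
        connection.start();
    }
}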
Justin

On Tue, Jan 30, 2024 at 11:42 AM John Lilley <john.lil...@redpointglobal.com.invalid> wrote:

> Hi Justin,
>
> Thanks for the advice. We’ll definitely look into reducing those other parameters, because we’d like to get the overall reconnect time under our default RPC timeout.
>
> However, when we see AMQ219014, I would like to “fail different” and avoid the timeout on the 2nd retry. Asking for clarification on what could be done to “change the application itself” as you mentioned. I’m kind of assuming that getting a whole new Connection/Session/Producer set would be the way to go, but perhaps you have other suggestions.
>
> Thanks
> John
>
> *From:* Justin Bertram <jbert...@apache.org>
> *Sent:* Monday, January 29, 2024 1:55 PM
> *To:* users@activemq.apache.org
> *Subject:* Re: Question about HA configuration failover problem
>
> > Is this a bug in the AMQ JMS client?
>
> At this point I don't believe it is a bug in the ActiveMQ Artemis core JMS client.
>
> I talked about AMQ219014 previously, but I suppose it bears repeating here. The timeout is ultimately ambiguous. The client can't reliably conclude that the broker has failed due to a timeout like this. It could be the result of a network issue or a broker slow-down for some reason (e.g. a long GC pause). The broker may have received the data sent but simply failed to send a response back within the timeout, or it may not have received anything. How to respond to the timeout is ultimately up to the application.
>
> In this case the application retries the operation, which itself fails after 44 seconds due to a connection loss. I believe the connection loss is based on the default connection TTL of 60 seconds (i.e. 15 + 44 = 59, which is close enough), during which time the client never receives any data from the broker (i.e. no pings, etc.).
>
> > When we encounter this error, should we attempt to close/destroy/recreate the Producer, Session, or Connection?
>
> It's hard to say what you "should" do in this circumstance, but if this delay is too long then you should probably either change your connection URL (e.g. lower your clientFailureCheckPeriod & connectionTTL [1]) or change the application itself (as you mentioned) to deal with it so that it functions appropriately for your use-case.
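> For example, both parameters can be set on the connection URL. The values below are purely illustrative, not a recommendation (the defaults are clientFailureCheckPeriod=30000 and connectionTTL=60000):
>
> tcp://dm-activemq-live-svc:61616?clientFailureCheckPeriod=5000&connectionTTL=10000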
> Justin
>
> [1] https://activemq.apache.org/components/artemis/documentation/latest/connection-ttl.html#detecting-failure-from-the-client
>
> On Fri, Jan 26, 2024 at 12:51 PM John Lilley <john.lil...@redpointglobal.com.invalid> wrote:
>
> Greetings,
>
> This is something of a follow-up on previous failover issue reports, but we’ve taken careful notes and logs, and hopefully we have enough information to diagnose what is happening.
>
> We are experiencing an error during AMQ broker failover from live to backup. We are testing this using a load generator of our own devising: multiple threads perform RPC calls, where each request is posted to a named queue and the reply is returned on the reply-to temporary queue.
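> For reference, each RPC boils down to something like this (a simplified sketch, not our production code — we actually pool connections, sessions, and producers):
>
> import javax.jms.*;
>
> // Illustrative request/reply over JMS: request on a named queue,
> // reply on a temporary queue carried in JMSReplyTo.
> static Message rpcCall(Connection connection, String jsonBody, long rpcTimeoutMs) throws JMSException {
>     Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
>     try {
>         Queue requestQueue = session.createQueue("rpc.requests"); // queue name is made up
>         TemporaryQueue replyQueue = session.createTemporaryQueue();
>
>         TextMessage request = session.createTextMessage(jsonBody);
>         request.setJMSReplyTo(replyQueue);
>         session.createProducer(requestQueue).send(request);
>
>         // the RPC caller blocks here; this is the receive that eventually timed out and gave up
>         return session.createConsumer(replyQueue).receive(rpcTimeoutMs);
>     } finally {
>         session.close();
>     }
> }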
> Both the JMS client and the broker are version 2.31.2.
>
> Our live (master) broker.xml:
> https://drive.google.com/file/d/10lDHv13AJXKOHZOLIdT7Cph8pt7VghEl/view?usp=sharing
>
> Our backup (slave) broker.xml:
> https://drive.google.com/file/d/10gNkpFSABskxaPODFI_1GV-DzMIj4tem/view?usp=sharing
>
> Oddly, the failover from backup to live never has an issue.
>
> A synopsis of the timeline:
>
> 2024-01-23T22:31:20.719 our RPC service attempts to send a reply message to the reply-to queue.
>
> 2024-01-23T22:31:35.721 the call to Producer.send() fails after 15 seconds: AMQ219014: Timed out after waiting 15000 ms for response when sending packet 45.
>
> Our code delays for two seconds and attempts to call Producer.send() again.
>
> Meanwhile, the backup AMQ broker has sensed the failure and taken over, and is processing messages from **other** clients:
>
> 2024-01-23 22:29:58,245 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::name=backup is synchronized with live server, nodeID=10952195-b6ec-11ee-9c87-aa03cb64206a.
> 2024-01-23 22:29:58,252 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
> 2024-01-23 22:31:20,720 WARN [org.apache.activemq.artemis.core.server] AMQ222295: There is a possible split brain on nodeID 10952195-b6ec-11ee-9c87-aa03cb64206a. Topology update ignored
> 2024-01-23 22:31:20,721 INFO [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: LiveFailoverQuorumVote
> 2024-01-23 22:31:20,723 INFO [org.apache.activemq.artemis.core.server] AMQ221084: Requested 0 quorum votes
> 2024-01-23 22:31:20,723 INFO [org.apache.activemq.artemis.core.server] AMQ221083: ignoring quorum vote as max cluster size is 1.
> 2024-01-23 22:31:20,723 INFO [org.apache.activemq.artemis.core.server] AMQ221071: Failing over based on quorum vote results.
> 2024-01-23 22:31:20,732 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to dm-activemq-live-svc/10.0.52.174:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
> 2024-01-23 22:31:20,733 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to dm-activemq-live-svc/10.0.52.174:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
> …
> 2024-01-23 22:31:21,450 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
> 2024-01-23 22:31:21,459 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:61617 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
> 2024-01-23 22:31:21,816 INFO [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: HEADER={"version":1,"type":"get_task_status_request","id":"d70xpfwljofo","api":"test_harness","method":"get_task_status","instance":"combined","authorization":"73DZU/fb1A2fFnKdzABPbLzAHVw7Z7VsfSLcQ7VqSBQ="}, BODY={"id":"909b23ae-578f-412d-9706-9f300adb9119","progress_start_index":0,"message_...
>
> But… our second Producer.send() attempt fails again, after about 44 seconds:
>
> 2024-01-23T22:32:21.936 [Thread-2 (ActiveMQ-client-global-threads)] JmsProducerPool.send_:376 [j6ugszoiu1gl] WARN - Error sending message, will retry
> javax.jms.JMSException: AMQ219016: Connection failure detected. Unblocking a blocking call that will never get a response
>     at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:560)
>     at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:452)
>     at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.addressQuery(ActiveMQSessionContext.java:434)
>     at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.addressQuery(ClientSessionImpl.java:808)
>     at org.apache.activemq.artemis.jms.client.ActiveMQSession.checkDestination(ActiveMQSession.java:390)
>     at org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.doSendx(ActiveMQMessageProducer.java:406)
>     at org.apache.activemq.artemis.jms.client.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:221)
>     at net.redpoint.ipc.jms.JmsProducerPool.send_(JmsProducerPool.java:372)
>     at net.redpoint.ipc.jms.JmsProducerPool.sendResponse(JmsProducerPool.java:319)
>     at net.redpoint.ipc.jms.JmsRpcServer$RpcReceiver.handleMessage(JmsRpcServer.java:225)
>     at net.redpoint.ipc.jms.JmsRpcServer$RpcReceiver.onMessage(JmsRpcServer.java:158)
>     at org.apache.activemq.artemis.jms.client.JMSMessageListenerWrapper.onMessage(JMSMessageListenerWrapper.java:110)
>     at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:982)
>     at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1139)
>     at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:57)
>     at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32)
>     at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> Caused by: ActiveMQUnBlockedException[errorType=UNBLOCKED message=AMQ219016: Connection failure detected. Unblocking a blocking call that will never get a response]
>     ... 20 more
> Caused by: ActiveMQDisconnectedException[errorType=DISCONNECTED message=AMQ219015: The connection was disconnected because of server shutdown]
>     at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$CloseRunnable.run(ClientSessionFactoryImpl.java:1172)
>     ... 6 more
>
> We retry again, and this third attempt at Producer.send() does succeed, as seen in the backup broker’s log:
>
> 2024-01-23 22:32:23,937 INFO [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] SEND: HEADER={"version":1,"type":"testing_echo_response","id":"j6ugszoiu1gl","http_code":200}…
> 2024-01-23 22:32:23,937 INFO [net.redpoint.rpdm.artemis_logger.RpdmArtemisLogger] DELIVER: HEADER={"version":1,"type":"testing_echo_response","id":"j6ugszoiu1gl","http_code":200}…
>
> Unfortunately, by this time a whole 63 seconds (22:31:20.7 to 22:32:23.9) has gone by from the RPC caller’s point of view, and our RPC client timed out and gave up.
> It seems to us that the problem can be summarized as: "Once the client gets the 'AMQ219014: Timed out after waiting 15000 ms' error, an attempt at retry will fail again after 44 seconds."
>
> It is worth noting that, in our send-retry code, we do not attempt to destroy/recreate the Connection, Session, or Producer; we believe that the client should take care of that for us. Which it mostly does, except for this one case. And even in this case it does eventually recover, but the 44-second delay is too long for us. And it is unclear where that 44-second delay even comes from.
>
> FYI, our retry loop looks like:
>
> private static final int SEND_RETRIES = 3;
> private static final long SEND_RETRY_DELAY_MS = 2000;
> ...
> var producer = ticket.pi.getProducer();
> for (int retry = 0; retry < SEND_RETRIES; retry++) {
>     try {
>         producer.send(ticket.destination, jmsRequest, producer.getDeliveryMode(), producer.getPriority(), ttlMs);
>         break;
>     } catch (javax.jms.JMSException ex) {
>         if (Arrays.stream(retryableCodes).anyMatch(code -> ex.getMessage().contains(code)) && retry + 1 < SEND_RETRIES) {
>             LOG.warn("Error sending message, will retry", ex);
>             Thread.sleep(SEND_RETRY_DELAY_MS); // InterruptedException is handled by the enclosing method
>             continue;
>         } else {
>             throw ex;
>         }
>     }
> }
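> If recreating the objects is the answer, we imagine changing the loop to something like this (a sketch only; recreateProducer() is a hypothetical helper that would close and rebuild the Connection/Session/Producer set):
>
> for (int retry = 0; retry < SEND_RETRIES; retry++) {
>     try {
>         producer.send(ticket.destination, jmsRequest, producer.getDeliveryMode(), producer.getPriority(), ttlMs);
>         break;
>     } catch (javax.jms.JMSException ex) {
>         if (retry + 1 >= SEND_RETRIES || Arrays.stream(retryableCodes).noneMatch(code -> ex.getMessage().contains(code))) {
>             throw ex; // out of retries, or not a retryable failure
>         }
>         LOG.warn("Error sending message, will recreate JMS objects and retry", ex);
>         producer = recreateProducer(ticket); // hypothetical: tears down and rebuilds Connection/Session/Producer
>         Thread.sleep(SEND_RETRY_DELAY_MS); // InterruptedException handled by the enclosing method
>     }
> }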
> Also see the thread dump generated at the 60-second mark, which is **after** the first retry fails but **before** the second retry fails (in other words, this is the thread dump of the JVM state while our second attempt at Producer.send() is pending):
>
> https://drive.google.com/file/d/10dIWqAL65zwWMEfN03WGzC_Ya1QayPGB/view?usp=sharing
>
> Our questions come down to two things:
>
> - Is this a bug in the AMQ JMS client?
> - When we encounter this error, should we attempt to close/destroy/recreate the Producer, Session, or Connection?
>
> Please let me know if you can think of a workaround, or if there is more information we should capture. This problem is readily reproducible.
>
> Thanks
> John