I hope you don't mind if I reply to this thread. I'd also like to report messages getting lost.
I've had two occurrences of lost messages when using simple replication (1 live and 1 backup server). I was using Artemis 2.22.0. I have not been able to reproduce the issue, but I think it happened when I rebooted the live server. The lost messages were persistent and stored in a durable queue, with no consumers online. I'm not sure about producers.

All I see in the logs are warnings like these two:

- 2022-07-01 14:52:16,282 WARN [org.apache.activemq.artemis.core.server] AMQ222092: Connection to the backup node failed, removing replication now: ActiveMQRemoteDisconnectException[errorType=REMOTE_DISCONNECT message=null]
- 2022-07-01 14:52:16,295 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to 10.108.28.52/10.108.28.52:9000 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]

The only possible cause that comes to mind is that I changed the port for cluster communication from the default 61616 to 9000 (I've experienced some problems unrelated to message loss when changing the port).

Any advice on reproducing the issue or where to look for more data would be appreciated.

-----Original Message-----
From: Clebert Suconic <clebert.suco...@gmail.com>
Sent: Wednesday, October 19, 2022 4:58 PM
To: users@activemq.apache.org
Subject: Re: Messages getting lost on Artemis 2.25

Basically I'm telling you how to investigate it, and if you find an issue on the broker, we will need a way to reproduce it.

I have no other report about a message loss situation (we do have situations with page-counters going wrong while paging, which I'm working on fixing now, but no message loss).

On Wed, Oct 19, 2022 at 10:55 AM Clebert Suconic <clebert.suco...@gmail.com> wrote:
>
> I am not aware of any issues that would lead to message loss...
>
> Garbage Collection itself has no effect on anything regarding paging or
> journal.
>
> Are you able to chase which message is lost on a test?
>
> You could use the retention feature and replay the message, and you
> could also look at the ./artemis data print output to see what happened
> to the message.
>
> One other suggestion I could make is to use Federation instead of
> clustering. Perhaps messages are stranded on the store-and-forward
> queue?
>
> Also, since you have consumers on all the nodes, you should use clustering
> with OFF_WITH_REDISTRIBUTION, or use Federation. You should always
> favor the local consumers.
>
> On Wed, Oct 19, 2022 at 8:16 AM Walter de Boer <walterdeb...@dbso.nl> wrote:
> >
> > All,
> >
> > This week we lost 23,000 messages in a few days' time on our
> > production cluster running Artemis 2.26.0; see our settings below.
> > We've reverted to Artemis 2.20.0 just in case.
> >
> > A few observations:
> >
> > * In versions 2.24.0, 2.25.0 and 2.26.0 running on ZGC we noticed
> >   messages being produced to a queue without errors that we then could
> >   not find in that queue. At the same time we saw incorrect counters. We
> >   restarted nodes to resolve this, but on one occasion the error
> >   continued for some time after that, and we never found the messages
> >   again, not even when exporting the journal files. The errors appeared
> >   after the brokers had been running for a few days.
> > * In version 2.20.0 running on G1GC and on ZGC we did not lose any
> >   messages. We did experience memory issues resulting in (too) long
> >   garbage collection times every other week, maybe due to lack of JVM
> >   tuning on our side. We were running 2.20 on G1GC for several
> >   months.
> >
> > We're running a symmetric cluster of 3 live/backup pairs in Docker JRE
> > (Temurin) containers on VMware CentOS 7 hosts. Each live node has
> > around 1,000 producers & consumers continuously.
> >
> > I hope the Artemis community can advise us on this.
> >
> > Best Regards,
> >
> > Walter
> >
> >
> > Our setup:
> >
> > *docker-compose.yaml:*
> >
> > version: "3.8"
> >
> > services:
> >   artemis:
> >     container_name: 'artemis'
> >     network_mode: "host"
> >     image: "cdplatform/activemq-artemis:2.26.0"
> >     restart: 'always'
> >     hostname: cjiblx8408.ato.cjib.minjus.nl
> >     volumes:
> >       - "/data/artemis/data:/var/lib/artemis/data"
> >       - "/data/artemis/plugins:/var/lib/artemis/lib"
> >       - "/data/artemis/etc:/var/lib/artemis/etc"
> >       - "/data/artemis/etc-override:/var/lib/artemis/etc-override"
> >       - "/logging/artemis:/var/lib/artemis/log"
> >     environment:
> >       ARTEMIS_MIN_MEMORY: "14051615047"
> >       ARTEMIS_MAX_MEMORY: "14051615047"
> >       JAVA_XTRA_ARGS: "-XX:ActiveProcessorCount=4 -XX:+UseZGC -XX:+UseDynamicNumberOfGCThreads -XX:+UseStringDeduplication "
> >       BROKER_SETTINGS_FILE: "broker-settings.xml"
> >       ENABLE_JMX: "true"
> >       JMX_PORT: "3333"
> >       ENABLE_JMX_EXPORTER: "true"
> >       JMX_RMI_PORT: "1098"
> >     mem_swappiness: 0
> >     memswap_limit: 20073735782
> >     deploy:
> >       resources:
> >         limits:
> >           memory: "20073735782"
> >         reservations:
> >           memory: "20073735782"
> >
> > *Command line options:*
> >
> > /opt/java/openjdk/bin/java
> > -javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent.jar=9404:/opt/jmx-exporter/etc/jmx-exporter-config.yaml
> > -Xmx17564518809
> > -Xms17564518809
> > -Dcom.sun.management.jmxremote.authenticate=true
> > -Dcom.sun.management.jmxremote.password.file=/var/lib/artemis/etc/jmxremote.password
> > -Dcom.sun.management.jmxremote.access.file=/var/lib/artemis/etc/jmxremote.access
> > -Dcom.sun.management.jmxremote.port=3333
> > -Dcom.sun.management.jmxremote.rmi.port=1098
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Djava.net.preferIPv4Addresses=true
> > -Djava.net.preferIPv4Stack=true
> > -XX:ActiveProcessorCount=4
> > -XX:+UseZGC
> > -XX:+UseDynamicNumberOfGCThreads
> > -XX:+UseStringDeduplication
> > -Dhawtio.realm=activemq
> > -Dhawtio.offline=true
> > -Dhawtio.role=gs-auth-Artemis_Admin,gs-auth-Artemis_User
> > -DPrincipalClasses=org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal
> > -Djolokia.policyLocation=file:/var/lib/artemis/etc/jolokia-access.xml
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Xbootclasspath/a:/var/lib/artemis/lib/javax.json-1.1.4.jar
> > -Dhawtio.role=gs-auth-Artemis_Admin,gs-auth-Artemis_User
> > -Xbootclasspath/a:/opt/apache-artemis/lib/jboss-logmanager-2.1.18.Final.jar:/opt/apache-artemis/lib/wildfly-common-1.5.2.Final.jar:/opt/apache-artemis/lib/javax.json-1.1.4.jar
> > -Djava.security.auth.login.config=/var/lib/artemis/etc/login.config
> > -classpath /opt/apache-artemis/lib/artemis-boot.jar
> > -Dartemis.home=/opt/apache-artemis
> > -Dartemis.instance=/var/lib/artemis
> > -Djava.library.path=/opt/apache-artemis/bin/lib/linux-x86_64
> > -Djava.io.tmpdir=/var/lib/artemis/tmp
> > -Ddata.dir=/var/lib/artemis/data
> > -Dartemis.instance.etc=/var/lib/artemis/etc
> > -Djava.util.logging.manager=org.jboss.logmanager.LogManager
> > -Dlogging.configuration=file:/var/lib/artemis/etc//logging.properties
> > -Dartemis.default.sensitive.string.codec.key=
> > org.apache.activemq.artemis.boot.Artemis
> > run
> >
> > *broker-settings.xml:*
> >
> > <core xmlns="urn:activemq:core">
> >   <global-max-size>2810323009</global-max-size>
> >   <name>xxxxxxx.xxxxxx.xx</name>
> >   <graceful-shutdown-enabled xmlns="urn:activemq:core">true</graceful-shutdown-enabled>
> >   <graceful-shutdown-timeout xmlns="urn:activemq:core">10000</graceful-shutdown-timeout>
> >   <management-address xmlns="urn:activemq:core">activemq.management</management-address>
> >   <persistence-enabled xmlns="urn:activemq:core">true</persistence-enabled>
> >   <id-cache-size xmlns="urn:activemq:core">20000</id-cache-size>
> >   <persist-id-cache xmlns="urn:activemq:core">true</persist-id-cache>
> >   <paging-directory xmlns="urn:activemq:core">data/paging</paging-directory>
> >   <bindings-directory xmlns="urn:activemq:core">data/bindings</bindings-directory>
> >   <large-messages-directory xmlns="urn:activemq:core">data/large-messages</large-messages-directory>
> >   <journal-directory xmlns="urn:activemq:core">data/journal</journal-directory>
> >   <journal-type xmlns="urn:activemq:core">ASYNCIO</journal-type>
> >   <journal-datasync xmlns="urn:activemq:core">true</journal-datasync>
> >   <journal-min-files xmlns="urn:activemq:core">2</journal-min-files>
> >   <journal-pool-files xmlns="urn:activemq:core">10</journal-pool-files>
> >   <journal-device-block-size xmlns="urn:activemq:core">4096</journal-device-block-size>
> >   <journal-file-size xmlns="urn:activemq:core">10MB</journal-file-size>
> >   <journal-buffer-size xmlns="urn:activemq:core">490KB</journal-buffer-size>
> >   <journal-compact-min-files xmlns="urn:activemq:core">10</journal-compact-min-files>
> >   <journal-compact-percentage xmlns="urn:activemq:core">30</journal-compact-percentage>
> >   <journal-lock-acquisition-timeout xmlns="urn:activemq:core">-1</journal-lock-acquisition-timeout>
> >   <journal-file-open-timeout xmlns="urn:activemq:core">5</journal-file-open-timeout>
> >   <journal-sync-non-transactional xmlns="urn:activemq:core">true</journal-sync-non-transactional>
> >   <journal-sync-transactional xmlns="urn:activemq:core">true</journal-sync-transactional>
> >   <disk-scan-period xmlns="urn:activemq:core">5000</disk-scan-period>
> >   <max-disk-usage xmlns="urn:activemq:core">90</max-disk-usage>
> >   <critical-analyzer xmlns="urn:activemq:core">true</critical-analyzer>
> >   <critical-analyzer-timeout xmlns="urn:activemq:core">120000</critical-analyzer-timeout>
> >   <critical-analyzer-check-period xmlns="urn:activemq:core">60000</critical-analyzer-check-period>
> >   <critical-analyzer-policy xmlns="urn:activemq:core">LOG</critical-analyzer-policy>
> >   <page-sync-timeout xmlns="urn:activemq:core">548000</page-sync-timeout>
> >   <acceptors xmlns="urn:activemq:core">
> >     <acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;connectionsAllowed=1536;directDeliver=false;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true;protocols=CORE,AMQP,STOMP,HORNETQ,OPENWIRE;</acceptor>
> >   </acceptors>
> >   <connectors xmlns="urn:activemq:core">
> >     <connector name="artemis">tcp://xxxxxxx.xxxxxx.xx:61616</connector>
> >   </connectors>
> >   <cluster-user xmlns="urn:activemq:core">artemis</cluster-user>
> >   <cluster-password xmlns="urn:activemq:core">xxxxxxxx</cluster-password>
> >   <broadcast-groups xmlns="urn:activemq:core">
> >     <broadcast-group name="bg-group1">
> >       <group-address>231.7.7.10</group-address>
> >       <group-port>9876</group-port>
> >       <broadcast-period>5000</broadcast-period>
> >       <connector-ref>artemis</connector-ref>
> >     </broadcast-group>
> >   </broadcast-groups>
> >   <discovery-groups xmlns="urn:activemq:core">
> >     <discovery-group name="dg-group1">
> >       <group-address>231.7.7.10</group-address>
> >       <group-port>9876</group-port>
> >       <refresh-timeout>10000</refresh-timeout>
> >     </discovery-group>
> >   </discovery-groups>
> >   <cluster-connections xmlns="urn:activemq:core">
> >     <cluster-connection name="artemis-ato">
> >       <connector-ref>artemis</connector-ref>
> >       <retry-interval>2000</retry-interval>
> >       <initial-connect-attempts>1000</initial-connect-attempts>
> >       <reconnect-attempts>1000</reconnect-attempts>
> >       <message-load-balancing>ON_DEMAND</message-load-balancing>
> >       <max-hops>1</max-hops>
> >       <discovery-group-ref discovery-group-name="dg-group1"/>
> >     </cluster-connection>
> >   </cluster-connections>
> >   <ha-policy xmlns="urn:activemq:core">
> >     <replication>
> >       <master>
> >         <check-for-live-server>true</check-for-live-server>
> >         <vote-on-replication-failure>true</vote-on-replication-failure>
> >         <group-name>ato-hapair-1</group-name>
> >       </master>
> >     </replication>
> >   </ha-policy>
> >   <metrics xmlns="urn:activemq:core">
> >     <jvm-memory>true</jvm-memory>
> >     <jvm-gc>true</jvm-gc>
> >     <jvm-threads>true</jvm-threads>
> >     <plugin class-name="org.apache.activemq.artemis.core.server.metrics.plugins.ArtemisPrometheusMetricsPlugin"/>
> >   </metrics>
> >   <security-settings xmlns="urn:activemq:core">
> >     <security-setting match="activemq.management">
> >       <permission type="manage" roles="amq,service"/>
> >     </security-setting>
> >     <security-setting match="#">
> >       <permission type="manage" roles="amq,service"/>
> >       <permission type="send" roles="amq,service,b2bi"/>
> >       <permission type="consume" roles="amq,service,b2bi"/>
> >       <permission type="browse" roles="amq,service"/>
> >       <permission type="createAddress" roles="amq,service"/>
> >       <permission type="deleteAddress" roles="amq,service"/>
> >       <permission type="createDurableQueue" roles="amq,service"/>
> >       <permission type="deleteDurableQueue" roles="amq,service"/>
> >       <permission type="createNonDurableQueue" roles="amq,service"/>
> >       <permission type="deleteNonDurableQueue" roles="amq,service"/>
> >     </security-setting>
> >     <role-mapping from="gs-auth-Artemis_Admin" to="amq"/>
> >     <role-mapping from="gs-auth-Artemis_User" to="service"/>
> >   </security-settings>
> >   <address-settings xmlns="urn:activemq:core">
> >     <address-setting match="activemq.management#">
> >       <dead-letter-address>DLQ</dead-letter-address>
> >       <expiry-address>ExpiryQueue</expiry-address>
> >       <redelivery-delay>0</redelivery-delay>
> >       <message-counter-history-day-limit>10</message-counter-history-day-limit>
> >       <max-size-bytes>-1</max-size-bytes>
> >       <max-size-messages>-1</max-size-messages>
> >       <address-full-policy>PAGE</address-full-policy>
> >       <auto-create-queues>true</auto-create-queues>
> >       <auto-create-addresses>true</auto-create-addresses>
> >       <auto-create-jms-queues>true</auto-create-jms-queues>
> >       <auto-create-jms-topics>true</auto-create-jms-topics>
> >     </address-setting>
> >     <address-setting match="#">
> >       <dead-letter-address>DLQ</dead-letter-address>
> >       <expiry-address>ExpiryQueue</expiry-address>
> >       <redelivery-delay>0</redelivery-delay>
> >       <message-counter-history-day-limit>10</message-counter-history-day-limit>
> >       <max-size-bytes>-1</max-size-bytes>
> >       <max-size-messages>-1</max-size-messages>
> >       <address-full-policy>PAGE</address-full-policy>
> >       <auto-create-queues>true</auto-create-queues>
> >       <auto-create-addresses>true</auto-create-addresses>
> >       <auto-create-jms-queues>true</auto-create-jms-queues>
> >       <auto-create-jms-topics>true</auto-create-jms-topics>
> >     </address-setting>
> >     <address-setting match="jms.#">
> >       <dead-letter-address>DLQ</dead-letter-address>
> >       <expiry-address>ExpiryQueue</expiry-address>
> >       <max-delivery-attempts>5</max-delivery-attempts>
> >       <redelivery-delay>500</redelivery-delay>
> >       <redelivery-delay-multiplier>1.5</redelivery-delay-multiplier>
> >       <redelivery-collision-avoidance-factor>0.5</redelivery-collision-avoidance-factor>
> >       <redistribution-delay>30000</redistribution-delay>
> >       <send-to-dla-on-no-route>true</send-to-dla-on-no-route>
> >       <max-size-bytes>-1</max-size-bytes>
> >       <max-size-messages>-1</max-size-messages>
> >       <address-full-policy>PAGE</address-full-policy>
> >       <message-counter-history-day-limit>10</message-counter-history-day-limit>
> >       <auto-create-queues>false</auto-create-queues>
> >       <auto-delete-queues>false</auto-delete-queues>
> >       <auto-delete-created-queues>false</auto-delete-created-queues>
> >       <auto-delete-queues-delay>30000</auto-delete-queues-delay>
> >       <config-delete-queues>OFF</config-delete-queues>
> >       <auto-create-addresses>false</auto-create-addresses>
> >       <auto-delete-addresses>false</auto-delete-addresses>
> >       <auto-delete-addresses-delay>30000</auto-delete-addresses-delay>
> >       <config-delete-addresses>OFF</config-delete-addresses>
> >     </address-setting>
> >     <address-setting match="activemq.notifications">
> >       <max-size-bytes>-1</max-size-bytes>
> >       <max-size-messages>-1</max-size-messages>
> >       <address-full-policy>PAGE</address-full-policy>
> >     </address-setting>
> >     <address-setting match="jms.queue.#">
> >       <default-address-routing-type>ANYCAST</default-address-routing-type>
> >       <default-queue-routing-type>ANYCAST</default-queue-routing-type>
> >     </address-setting>
> >     <address-setting match="jms.topic.#">
> >       <default-address-routing-type>MULTICAST</default-address-routing-type>
> >       <default-queue-routing-type>MULTICAST</default-queue-routing-type>
> >     </address-setting>
> >   </address-settings>
> >   <addresses xmlns="urn:activemq:core">
> >     <address name="DLQ">
> >       <anycast>
> >         <queue name="DLQ"/>
> >       </anycast>
> >     </address>
> >     <address name="ExpiryQueue">
> >       <anycast>
> >         <queue name="ExpiryQueue"/>
> >       </anycast>
> >     </address>
> >   </addresses>
> >   <broker-plugins xmlns="urn:activemq:core">
> >     <broker-plugin class-name="org.apache.activemq.artemis.core.server.plugin.impl.LoggingActiveMQServerPlugin">
> >       <property key="LOG_ALL_EVENTS" value="false"/>
> >       <property key="LOG_CONNECTION_EVENTS" value="false"/>
> >       <property key="LOG_SESSION_EVENTS" value="false"/>
> >       <property key="LOG_CONSUMER_EVENTS" value="false"/>
> >       <property key="LOG_DELIVERING_EVENTS" value="false"/>
> >       <property key="LOG_SENDING_EVENTS" value="false"/>
> >       <property key="LOG_INTERNAL_EVENTS" value="false"/>
> >     </broker-plugin>
> >   </broker-plugins>
> > </core>
> >
>
> --
> Clebert Suconic

--
Clebert Suconic
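
For reference, here is a sketch of what the load-balancing suggestion from the thread might look like when applied to the cluster-connection in the quoted broker-settings.xml. The only change is the message-load-balancing value, which is spelled OFF_WITH_REDISTRIBUTION in the broker configuration. Every other element is copied from the quoted file. This assumes a broker version that supports that value, and it has not been tested against the setup described above.

```xml
<!-- Sketch only: cluster-connection copied from the quoted broker-settings.xml,
     with message-load-balancing changed as suggested in the thread. -->
<cluster-connections xmlns="urn:activemq:core">
  <cluster-connection name="artemis-ato">
    <connector-ref>artemis</connector-ref>
    <retry-interval>2000</retry-interval>
    <initial-connect-attempts>1000</initial-connect-attempts>
    <reconnect-attempts>1000</reconnect-attempts>
    <!-- was ON_DEMAND in the quoted configuration -->
    <message-load-balancing>OFF_WITH_REDISTRIBUTION</message-load-balancing>
    <max-hops>1</max-hops>
    <discovery-group-ref discovery-group-name="dg-group1"/>
  </cluster-connection>
</cluster-connections>
```

With this setting messages stay with local consumers and move to another node only through redistribution, which depends on the redistribution-delay address setting. In the quoted configuration only the jms.# match sets it (30000 ms), so addresses outside that match would likely need their own redistribution-delay for redistribution to happen at all.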