I'm doing performance characterization of ActiveMQ when a network of brokers runs across a high-latency (100ms range) WAN. When my producer on one side of the WAN sends faster than our meager allocation of the WAN's bandwidth, I quickly see all messages fail to be delivered to the end consumer.
These are the three critical elements of the problem, which all have to be present for it to happen:

1. Messages have a TTL set (the same for all messages), so they'll eventually expire. We're using Camel to do this for us, but it would be the same if it were set directly without Camel's help.
2. Producers are sending messages faster (in aggregate) than our bandwidth allocation on the WAN. This means we're guaranteed to not deliver some of the messages to the end consumer, but in practice we're not delivering any of them.
3. There is a non-trivial amount of latency across the WAN.

As messages are sent, they begin queuing on the sender-side broker. As time goes on, the messages that are still in the producer-side broker's message store get closer and closer to expiring, until eventually the message at the head of the message store is within the WAN's latency value (e.g. 100ms) of the message's expiration time. The amount of time it takes for this to happen depends on how long it takes messages to time out and on the difference between the producer's send rate and the WAN's bandwidth, but it will eventually happen. This message will be sent by the producer-side broker (because although it's really close to expiring, it hasn't expired yet), but when it's received by the consumer-side broker, an amount of time equal to the WAN latency has passed, so it's expired and gets discarded by the consumer-side broker instead of getting delivered to the consumer.

From this point onwards, no messages will get successfully delivered to the consumer. As the messages in the producer-side broker's message store get closer to and eventually reach their expiration times, each message will either be within the WAN latency of its timeout or after its timeout.
If the former, it will get sent across the WAN but discarded by the consumer-side broker; if the latter, it will get discarded by the producer-side broker, and that broker will find the next message in the message store that isn't yet expired (but will be by the time it arrives) and send it instead. As a result, every message from that point onward expires on either the producer-side broker or the consumer-side broker.

Even though there are lots of messages in the producer-side broker's message store that could be delivered successfully, ActiveMQ instead sends the first message in the message store, even though an outside observer knows it will just get thrown away.

Ideally, ActiveMQ should prioritize messages that are expected to reach an end consumer over ones that are expected to time out before they get there, to minimize wasteful use of scarce resources such as network links. Doing that automatically, without the user having to provide lots of up-front configuration of the network topology, sounds hard, particularly when considering that network link performance can vary over time and that different consumers may have different network paths from the producer.

But I think it would be very useful to have a setting that allows a user to specify that messages within X milliseconds of their expiration time will be discarded by the broker rather than forwarded to the next broker. The default should be 0 (so all messages that haven't actually expired would be forwarded), but if I know that my network path has a certain latency, I should be able to configure the broker to not even try delivering messages that I know aren't likely to make it to an end consumer, so that messages that will make it can be sent instead.

Does this seem like a reasonable feature to add? If so, I'll submit a JIRA for it.

Tim
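
P.S. To make the failure mode concrete, here is a toy simulation of the behavior described above. It is not ActiveMQ code, and it is only a sketch under simplifying assumptions: one producer, one FIFO message store, one WAN link that forwards one message at a time, and fixed latency. The function name, the parameters, and the numbers (1000ms TTL, 100ms latency, link running at half the producer's rate) are all made up for illustration; margin_ms stands in for the proposed "discard messages within X milliseconds of expiration" setting.

```python
from collections import deque


def simulate(ttl_ms, latency_ms, produce_every_ms, send_every_ms,
             total_ms, margin_ms=0):
    """Toy model of a producer-side broker forwarding over a slow WAN link.

    Returns (delivered, dropped_at_sender, dropped_at_receiver).
    All names and parameters are hypothetical, not ActiveMQ settings.
    """
    store = deque()          # FIFO of expiration timestamps (ms)
    delivered = 0
    dropped_at_sender = 0
    dropped_at_receiver = 0
    next_send = 0            # earliest time the WAN link is free again

    for now in range(total_ms):
        # Producer enqueues a message with a fixed TTL.
        if now % produce_every_ms == 0:
            store.append(now + ttl_ms)

        if now < next_send:
            continue         # link is still busy with the previous message

        # Sender-side broker discards messages that are expired, or that
        # are within margin_ms of expiring (margin_ms=0 means "expired only").
        while store and store[0] - now <= margin_ms:
            store.popleft()
            dropped_at_sender += 1

        if store:
            expires_at = store.popleft()
            next_send = now + send_every_ms
            # The message arrives latency_ms later; the consumer-side
            # broker discards it if it has expired in transit.
            if now + latency_ms < expires_at:
                delivered += 1
            else:
                dropped_at_receiver += 1

    return delivered, dropped_at_sender, dropped_at_receiver


if __name__ == "__main__":
    common = dict(ttl_ms=1000, latency_ms=100, produce_every_ms=1,
                  send_every_ms=2, total_ms=10_000)
    print("margin 0ms:  ", simulate(margin_ms=0, **common))
    print("margin 100ms:", simulate(margin_ms=100, **common))
```

In this model, with no margin, deliveries stop once the head of the store drifts to within the WAN latency of its expiration, and every later send is wasted. With the margin set to the WAN latency, the link only ever carries messages that can still arrive in time. A real broker would of course have variable latency and multiple consumers, so treat this as an illustration of the idea rather than a prediction.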