ActiveMQ stops delivering messages to consumers when saturating a high-latency network link

Tim Bain Mon, 15 Sep 2014 10:17:16 -0700

I'm doing performance characterization of ActiveMQ when a network of
brokers runs across a high-latency (100ms range) WAN.  When my producer on
one side of the WAN sends faster than our meager allocation of the WAN's
bandwidth, I quickly see all messages fail to be delivered to the end
consumer.


These are the three critical elements of the problem, which all have to be
present for it to happen:
1.  Messages have a TTL set (the same for all messages), so they'll
eventually expire.  We're using Camel to do this for us, but it would be
the same if it were set directly without Camel's help.
2.  Producers are sending messages faster (in aggregate) than our bandwidth
allocation on the WAN.  This means we're guaranteed to not deliver some of
the messages to the end consumer, but in practice we're not delivering any
of them.
3.  There is a non-trivial amount of latency across the WAN.

As messages are sent, they begin queuing on the sender-side broker.  As
time goes on, the messages that are still in the producer-side broker's
message store get closer and closer to expiring, until eventually the
message at the head of the message store is within the WAN's latency value
(e.g. 100ms) of the message's expiration time.  The amount of time it takes
for this to happen depends on how long it takes messages to time out and on
the difference between the producer's send rate and the WAN's bandwidth,
but it will eventually happen.  This message will be sent by the
producer-side broker (because although it's really close to expiring, it
hasn't expired yet), but when it's received by the consumer-side broker, an
amount of time equal to the WAN latency has passed, so it's expired and
gets discarded by the consumer-side broker instead of getting delivered to
the consumer.
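
To make that dependence concrete, here is a back-of-the-envelope sketch in Python. It assumes a simplified model that is not ActiveMQ's actual behavior in detail: constant send and forward rates, strict FIFO forwarding, and a fixed one-way latency. The function and parameter names are made up for illustration.

```python
def onset_seconds(send_rate, link_rate, ttl_s, latency_s):
    """Estimate when the first doomed message is forwarded.

    Assumed model: message n is produced at n / send_rate and, because the
    link is the bottleneck, forwarded at n / link_rate. Rates are in
    messages per second; ttl_s and latency_s are in seconds.
    """
    if send_rate <= link_rate:
        return None  # no backlog builds up, so the problem never starts
    # Each successive message waits this much longer in the sender-side store.
    extra_wait_per_message = 1.0 / link_rate - 1.0 / send_rate
    # A message is doomed once its wait reaches (TTL - latency).
    n = max(0.0, ttl_s - latency_s) / extra_wait_per_message
    return n / link_rate
```

With assumed numbers of 200 msg/s produced, 100 msg/s across the WAN, a 10 second TTL, and 100ms latency, `onset_seconds(200, 100, 10.0, 0.1)` comes out at roughly 19.8 seconds.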

From this point onwards, no messages will get successfully delivered to the
consumer.  As the messages in the producer-side broker's message store get
closer to and eventually reach their expiration times, each message will
either be within the WAN latency of its timeout or after its timeout.  If
the former, it will get sent across the WAN but discarded by the
consumer-side broker; if the latter, it will get discarded by the
producer-side broker and that broker will find the next message in the
message store that isn't yet expired (but will be by the time it arrives)
and send it instead.  As a result, all messages from that point onward
expire on either the producer-side broker or the consumer-side broker.
Even though there are lots of messages in the producer-side broker's
message store that could be delivered successfully, ActiveMQ instead sends
the first message in the message store even though an outside observer
knows it will just get thrown away.
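
The behavior described above can be reproduced with a toy simulation. This is only a sketch under stated assumptions (one producer, fixed intervals, a head-of-store-first broker that drops already-expired messages, integer millisecond clock); it does not use any ActiveMQ code, and all names are hypothetical.

```python
from collections import deque

def simulate_fifo_forwarding(send_interval_ms, forward_interval_ms,
                             ttl_ms, latency_ms, duration_ms):
    """Return arrival times (ms) of messages that reach the
    consumer-side broker before they expire."""
    store = deque()        # expiration timestamps, oldest message first
    delivered = []
    next_send = 0
    next_forward = 0       # earliest time the WAN link is free again
    for now in range(duration_ms):
        if now == next_send:
            store.append(now + ttl_ms)
            next_send += send_interval_ms
        if now >= next_forward:
            # Producer-side broker discards messages that already expired.
            while store and store[0] <= now:
                store.popleft()
            if store:
                expires = store.popleft()   # always the head of the store
                next_forward = now + forward_interval_ms
                arrival = now + latency_ms
                if arrival < expires:       # else discarded on arrival
                    delivered.append(arrival)
    return delivered
```

Calling `simulate_fifo_forwarding(5, 10, 10_000, 100, 60_000)` models one message every 5ms into a link that carries one every 10ms, with a 10 second TTL and 100ms latency. In this model the last successful delivery happens just before the 20 second mark, and nothing gets through for the remaining 40 seconds even though the link stays fully busy.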

Ideally, ActiveMQ should prioritize messages that are expected to reach an
end consumer over ones that are expected to time out before they get there,
to minimize wasteful use of scarce resources such as network links.  Doing
that automatically, without the user having to provide lots of
up-front configuration of network topology sounds hard, particularly when
considering that network link performance can vary over time and that
different consumers may have different network paths from the producer to
the consumer.  But I think it would be very useful to have a setting that
allows a user to specify that messages within X milliseconds of their
expiration time will be discarded by the broker rather than forwarded to
the next broker.  The default should be 0 (so all messages that haven't
actually expired would be forwarded), but if I know that my network path
has a certain latency, I should be able to configure the broker to not even
try delivering messages that I know aren't likely to make it to an end
consumer, so that messages that will make it can be sent instead.
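
As a rough sketch of the selection rule I have in mind (Python, purely illustrative; the name `discard_margin_ms` is a placeholder and not an existing ActiveMQ option):

```python
from collections import deque

def next_forwardable(store, now_ms, discard_margin_ms=0):
    """Pop and return the expiration time of the first message worth
    forwarding, discarding any earlier message that has expired or will
    expire within discard_margin_ms. Returns None if nothing qualifies.

    With the default margin of 0, every message that has not actually
    expired is still forwarded.
    """
    while store:
        expires_ms = store.popleft()
        if expires_ms - now_ms > discard_margin_ms:
            return expires_ms
        # Otherwise the message is dropped here instead of using the link.
    return None

# Hypothetical store contents: expiration timestamps in ms, oldest first.
store = deque([1000, 1050, 1090, 1300])
chosen = next_forwardable(store, now_ms=1000, discard_margin_ms=100)
```

With the margin set to the known WAN latency, the message that gets picked still has more than one latency's worth of lifetime left, so it should arrive on the far side unexpired.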

Does this seem like a reasonable feature to add?  If so, I'll submit a JIRA
for it.

Tim
