A few questions:

- what prefetch size are your consumers using? Ideally it should be set to 1 to prevent out-of-order messages (the snippet below shows one way to set that, along with an explicit acknowledgement mode).
- what acknowledgement mode are you using?
- as each queue supports a max of 1024 groups by default, are you sure you're not blowing past this number? That might cause the broker to spend most of its time repeatedly re-assigning groups to consumers.
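Roughly this is what I'd check on the consumer side (a minimal sketch only; the broker URL and queue name are placeholders, and I'm assuming the plain ActiveMQ 5.x Java/OpenWire client):

import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class SlowGroupConsumer {
    public static void main(String[] args) throws Exception {
        // Placeholder broker URL.
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://broker-host:61616");

        // Prefetch of 1: a consumer never holds more than one unacked message,
        // which keeps per-group dispatch strict and makes slow consumers obvious.
        // The same thing can be set per destination with
        // "suchDestination?consumer.prefetchSize=1".
        factory.getPrefetchPolicy().setQueuePrefetch(1);

        Connection connection = factory.createConnection();
        connection.start();

        // Explicit CLIENT_ACKNOWLEDGE makes the ack mode visible and rules out
        // surprises from AUTO_ACKNOWLEDGE combined with optimizeAcknowledge.
        Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
        Queue queue = session.createQueue("suchDestination");
        MessageConsumer consumer = session.createConsumer(queue);

        Message message = consumer.receive(5000);
        if (message != null) {
            // ... process the message ...
            message.acknowledge();
        }
        connection.close();
    }
}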
You did say something about millions of UUID values being seen per hour:

- are you hashing those UUID values down so that multiple UUIDs map to the same group ID? (There is a small sketch of what I mean at the bottom of this mail.)
- what happens when you force the broker to close all message groups so that it performs a re-assignment of all known groups to consumers? This operation is available through JMX.

Thanks,

Paul

On Thu, Nov 9, 2017 at 5:28 AM, n.yovchev <n.yovc...@relay42.com> wrote:
> Hello,
>
> We have been using AMQ in production for quite some time already, and we
> are noticing a strange behavior on one of our queues.
>
> The situation is as follows:
>
> - we do clickstream traffic, so when we have identified a user, all his
> events are "grouped" by the JMSXGroupID property (which is a UUID; in our
> case we can have millions of these per hour), so we have some order in
> consuming the events for the same user in case they come in bursts
>
> - we use KahaDB with roughly the following config:
>
> <mKahaDB directory="${activemq.data}/mkahadb">
>   <filteredPersistenceAdapters>
>     <filteredKahaDB perDestination="true">
>       <persistenceAdapter>
>         <kahaDB checkForCorruptJournalFiles="true"
>             journalDiskSyncStrategy="PERIODIC"
>             journalDiskSyncInterval="5000"
>             preallocationStrategy="zeros"
>             concurrentStoreAndDispatchQueues="false" />
>       </persistenceAdapter>
>     </filteredKahaDB>
>   </filteredPersistenceAdapters>
> </mKahaDB>
>
> - the broker runs on a rather beefy EC2 instance, but it doesn't seem to
> hit any limits: neither file limits, nor IOPS, nor CPU limits
>
> - the destination policy for this destination is very similar to that of a
> lot of other destinations that use the same grouping by JMSXGroupID:
>
> <policyEntry queue="suchDestination" producerFlowControl="false"
>     memoryLimit="256mb" maxPageSize="5000" maxBrowsePageSize="2000">
>   <messageGroupMapFactory>
>     <simpleMessageGroupMapFactory/>
>   </messageGroupMapFactory>
>   <deadLetterStrategy>
>     <individualDeadLetterStrategy queuePrefix="DLQ."
>         useQueueForQueueMessages="true" />
>   </deadLetterStrategy>
> </policyEntry>
>
> - consumers consume messages fairly slowly compared to other destinations
> (about 50-100ms per message, versus about 10-30ms per message for
> consumers on other destinations)
>
> - however, we end up in a situation where the consumers are not consuming
> at the speed we expect them to, and seem to be waiting for something,
> while there is a huge backlog of messages on the remote broker for that
> destination. The consumers seem to be neither CPU, nor IO, nor network
> traffic bound.
>
> - a symptom is that if we split that queue into two queues and attach the
> same number of consumers on the same number of nodes, things somehow
> become better. Also, if there is a huge workload for that queue and we
> just rename it to suchQueue2 on the producers and assign some consumers to
> it, these consumers are much faster (for a while) than the consumers on
> the "old" suchQueue.
>
> - the queue doesn't have "non-grouped messages"; all messages on it have
> the JMSXGroupID property and are of the same type
>
> - increasing or lowering the number of consumers for that queue seems to
> have little effect
>
> - rebooting the consumer apps seems to have little effect once the queue
> becomes "slow to consume"
>
>
> Has anybody experienced this?
>
> In short: the broker is waiting a considerable time for consumers that
> seem to be free and not busy all the time.
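For the group ID question above, this is the kind of thing I mean by hashing the UUIDs down (just a sketch; the bucket count and the "group-" prefix are made-up example values):

import java.util.UUID;

// Map each user UUID onto a bounded set of group IDs so the broker only ever
// tracks NUM_GROUPS groups per queue instead of millions of one-off UUIDs.
// Messages for the same user still share a group, so per-user ordering is kept.
public class GroupIdBucketing {

    static final int NUM_GROUPS = 1024; // example value, tune to your consumer count

    static String groupIdFor(UUID userId) {
        int bucket = Math.floorMod(userId.hashCode(), NUM_GROUPS);
        return "group-" + bucket;
    }

    // On the producing side you would then set
    //   message.setStringProperty("JMSXGroupID", groupIdFor(userId));
    // instead of using the raw UUID string as the group ID.
}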