The low memory situation in these tests is mainly caused by a high number of
client connections, which leads to a high number of transport threads (NIO
is not used). I also noticed that the marshalling cache used with OpenWire
eats up quite a lot of memory; this cache can be reduced. So the test runs
under non-optimal conditions. You can call it a "bad day" test, but that is
typically good for finding edge cases.

In any case, a certain overload and the resulting memory shortage cannot be
fully prevented. Of course, one tries to size a system so as to reduce the
risk of running into resource shortages or low memory. Still, by Murphy's
law, you can end up in a low memory situation where long(er) GCs may happen.
With a low value for lockAcquireSleepInterval this situation may happen even
earlier (GC runs of 10s will cause it with the default
lockAcquireSleepInterval setting).
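
To make the timing argument concrete, here is a rough sketch of the worst
case. It assumes that each keep-alive extends the lease by
lockAcquireSleepInterval and that the pause starts just before the next
keep-alive is due. The class, the method and the 5s keep-alive period are
made up for illustration and are not taken from the broker code or from my
test setup.

```java
// Hypothetical back-of-the-envelope model, not ActiveMQ code.
class PauseVsLeaseSketch {

    /**
     * Worst case: the pause begins right before the next keep-alive would
     * have run, so only (lease - keepAlive) of the lease is left.
     */
    static boolean leaseMayExpire(long leaseMs, long keepAliveMs, long pauseMs) {
        long worstCaseRemainingMs = leaseMs - keepAliveMs;
        return pauseMs > worstCaseRemainingMs;
    }

    public static void main(String[] args) {
        long leaseMs = 10_000;     // lockAcquireSleepInterval, 10s as discussed above
        long keepAliveMs = 5_000;  // assumed keep-alive period, illustration only
        System.out.println("10s pause: " + leaseMayExpire(leaseMs, keepAliveMs, 10_000));
        System.out.println(" 4s pause: " + leaseMayExpire(leaseMs, keepAliveMs, 4_000));
    }
}
```

Under these assumptions, a shorter lease leaves less headroom for the same
GC pause, which is why a low lockAcquireSleepInterval gets you there earlier.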

Coming back to the sequence id: you mentioned that the broker reads its
sequence id from the store when it has the lock. That probably means the new
master reads it upon initialization when it takes over. It seems the
sequence generator is initialized with the value from the store when the
RegionBroker is instantiated.

Does the old master re-read its sequence id when it loses the lock?
Probably not.
In that case, both masters could generate duplicate sequence ids. They might
hit a duplicate key on insert into the MSGS table, but only if the message
with that sequence id has not already been consumed in the meantime (if the
old master was on hold due to GC, it will probably generate outdated
sequence ids).
What is the effect of generating duplicate sequence ids? Could this cause
follow-up issues with clients?
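
A minimal sketch of how I picture the overlap. This is a toy model, not
broker code: the Master class, the counter and the concrete numbers are
hypothetical, and it only assumes that each master seeds an in-memory
counter from the store once and never re-reads it.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model, not ActiveMQ code.
class ParallelMasterSketch {

    /** Stand-in for a broker's in-memory sequence generator. */
    static final class Master {
        private final AtomicLong sequence;

        Master(long lastSequenceInStore) {
            // Seeded once at start-up, never refreshed afterwards.
            this.sequence = new AtomicLong(lastSequenceInStore);
        }

        long nextSequenceId() {
            return sequence.incrementAndGet();
        }
    }

    /** Returns the sequence ids that were handed out by both masters. */
    static Set<Long> duplicateIds() {
        Set<Long> issued = new TreeSet<>();
        Set<Long> duplicates = new TreeSet<>();

        Master oldMaster = new Master(100);
        issued.add(oldMaster.nextSequenceId()); // 101
        issued.add(oldMaster.nextSequenceId()); // 102

        // Old master stalls (e.g. long GC), the lease expires, and the new
        // master starts from the last id it finds in the store.
        Master newMaster = new Master(102);
        for (int i = 0; i < 3; i++) {
            issued.add(newMaster.nextSequenceId()); // 103, 104, 105
        }

        // Old master resumes with its stale counter.
        for (int i = 0; i < 2; i++) {
            long id = oldMaster.nextSequenceId();
            if (!issued.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println("duplicate ids: " + duplicateIds());
    }
}
```

In this toy run the resumed old master hands out ids that the new master
has already used. Whether the real store rejects them depends on whether
those rows still exist, as described above.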

For the JDBC store this means that one effect of a parallel master situation
is that the masters may generate duplicate sequence ids. It would probably
help if a master detected such a duplicate key situation and either stopped
(if a stop was not already triggered) or tried to resync its sequence
generator.
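
Roughly what I have in mind for the resync variant, as a sketch only.
FakeStore is an in-memory stand-in for the MSGS table and Master is a
hypothetical class; nothing here is the JDBC persistence adapter's actual
API. The only real type used is java.sql.SQLIntegrityConstraintViolationException,
which is what a JDBC driver may raise on a duplicate key.

```java
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model, not ActiveMQ code.
class ResyncOnDuplicateSketch {

    /** In-memory stand-in for the MSGS table, keyed by sequence id. */
    static final class FakeStore {
        private final TreeMap<Long, String> rows = new TreeMap<>();

        synchronized void insert(long id, String body)
                throws SQLIntegrityConstraintViolationException {
            if (rows.containsKey(id)) {
                throw new SQLIntegrityConstraintViolationException("duplicate key " + id);
            }
            rows.put(id, body);
        }

        synchronized long maxId() {
            return rows.isEmpty() ? 0L : rows.lastKey();
        }
    }

    static final class Master {
        private final FakeStore store;
        private final AtomicLong sequence;

        Master(FakeStore store) {
            this.store = store;
            this.sequence = new AtomicLong(store.maxId());
        }

        /** Stores the message; on a duplicate key, resyncs once and retries. */
        long send(String body) {
            long id = sequence.incrementAndGet();
            try {
                store.insert(id, body);
                return id;
            } catch (SQLIntegrityConstraintViolationException duplicate) {
                // Another master has advanced the store: re-read and retry.
                sequence.set(store.maxId());
                id = sequence.incrementAndGet();
                try {
                    store.insert(id, body);
                    return id;
                } catch (SQLIntegrityConstraintViolationException again) {
                    // Alternative from the text: give up and stop.
                    throw new IllegalStateException("still colliding, stopping", again);
                }
            }
        }
    }

    /** Old master collides with the new master's ids, then resyncs. */
    static long idAfterResync() {
        FakeStore store = new FakeStore();
        Master oldMaster = new Master(store);
        oldMaster.send("a");                  // id 1
        Master newMaster = new Master(store); // takes over, seeded with 1
        newMaster.send("b");                  // id 2
        newMaster.send("c");                  // id 3
        return oldMaster.send("d");           // tries 2, collides, resyncs
    }

    public static void main(String[] args) {
        System.out.println("id used after resync: " + idAfterResync());
    }
}
```

This only catches the collision while the conflicting row is still in the
table. If the message was already consumed and removed, the insert succeeds
and nothing is detected, so stopping the old master is probably still the
safer reaction.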

I will try to think about other effects of having a parallel master, e.g.:
- "loss" of non-persistent messages (in case producer/consumer are connected
to different masters)
- stale persistent messages that the old master still accepted and did not
deliver before stopping
- not sure what duplicate sequence ids could mean for clients (do they
affect message replays, ack or tx handling in any way?)
- duplicate delivery of persistent in-flight messages (as you already
mentioned)




