I think you need to figure out whether you're trying to build a distributed cache or a distributed database, because right now you seem to be trying to build something halfway between the two and it's leading to an inconsistent set of requirements.
If you're trying to build a distributed cache, then you can populate the cache from a truth store (e.g. an RDBMS) whenever there is a cache miss. This implies that it's perfectly safe to clear the cache when you can't talk to the ActiveMQ broker, and that it's fine to bring up a new node (either to scale up or to replace one that fails), because it has a way to restore its cache and get in sync with the rest of the nodes. In this case, you would not want to roll back the transaction when a publish to the ActiveMQ broker fails, because the cache in each node will simply be filled by a query on a cache miss. And you would only need to clear the cache on broker connection problems if your data contains updates or deletions to existing records; if it's a stream of new records, then what's in the cache is fine to keep, and you can fill in the records you didn't get via ActiveMQ by querying the truth store when you hit cache misses.

If, on the other hand, you're trying to build a distributed database, then you do need to perform a rollback when you fail to publish, but you also need a way to populate a node when it starts (empty). And if you're going to clear the local copy of the database, then you're going to lose all your data whenever all nodes simultaneously lose their connection to the broker, so you probably don't want to do that. Instead you'd want to keep all records and have a mechanism for the nodes to synchronize among themselves when connectivity is restored; this would also handle the new-node problem. And if this is what you're looking for, please stop calling it a cache.

There's a lot more complexity in the second case than in the first, and my strong recommendation is to do the first (only a cache, with a backing database). I know you think that fewer deployed components make for a simpler solution, but simpler/less code, with less development effort and less risk of bugs, is actually the simpler solution even if it requires an additional component (an RDBMS). And if you already have an RDBMS in your architecture (otherwise, how are you rolling back your transaction when you fail to publish to ActiveMQ?), then the distributed cache really is simpler than the distributed database.

Additional responses are inline.

Tim

On Apr 19, 2018 8:00 AM, "pragmaticjdev" <amits...@gmail.com> wrote:

> > If you use the TCP transport, it will stay connected continuously, and
> > then disconnect if the broker goes down. If that happens, I believe an
> > exception will be thrown the next time your code polls for new messages.
>
> If the exception gets thrown only when the subscriber polls then the
> cache could be stale. Is there something that notifies (through an
> exception?) when the broker goes down?

Typically you would call it in a tight loop, so you're only as stale as the amount of time it takes you to publish the messages received the last time.
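If you want a push-style notification rather than finding out on the next poll, the standard JMS hook for this is Connection.setExceptionListener(), which is invoked asynchronously when the client loses its connection to the broker. A minimal sketch (the broker URL is a placeholder, and the Map stands in for whatever your local cache actually is):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.jms.Connection;
    import javax.jms.JMSException;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class BrokerDownNotification {
        public static void main(String[] args) throws JMSException {
            // Stand-in for the node's local cache.
            Map<String, Object> cache = new ConcurrentHashMap<>();

            ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();

            // Fires asynchronously when the connection to the broker is
            // lost, rather than waiting for the next poll to throw.
            connection.setExceptionListener((JMSException e) -> {
                // Updates may have been missed while disconnected, so treat
                // the local copy as suspect; subsequent misses repopulate it
                // from the truth store.
                cache.clear();
            });

            connection.start();
        }
    }

One caveat: if you use the failover: transport, the client tries to reconnect transparently and the listener may not fire for short outages, so test which behavior you actually get with your transport settings.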
> > In general, the terms "atomic" and "distributed" are contradictory in
> > most situations. If the OP is using this as a cache (i.e. a place to
> > hold the results of operations that you would otherwise be able to
> > compute on demand, as a performance optimization), he will be fine, and
> > atomicity doesn't matter. If on the other hand the OP requires that the
> > "cache" stay fully in sync and it's not possible to compute the value
> > in the case of a cache miss, then that's a database, not a cache.
>
> Our application has an OLTP use case. The objects that we want to cache
> are metadata objects of the application. They don't change frequently
> after the initial configuration, hence the thought of having a replicated
> cache. I came across this discussion
> <http://activemq.2283324.n4.nabble.com/How-to-ensure-reliability-of-publish-subscribe-against-occassional-network-hiccup-td2354084.html>
> from the old threads, which looked exactly like what I am trying to do.
>
> I can sequence the writes such that we first write to the database (RDBMS)
> and then publish it on JMS. If that fails we can roll back the database
> transaction. We look good on the publisher side. We can even guarantee
> that the subscribers receive the copy with CLIENT_ACKNOWLEDGE (ActiveMQ
> would retry till everyone reads the copy), but there are two situations
> that I cannot find an answer to:
>
> 1. How should our application handle a failure flow in which the publisher
> pushes a message on the topic successfully but ActiveMQ goes down before
> sending that message to any of the subscribers (assuming non-durable
> subscribers)? Is there a way the subscribers could know that ActiveMQ is
> down? If so, we could invalidate the whole cache.

See my answer above.

> 2. Given these cache updates need to be published & subscribed in real
> time, what special handling should we do in cases where the subscriber is
> busy (overloaded) and unable to read the message from the topic?

If you're going with a distributed database, there's only one answer here: ensure that this doesn't happen. Make the insertion of a record dead simple to minimize the work done on that thread, make sure the process is hosted on a machine with more than enough processing power, and even raise the priority of the ActiveMQ consumer thread to ensure that a glut of "real" work doesn't starve the consumer.

If you're going with a distributed cache, then don't worry about this, because you'll handle it with queries to the truth store when you have cache misses (at the cost of slower performance).

Tim
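P.S. To make the "query the truth store on a cache miss" pattern concrete, here is a minimal read-through sketch. The table and column names are made up for illustration, and the DataSource is assumed to point at your RDBMS:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import javax.sql.DataSource;

    public class ReadThroughCache {
        private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        private final DataSource truthStore;

        public ReadThroughCache(DataSource truthStore) {
            this.truthStore = truthStore;
        }

        // Serve from the local copy; on a miss (including right after the
        // cache was cleared because the broker went away), fall through to
        // the RDBMS.
        public String get(String key) throws SQLException {
            String cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
            String value = loadFromTruthStore(key);
            if (value != null) {
                cache.put(key, value);
            }
            return value;
        }

        // Called by the ActiveMQ subscriber when an update message arrives.
        public void applyUpdate(String key, String value) {
            cache.put(key, value);
        }

        // Hypothetical schema; substitute your own table and columns.
        private String loadFromTruthStore(String key) throws SQLException {
            try (Connection c = truthStore.getConnection();
                 PreparedStatement ps = c.prepareStatement(
                         "SELECT meta_value FROM metadata WHERE meta_key = ?")) {
                ps.setString(1, key);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }

Both the subscriber path and the miss path converge on what's in the RDBMS, which is why clearing the cache on a broker failure is safe in this design.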