On Mon, Apr 23, 2018 at 11:48 PM, pragmaticjdev <amits...@gmail.com> wrote:

> Highly appreciate the detailed replies and the clarifications on
> distributed cache vs database.
> We are trying to build a distributed cache. I agree with all the inputs
> you shared for such a cache implementation. In summary it would mean:
>         1. The subscriber should clear the cache when it cannot connect
> to the broker.
>         2. The publisher should not roll back the database transaction on
> failures, as step #1 would be sufficient and the cache is loaded as and
> when queried.
>


OK, if you're building a cache rather than a database, what serves as the
record copy of the data, from which the cache can be populated on demand?
Would you just query your OLTP database whenever you have a cache miss?

Also, my earlier responses weren't considering the possibility that you'd
be updating objects within the cache. In that case, I agree that clearing
the cache on disconnection/error is the right way to go. Sorry if that
caused any confusion.



> A couple of follow-up questions:
> 1.
>
> > Typically you would call it in a tight loop, so you're only as stale as
> > the amount of time it takes you to publish the messages received the last
> > time.
>
>
> How can one control the polling time of the consumer? My JMS consumer code
> from our Spring application looks like this:
>
> @Component
> public class Consumer {
>
>     @JmsListener(destination = "java-topic", containerFactory = "topicListenerFactory")
>     public void receiveTopicMessage(@Payload Person person) throws JMSException {
>         // update the local cache entry
>     }
> }
>
> How do I change the above code to call it in a tight loop?



Ah, so you're using message-driven code rather than managing the connection
yourself. In that case you'd want to do the following to handle the error
(by clearing the cache):
https://stackoverflow.com/questions/40654586/spring-jms-set-errorhandler-for-jmslistener-annotated-method
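
Here's a minimal sketch of what that might look like on a
DefaultJmsListenerContainerFactory. The bean name matches the containerFactory
attribute you reference from @JmsListener; PersonCache is just a placeholder
for whatever object actually holds your local cache, and I'm assuming that
clearing the whole cache is the recovery action you want:

import javax.jms.ConnectionFactory;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.annotation.EnableJms;
import org.springframework.jms.config.DefaultJmsListenerContainerFactory;

@Configuration
@EnableJms
public class JmsConfig {

    // "topicListenerFactory" matches the containerFactory attribute of your
    // @JmsListener; PersonCache is a placeholder for your local cache holder.
    @Bean
    public DefaultJmsListenerContainerFactory topicListenerFactory(
            ConnectionFactory connectionFactory, PersonCache cache) {
        DefaultJmsListenerContainerFactory factory =
                new DefaultJmsListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        factory.setPubSubDomain(true); // you're consuming from a topic
        // Clear the local cache whenever the listener container reports an error
        factory.setErrorHandler(t -> cache.clear());
        return factory;
    }
}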

You certainly could switch to explicitly managing the connection (see
http://activemq.apache.org/hello-world.html as an example of what that code
would look like), but that's not necessary if you'd rather use the
message-driven paradigm.



> Also, would that
> mean one or more threads would be constantly busy, leading to constant usage
> of CPU cycles?
>


If you were to switch to accessing the connection directly, you'd typically
include a small Thread.sleep() to prevent spin-waiting. I apologize if the
choice of the words "tight loop" implied spin-waiting; I just meant that
you would keep the sleeps relatively short, not that there wouldn't be any
at all.
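
For what it's worth, a rough sketch of that explicitly-managed loop might look
something like the following. The broker URL is a placeholder, Person is the
class from your listener, and the actual cache update is left as a comment:

import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.ObjectMessage;
import javax.jms.Session;
import javax.jms.Topic;

import org.apache.activemq.ActiveMQConnectionFactory;

public class CacheUpdatingConsumer {

    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory connectionFactory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder URL
        Connection connection = connectionFactory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("java-topic");
        MessageConsumer consumer = session.createConsumer(topic);

        while (true) {
            Message message = consumer.receiveNoWait(); // non-blocking poll
            if (message instanceof ObjectMessage) {
                Person person = (Person) ((ObjectMessage) message).getObject();
                // update the local cache entry for this Person
            } else {
                // Short sleep so an empty topic doesn't turn into a spin-wait;
                // staleness is bounded by roughly this interval.
                Thread.sleep(100);
            }
        }
    }
}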



> 2.
> For my question on the overloaded subscriber, I didn't completely follow
> your suggestion not to worry about this scenario. You mentioned
>
>
> >  If you're going with a distributed cache, then don't worry about
> > this, because you'll handle it with queries to the truth store when you
> > have cache misses (at the cost of slower performance).
>
> Assume there are two app servers with an object loaded in the local cache.
> An update to this object occurs on app server 1, which publishes that
> object on the JMS queue. Here, if app server 2 is overloaded (busy CPU),
> the JMS consumer thread might not get a chance to execute at that instant
> in time. What happens in such cases? Does ActiveMQ retry after some time?



In that scenario, your fate is in the hands of the JRE's thread scheduler.
There's no retrying at the application level; the thread simply sits there
with its execution pointer set to the next operation to be done, but it
might take time (milliseconds, not minutes) until the JRE decides that this
particular thread should be allowed to run its operations.

With that said, if the correct operation of your system depends on the
cache being updated before the subsequent operation is evaluated (i.e.
multi-process synchronization), then an asynchronous cache based on
ActiveMQ is not what you want, and you need to be hitting a
(non-distributed) database such as a SQL-based RDBMS for all of your
operations. Distributed systems have a certain amount of unpredictability
for operations that are run in parallel, so if your use case can't support
this, then you need an external data store such as an RDBMS to enforce
ordering/synchronization. I haven't been able to tell from what you've
written if this is your situation or not, but make sure you're clear on the
tradeoffs of a distributed/parallel architecture like the one you're
proposing, and make sure you can accept those tradeoffs.



> Can the
> number of such retries be configured? It could so happen that app server 2
> remains in an overloaded state for a longer duration (maybe 30 mins).
>


Being "overloaded" doesn't mean your threads won't get to run, unless
you've done something dumb like setting that thread's priority lower than
the other threads in the application. In the default configuration, all
threads will be scheduled more or less evenly, so they'll make progress,
just not as fast as they would if the box was idle. There's nothing to
worry about here, unless you can't accept the inherent unpredictability of
a distributed system (see previous paragraphs).

Tim
