It sounds like your switch fabric might be the issue?

Those types of hangs should show pretty frequent kernel alarms.

On Jun 2, 2013, at 21:10, Christian Posta <christian.po...@gmail.com> wrote:

> You should check out the failover transport to handle reconnecting.
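
For reference, the failover transport is selected through the broker URL on the client side. Below is a minimal sketch that only assembles such a URL. The `FailoverUrl` class and the host names are made up for illustration. `initialReconnectDelay` and `maxReconnectAttempts` are failover transport options, but their defaults and exact meaning differ between ActiveMQ versions (as far as I know, -1 means "retry forever" from 5.6 on), so please check the reference page for the version you run.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper (not part of ActiveMQ): builds a failover transport URL. */
public class FailoverUrl {

    /**
     * @param brokers                 broker URIs, e.g. "tcp://broker1:61616" (made-up hosts)
     * @param initialReconnectDelayMs milliseconds to wait before the first reconnect attempt
     * @param maxReconnectAttempts    assumption: -1 means "retry forever" on ActiveMQ 5.6+
     */
    public static String build(List<String> brokers,
                               long initialReconnectDelayMs,
                               int maxReconnectAttempts) {
        if (brokers.isEmpty()) {
            throw new IllegalArgumentException("at least one broker URI is required");
        }
        // Comma-separated broker list wrapped in failover:( ... )
        StringJoiner uris = new StringJoiner(",", "failover:(", ")");
        for (String broker : brokers) {
            uris.add(broker);
        }
        // Transport options are appended as a query string
        return uris.toString()
                + "?initialReconnectDelay=" + initialReconnectDelayMs
                + "&maxReconnectAttempts=" + maxReconnectAttempts;
    }

    public static void main(String[] args) {
        System.out.println(build(
                Arrays.asList("tcp://broker1:61616", "tcp://broker2:61616"), 100, -1));
    }
}
```

The resulting string would be handed to `new ActiveMQConnectionFactory(url)` in producers and consumers alike. With failover in the URL, the client library should take care of reconnecting after a broker restart, so the application code does not have to.
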
> 
> On Sunday, June 2, 2013, fenbers wrote:
> 
>> 
>>    I don't know how to determine the NFS version but we are running on
>>    RHEL 5.5.
>> 
>>    I have not checked the syslog. Thanks for the tip. I will do that
>>    after our morning Operations.
>> 
>>    We are also very inclined to believe this is an NFS issue, based on
>>    behaviors network-wide which have nothing to do with ActiveMQ, e.g.,
>>    often taking 10 seconds to list just 5 files in an NFS-mounted
>>    directory.
>> 
>>    So, we are creating an action plan this weekend to eliminate as many
>>    NFS mount points as possible, and seeing how that helps the
>>    situation. The plan needs approval/buy-in from key people, so it may
>>    be a couple of weeks before it is implemented.
>>    In the meantime, ActiveMQ either shuts itself down or behaves in
>>    rather despondent ways, so we find we are having to restart ActiveMQ
>>    every 3 or 4 hours (and this frequency is slowly increasing).
>> 
>>    Once ActiveMQ is restarted, we find that our producers and our
>>    consumers have to be shut down and relaunched in order to
>>    reestablish the connection with ActiveMQ. This is a royal pain!
>>    However, a producer will throw an exception whenever it tries to
>>    send a message through a lost connection, so I catch the
>>    exception, close the connection, and reopen it. Thus, my
>>    producers are able to reconnect automatically in the event ActiveMQ
>>    is restarted.
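
The catch, close, reopen pattern described here can be sketched without any JMS types. Everything in this block is hypothetical: `Link` stands in for whatever bundle of connection, session and producer the application uses, and `ReconnectingSender` and its retry limit are invented for illustration. None of it is ActiveMQ API.

```java
import java.util.function.Supplier;

/** Sketch of "catch the exception, close the connection, reopen it, resend". */
public class ReconnectingSender {

    /** Hypothetical stand-in for a JMS connection + session + producer. */
    public interface Link {
        void send(String message) throws Exception;
        void close();
    }

    private final Supplier<Link> opener; // knows how to open a fresh link
    private final int maxAttempts;       // total tries per message
    private Link link;                   // null until first use or after a failure

    public ReconnectingSender(Supplier<Link> opener, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.opener = opener;
        this.maxAttempts = maxAttempts;
    }

    /** Sends the message, reopening the link after each failure. */
    public void send(String message) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (link == null) {
                    link = opener.get();
                }
                link.send(message);
                return;
            } catch (Exception e) {
                last = e;
                // Drop the broken link so the next attempt opens a new one
                if (link != null) {
                    link.close();
                    link = null;
                }
            }
        }
        throw new IllegalStateException(
                "giving up after " + maxAttempts + " attempts", last);
    }
}
```

A real version would probably pause between attempts so that a broker which is still starting up is not hammered with reconnects.
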
>> 
>>    But with the consumers, no exception is thrown while they wait for
>>    notifications. A consumer simply waits for a notification that never
>>    arrives after the connection with ActiveMQ is lost. So what is your
>>    recommended method for a consumer to check for a disconnection?
>>    (Maybe I should post this question as a separate thread...)
>> 
>>    Mark
>> 
>> 
>>    On 5/29/2013 3:21 AM, rajdavies [via ActiveMQ] wrote:
>>
>>     Ultimately I'm pretty confident this problem is an NFS problem - and
>>     as Johan has already let the cat out of the bag ;) - let me ask the
>>     following:
>>
>>     Which version of NFS 4 are you using and which environment?
>>
>>     Have you checked the system logs for NFS errors on all the
>>     machines running ActiveMQ brokers?
>>
>>     thanks,
>>
>>     Rob
>>
>>     On 29 May 2013, at 00:46, Christian Posta <[hidden email]> wrote:
>> 
>> 
>>        > I can make two recommendations.
>>        >
>>        > #1, being the preferred, create a test case that shows this... that will
>>        > give us the best chance of finding out what's going on... take a look at
>>        > the following test cases in the activemq source code to give you an idea
>>        > about how to go about doing it...
>>        >
>>        > http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/usecases/
>>        >
>>        > http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/bugs/
>>        >
>>        > http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/test/JmsTopicSendReceiveTest.java?view=markup
>>        >
>>        > #2, if creating a test case doesn't sound like something you want to get
>>        > into.. i guess, give us the exact configs of broker, clients, number of
>>        > consumers, number of topics, message sizes, etc, etc all details and if one
>>        > of us gets the urge we can try it out on our boxes. this will not be nearly
>>        > as good as #1, and will provide a higher barrier to entry because we spend
>>        > our spare time doing this and like to spend that time debugging and fixing,
>>        > and not setting up environments and usecases which may not even show a bug
>>        > :)
>>        >
>>        > On Tue, May 28, 2013 at 4:34 PM, fenbers <[hidden email]> wrote:
>>        >
>>        >>    I'm getting the Sync exception on both local and NFS. Originally,
>>        >>    I was only using a local disk, but there wasn't much disk space for
>>        >>    the ever growing list of 33MB enumerated .log files that weren't
>>        >>    cleaned up. So I reconfigured ActiveMQ to put these db files on an
>>        >>    NFS mount. But the sync exceptions occurred either way.
>>        >>
>>        >>    I've changed *all* my consumers to AUTO_ACKNOWLEDGE, thinking that
>>        >>    maybe an ACKNOWLEDGEment leak was causing the undeleted files. That
>>        >>    didn't help... The TRACE level logging points to only two of my 5
>>        >>    topics that accumulate these undeleted db files. So I've
>>        >>    concentrated my scrutiny on the consumers of these two topics, but
>>        >>    have not found anything out of the ordinary.
>>        >>
>>        >>    What is puzzling me still is that the frequency of the log file
>>        >>    build-up and the frequency of exceptions continue to increase even
>>        >>    though the number of messages sent per day by the producers remains
>>        >>    nearly constant...
>>        >>
>>        >>    Mark
>>        >>
>>        >>    On 5/28/2013 6:06 PM, ceposta [via ActiveMQ] wrote:
>>        >>
>>        >>     Sounds like there's multiple issues...
>>        >>
>>        >>     Your journal files aren't being cleaned up, AND you're getting
>>        >>     the Sync exception?
>>        >>
>>        >>     You get the sync exception on local disk mount? Or just NFS?
>>        >>
>>        >>     If the journals aren't being cleaned up, are your consumers
>>        >>     properly ack'ing messages?
>>        >>
>>        >>     On Tue, May 28, 2013 at 2:42 PM, fenbers <[hidden email]> wrote:
>>        >>
>>        >> > I would LOVE to help you help me! But I have no idea how to go
>>        >> > about making a test case. If you could drop some hints in this
>>        >> > regard, I might be able to produce one.
>>        >> >
>>        >> > My ActiveMQ issues seem to be related to network slowness, which we
>>        >> > are diagnosing separately. Or maybe it is the other way around,
>>        >> > where ActiveMQ problems are causing network sluggishness. Either
>>        >> > way, there seems to be a correlation, except that when network
>>        >> > responsiveness improves, ActiveMQ does not.
>>        >> >
>>        >> > The problem I'm having with AMQ is progressive, which is even more
>>        >> > puzzling, because we are not adding to the number of messages that
>>        >> > AMQ has to handle. Today, we were up to 191 undeleted db-NNN.log
>>        >> > files in the database directory before I stopped AMQ and deleted
>>        >> > them. NNN was up to 451, so 260 files had been cleaned up by AMQ's
>>        >> > automatic processes...
>>        >> >
>>        >> > Will log files assist you in helping me? I have TRACE level
>>        >> > messages turned on, so they are quite large.
>>        >> >
> 
> 
> 
> -- 
> *Christian Posta*
> http://www.christianposta.com/blog
> twitter: @christianposta
