I don't know how to determine the NFS version, but we are running on RHEL 5.5. I have not checked the syslog yet. Thanks for the tip; I will do that after our morning Operations.

We are also very inclined to believe this is an NFS issue, based on network-wide behavior that has nothing to do with ActiveMQ, e.g., listing just 5 files in an NFS-mounted directory often takes 10 seconds. So we are creating an action plan this weekend to eliminate as many NFS mount points as possible and see how that helps the situation. The plan needs approval/buy-in from key people, so it may be a couple of weeks before it is implemented.

In the meantime, ActiveMQ either shuts itself down or behaves in rather despondent ways, so we find we are having to restart ActiveMQ every 3 or 4 hours (and this frequency is slowly increasing). Once ActiveMQ is restarted, our producers and consumers have to be shut down and relaunched in order to reestablish the connection with ActiveMQ. This is a royal pain!

However, a producer throws an exception whenever it tries to send a message through a lost connection, so I catch that exception, close the connection, and reopen it. Thus, my producers are able to reconnect automatically when ActiveMQ is restarted. With the consumers, though, no exception is thrown while they wait for notifications. A consumer simply waits for a notification that never arrives once the connection with ActiveMQ is lost.

So what is your recommended method for a consumer to check for a disconnection? (Maybe I should post this question as a separate thread...)

Mark

On 5/29/2013 3:21 AM, rajdavies [via ActiveMQ] wrote:

Ultimately I'm pretty confident this problem is an NFS problem - and as Johan has already let the cat out of the bag ;) - let me ask the following:

Which version of NFS 4 are you using and which environment?
Have you checked the system logs for NFS errors on all the machines running ActiveMQ brokers?

thanks,

Rob

On 29 May 2013, at 00:46, Christian Posta <[hidden email]> wrote:

> I can make two recommendations.
>
> #1, being the preferred, create a test case that shows this... that will
> give us the best chance of finding out what's going on... take a look at
> the following test cases in the activemq source code to give you an idea
> about how to go about doing it...
>
> http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/usecases/
> http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/bugs/
> http://svn.apache.org/viewvc/activemq/trunk/activemq-unit-tests/src/test/java/org/apache/activemq/test/JmsTopicSendReceiveTest.java?view=markup
>
> #2, if creating a test case doesn't sound like something you want to get
> into.. i guess, give us the exact configs of broker, clients, number of
> consumers, number of topics, message sizes, etc, etc all details and if one
> of us gets the urge we can try it out on our boxes. this will not be nearly
> as good as #1, and will provide a higher barrier to entry because we spend
> our spare time doing this and like to spend that time debugging and fixing,
> and not setting up environments and usecases which may not even show a bug
> :)
>
> On Tue, May 28, 2013 at 4:34 PM, fenbers <[hidden email]> wrote:
>
>> I'm getting the Sync exception on both, local and NFS. Originally,
>> I was only using a local disk, but there wasn't much disk space for
>> the ever growing list of 33MB enumerated .log files that weren't
>> cleaned up. So I reconfigured ActiveMQ to put these db files on an
>> NFS mount. But the sync exceptions occurred either way.
>>
>> I've changed *all* my consumers to AUTO_ACKNOWLEDGE, thinking that
>> maybe an ACKNOWLEDGEment leak was causing the undeleted files. That
>> didn't help... The TRACE level logging points to only two of my 5
>> topics that accumulate these undeleted db files. So I've
>> concentrated my scrutiny on consumers of these two topics, but
>> have not found anything out of the ordinary.
>>
>> What is puzzling me still is that the frequency of the log file
>> build-up and the frequency of exceptions continue to increase even
>> though the number of messages sent per day by the producers remains
>> nearly constant...
>>
>> Mark
>>
>> On 5/28/2013 6:06 PM, ceposta [via ActiveMQ] wrote:
>>
>> Sounds like there's multiple issues...
>> Your journal files aren't being cleaned up, AND you're getting the Sync
>> exception?
>> You get the sync exception on local disk mount? Or just NFS?
>>
>> If the journals aren't being cleaned up, are your consumers properly
>> ack'ing messages?
>>
>> On Tue, May 28, 2013 at 2:42 PM, fenbers <[hidden email]> wrote:
>>
>> > I would LOVE to help you help me! But I have no idea how to go
>> > about making a test case. If you could drop some hints in this
>> > regard, I might be able to produce one.
>> >
>> > My ActiveMQ issues seem to be related to network slowness, which we
>> > are diagnosing separately. Or maybe it is the other way around,
>> > where ActiveMQ problems are causing network sluggishness. Either
>> > way, there seems to be a correlation, except that when network
>> > responsiveness improves, ActiveMQ does not.
>> >
>> > The problem I'm having with AMQ is progressive, which is even more
>> > puzzling, because we are not adding to the number of messages that
>> > AMQ has to handle. Today, we were up to 191 undeleted db-NNN.log
>> > files in the database directory before I stopped AMQ and deleted
>> > them. NNN was up to 451, so 260 files had been cleaned up by AMQ's
>> > automatic processes...
>> >
>> > Will log files assist you in helping me? I have TRACE level
>> > messages turned on, so they are quite large.
>> >
>> > Mark
>> >
>> > On 5/28/2013 5:22 PM, rajdavies [via ActiveMQ] wrote:
>> >
>> > Hi Mark,
>> >
>> > could you produce a test case for your problem - it would help us
>> > identify the problem a lot quicker
>> >
>> > thanks,
>> >
>> > Rob
>> >
>> > On 30 Apr 2013, at 16:40, fenbers <[hidden email]> wrote:
>> >
>> > > Zagan wrote
>> > >> Can you please check if your .log files in the /data directory are
>> > >> cleaned up?
>> > >> On basis of the information I suppose this behaviour is due to a
>> > >> misconfiguration of your clients.
>> > >> If this is the case often broken log file cleanup is a symptom.
>> > >
>> > > I get the same error as brought up in this thread (KahaDB failed to store to
>> > > Journal). And yes, I also have a problem with the numbered .log files not
>> > > all getting cleaned up (most files are removed appropriately). I have
>> > > suspected a client configuration problem for a long time, but can't figure
>> > > out what's wrong -- even with TRACE logging turned on. In the meantime, I
>> > > have to cope with ActiveMQ crashing (i.e., shutting itself down) about every
>> > > two days. The logs point to a disk storage problem, but I have plenty of
>> > > space, so that's not the issue!
>> > > I've tried a couple of different Linux
>> > > boxes and both local and NFS mounts, and this issue occurs on both of them.
>> > >
>> > > I'm at a loss!! I'm running 5.8.0...
>> > >
>> > > Mark
>> > >
>> > > --
>> > > View this message in context:
>> > > http://activemq.2283324.n4.nabble.com/ActiveMQ-crashes-frequently-tp4305407p4666469.html
>> > > Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>
>> --
>> *Christian Posta*
>> http://www.christianposta.com/blog
>> twitter: @christianposta
>
> --
> *Christian Posta*
> http://www.christianposta.com/blog
> twitter: @christianposta

mark_fenbers.vcf (360 bytes) <http://activemq.2283324.n4.nabble.com/attachment/4667732/0/mark_fenbers.vcf>

--
View this message in context: http://activemq.2283324.n4.nabble.com/ActiveMQ-crashes-frequently-tp4305407p4667732.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
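
A note on the producer-side recovery described at the top of this message (catch the exception from a failed send, close the connection, reopen it): the pattern could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not Mark's actual code and not an ActiveMQ API. `Link` and `ReconnectingSender` are hypothetical names invented for illustration; `Link` stands in for whatever wraps the JMS connection and producer, and `IOException` stands in for the exception a failed send would raise.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical stand-in for a broker connection plus producer; not an ActiveMQ type.
interface Link {
    void send(String message) throws IOException;
    void close();
}

// Wraps a Link and, when a send fails, closes it, opens a fresh one, and retries.
final class ReconnectingSender {
    private final Supplier<Link> opener; // knows how to open a new link to the broker
    private final int maxAttempts;       // total tries per message, including the first
    private Link link;
    private int reconnects = 0;

    ReconnectingSender(Supplier<Link> opener, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.opener = opener;
        this.maxAttempts = maxAttempts;
        this.link = opener.get();
    }

    void send(String message) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                link.send(message);
                return; // delivered
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    link.close();        // drop the dead connection
                    link = opener.get(); // open a new one before retrying
                    reconnects++;
                }
            }
        }
        throw last; // every attempt failed; the caller decides what to do next
    }

    // Number of times a replacement link was opened after a failed send.
    int reconnectCount() {
        return reconnects;
    }
}
```

The retry cap keeps a producer from looping forever while the broker is still down. The sketch does not cover the consumer case raised above, where there is no send call to fail. JMS does define `Connection.setExceptionListener` for reporting connection errors asynchronously, though whether it would fire promptly in this setup is not something the thread establishes.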