Re: Failover very slow with kahadb while restart of master is fast

Tim Bain Mon, 15 Feb 2016 06:15:36 -0800

You could use a profiler such as JVisualVM to tell you what work is being
done.  You could get some of that information from thread dumps via jstack,
but I think a profiler would give much more useful and detailed information
for less effort.


Tim
On Feb 15, 2016 5:37 AM, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:

> Hi Christian,
>
> what's the option on the NFS ?
>
> Most of the time, I use I use sync and I increase the rsize and wsize.
>
> Anyway, the difference is huge. The problem could be related to the
> journal checksum check. Can you try checkForCorruptJournalFiles="false" on
> the kahadb configuration ?
>
> Maybe increasing the kahadb jobs could help too:
>
>             <kahaDB directory="${karaf.data}/activemq/activemq/kahadb"
>                 indexWriteBatchSize="1000"
>                 indexCacheSize="2000"
>                 journalMaxFileLength="32mb"
>                 checkForCorruptJournalFiles="false"
>                 maxAsyncJobs="5000"
>                 concurrentStoreAndDispatchQueues="true"
>                 concurrentStoreAndDispatchTopics="true"
>                 enableJournalDiskSyncs="true"
>                 enableIndexWriteAsync="true"/>
>
>
> Regards
> JB
>
> On 02/15/2016 01:32 PM, Christian Schneider wrote:
>
>> At a customer we are experiencing a very strange behaviour of activemq.
>>
>> We have two brokers on separate machines A and B. Storage is on a third
>> machine and imported using nfsv4.
>> Kahadb contains about 18 GB of journal files.
>>
>> Scenario 1:
>> - Stop B
>> - Stop A (master now)
>> - Start A
>>
>> The start of A takes about 50 seconds.
>>
>> Scnario 2:
>> - Stop A
>> - Stop B (master now)
>> - Start A
>>
>> The start of A takes about 5 minutes.
>>
>> In the slow case the log at debug level shows a big gap between adding
>> the last queue (12:04) and running the ActiveMQ Journal Checkpoint
>> Worker again after the queue creation (12:09).
>> I have no idea what activemq is doing in the mean time but it seems to
>> take several minutes.
>>
>> Can anyone explain this behaviour or explain what activemq does in the
>> mean time? I would also be happy about pointer to the code.
>> What I do not understand is why it makes a difference if the master is
>> started again or if the slave is becoming master. As they both share the
>> same kahadb over nfs I would expect the same startup
>> behaviour.
>>
>> Christian
>>
>> -----
>>
>> Slow start
>> 2016-02-15 12:03:01,943 | INFO  | Refreshing
>> org.apache.activemq.xbean.XBeanBrokerFactory$1@17b14695: startup date
>> [Mon Feb 15 12:03:01 CET 2016]; root of context hierarchy |
>> org.apache.activemq.xbean.XBeanBrokerFactory$1 | main
>> ...
>> 2016-02-15 12:03:03,018 | INFO  |
>> PListStore:[/path/activemq/data/A/tmp_storage] started |
>> org.apache.activemq.store.kahadb.plist.PListStoreImpl | main
>> 2016-02-15 12:03:03,033 | INFO  | ignoring zero length, partially
>> initialised journal data file: db-1.log number = 1 , length = 0 |
>> org.apache.activemq.store.kahadb.disk.journal.Journal | main
>> 2016-02-15 12:03:03,177 | INFO  | JobSchedulerStore:path/scheduler
>> started |
>> org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl | main
>> 2016-02-15 12:03:03,180 | INFO  | JobScheduler using directory:
>> path/scheduler | org.apache.activemq.broker.BrokerService | main
>> 2016-02-15 12:03:03,335 | INFO  | Using Persistence Adapter:
>> KahaDBPersistenceAdapter[/path] |
>> org.apache.activemq.broker.BrokerService | main
>> 2016-02-15 12:03:03,350 | INFO  | JMX consoles can connect to
>> service:jmx:rmi:///jndi/rmi://localhost:1617/jmxrmi |
>> org.apache.activemq.broker.jmx.ManagementContext | JMX connector
>> 2016-02-15 12:03:05,198 | INFO  | Corrupt journal records found in
>> 'path/db-1122.log' between offsets: 17893102..17893732 |
>> org.apache.activemq.store.kahadb.disk.journal.Journal | main
>> 2016-02-15 12:03:47,925 | INFO  | KahaDB is version 5 |
>> org.apache.activemq.store.kahadb.MessageDatabase | main
>> ....
>> 2016-02-15 12:03:49,233 | INFO  | Recovering from the journal ... |
>> org.apache.activemq.store.kahadb.MessageDatabase | main
>> 2016-02-15 12:03:49,237 | INFO  | Recovery replayed 20 operations from
>> the journal in 0.094 seconds. |
>> org.apache.activemq.store.kahadb.MessageDatabase | main
>> 2016-02-15 12:04:26,863 | DEBUG | Checkpoint started. |
>> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
>> Checkpoint Worker
>> 2016-02-15 12:04:26,932 | DEBUG | Checkpoint done. |
>> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
>> Checkpoint Worker
>> 2016-02-15 12:04:26,940 | INFO  | Apache ActiveMQ 5.10.2 (A,
>> ID:A-59167-1455534183070-1:1) is starting |
>> org.apache.activemq.broker.BrokerService | main
>> ...
>> 2016-02-15 12:04:30,289 | DEBUG | A adding destination: queue://q1.DLQ |
>> org.apache.activemq.broker.region.AbstractRegion | main
>> 2016-02-15 12:04:57,026 | DEBUG | queue://q1.DLQ expiring messages .. |
>> org.apache.activemq.broker.region.Queue | ActiveMQ Broker[A] Scheduler
>> 2016-02-15 12:04:57,029 | DEBUG | q1.DLQ toPageIn: 5, Inflight: 0,
>> pagedInMessages.size 0, pagedInPendingDispatch.size 0, enqueueCount: 0,
>> dequeueCount: 0, memUsage:0 | org.apache.activemq.broker.region.Queue |
>> ActiveMQ Broker[A] Scheduler
>> 2016-02-15 12:09:20,440 | DEBUG | Checkpoint started. |
>> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
>> Checkpoint Worker
>> 2016-02-15 12:09:20,476 | DEBUG | Checkpoint done. |
>> org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal
>> Checkpoint Worker
>> 2016-02-15 12:09:20,488 | DEBUG | queue://q1.DLQ expiring messages done.
>> | org.apache.activemq.broker.region.Queue | ActiveMQ Broker[A] Scheduler
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Failover very slow with kahadb while restart of master is fast

Reply via email to