Mo,

Sorry for the long delay in getting back to you; maybe you've already
figured out your problem, but if not hopefully this will help.

My understanding of how RDBMSes use indices is based on Oracle, so take
this with a grain of salt since it might not apply to PostgreSQL.

As I understand it, most RDBMSes will pick only a single index (the one
they think will result in the most efficient query execution) for a simple
SELECT query like yours.  The index is used to identify rows matching as
many of the criteria in the query as possible, and then those rows are
loaded to evaluate any criteria that can't be checked via the index and to
retrieve any values in the SELECT clause that aren't in the index.  Using
an index that contains only some of the columns in your WHERE clause is
better than doing a full-table scan, but it's not great, because you load
lots of rows that you don't need to load.  If all of the columns in your
SELECT and WHERE clauses are in the index, the database can skip the row
retrieval entirely (which is the fastest scenario); if all of the columns
in your WHERE clause are in the index, it will only retrieve the rows that
actually match (which is still pretty fast and should be your goal).
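
I'm not sure how much of that carries over to PostgreSQL, but I believe
its EXPLAIN has an ANALYZE option that shows how many rows actually get
loaded and then thrown away by the criteria the index can't evaluate
(note that ANALYZE really executes the query, and BUFFERS is optional),
something along these lines:

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT ID, PRIORITY FROM ACTIVEMQ_MSGS
   WHERE MSGID_PROD='ID:tomcat10-XXX-41356-1422538681150-1:95156:1:1'
     AND MSGID_SEQ='1' AND CONTAINER='queue://XXX_export';

A large "Rows Removed by Filter" number while the slow-down is happening
would confirm that the index is only narrowing the search down to the
whole queue and the rest is being filtered row by row.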

In your case, you have indices on CONTAINER and on (MSGID_SEQ, MSGID_PROD),
but no index that covers all three together.  I believe that your
performance should get better if you add that additional index (or modify
one of those existing indices to add the missing fields; you'll have to
evaluate which approach is better for your needs).
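
For example (just a sketch; the index name is arbitrary, and you should
double-check the column order against the queries you run most often),
something like:

  CREATE INDEX activemq_msgs_cmidx
      ON activemq_msgs (container, msgid_prod, msgid_seq);

should let the planner resolve all three WHERE conditions from one index.
If you also wanted to skip the row retrieval for ID and PRIORITY
entirely, I believe you could append those two columns to the end of the
index key as well, at the cost of a bigger index.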

Tim

On Mon, Feb 23, 2015 at 7:48 AM, mo <mark.schm...@intratop.de> wrote:

> Hi Tim,
>
> thanks for taking an interest.
>
> This is the table's description:
>
> amq=> \d activemq_msgs
>           Table "public.activemq_msgs"
>     Column   |          Type          | Modifiers
> ------------+------------------------+-----------
>   id         | bigint                 | not null
>   container  | character varying(250) |
>   msgid_prod | character varying(250) |
>   msgid_seq  | bigint                 |
>   expiration | bigint                 |
>   msg        | bytea                  |
>   priority   | bigint                 |
>   xid        | character varying(250) |
> Indexes:
>      "activemq_msgs_pkey" PRIMARY KEY, btree (id)
>      "activemq_msgs_cidx" btree (container)
>      "activemq_msgs_eidx" btree (expiration)
>      "activemq_msgs_idx" btree (msgid_prod)
>      "activemq_msgs_midx" btree (msgid_prod, msgid_seq)
>      "activemq_msgs_pidx" btree (priority)
>      "activemq_msgs_xidx" btree (xid)
>
> Running an explain I get...
>
> amq=> explain SELECT ID, PRIORITY FROM ACTIVEMQ_MSGS WHERE
> MSGID_PROD='ID:tomcat10-XXX-41356-1422538681150-1:95156:1:1' AND
> MSGID_SEQ='1' AND  CONTAINER='queue://XXX_export';
>                                                   QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Index Scan using activemq_msgs_cidx on activemq_msgs  (cost=0.42..8.45 rows=1 width=16)
>    Index Cond: ((container)::text = 'queue://XXX_export'::text)
>    Filter: (((msgid_prod)::text = 'ID:tomcat10-XXX-41356-1422538681150-1:95156:1:1'::text) AND (msgid_seq = 1::bigint))
> (3 rows)
>
> I think the Filter here could be problematic, though I'm not sure why it
> is not using activemq_msgs_idx or activemq_msgs_midx.
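>
> (A check I could still do on a test copy of the database, I think, is to
> see whether the planner would pick activemq_msgs_midx at all without the
> container index, roughly:
>
>     BEGIN;
>     DROP INDEX activemq_msgs_cidx;
>     EXPLAIN SELECT ID, PRIORITY FROM ACTIVEMQ_MSGS
>      WHERE MSGID_PROD='ID:tomcat10-XXX-41356-1422538681150-1:95156:1:1'
>        AND MSGID_SEQ='1' AND CONTAINER='queue://XXX_export';
>     ROLLBACK;
>
> Since DDL is transactional in PostgreSQL, the ROLLBACK should put the
> index back, but the DROP takes an exclusive lock on the table, so I
> wouldn't run this against production.)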
>
> When I issue the same type of query against the database during a
> slow-down, I get similarly slow results as the ActiveMQ process does.
> However, after restarting ActiveMQ and then issuing the same type of
> query (of course changing some parameters so no caching occurs), we see
> very fast responses.
>
> On the database we always see 100% CPU usage on one core, by one
> process. There's no I/O issue as far as I can tell.
>
> One more hint: We have two queues that usually get very big during these
> slow-downs, and the response times of the above statements scale roughly
> linearly with their size. Just to give you an idea: queue "R" might have
> 3000 messages and take 3 seconds per statement, while queue "B" might
> have 2000 messages and take about 2 seconds per statement. So it does
> look very much like the filter is the issue, but the thing still
> throwing me off is simply that an ActiveMQ restart fixes the issue.
> After that, the very same statements run fast.
>
> best regards,
> Mark
>
>
>
> On 02/23/2015 02:49 PM, Tim Bain [via ActiveMQ] wrote:
> > Mark,
> >
> > You say the indices are OK; can you describe them for us, and can you
> find
> > out the execution plan for the query?  Also, if you issue the same query
> > directly against the database when this is happening, is that also slow?
> > I'm looking for whether the query itself is slow or the query is fast but
> > the surrounding ActiveMQ code is slow.
> >
> > Also, have you looked to see if any computing resources (CPU, disk I/O,
> > network I/O, etc.) are heavily taxed on any of the machines involved (the
> > broker and the database server; any others?)?  Getting an idea of the
> > limiting resource might help figure out the problem.
> >
> > Tim
> > On Feb 17, 2015 6:08 AM, "Mark Schmitt | Intratop" <[hidden email]>
> > wrote:
> >
> >  > Hi,
> >  >
> >  > I work with Piotr on this issue. Let me try to provide some additional
> >  > information on our slow-down issue:
> >  >
> >  > Storage is a PostgreSQL server 9.3.2 on a Debian Wheezy / kernel
> >  > 3.2.51-1 system.
> >  >
> >  > We use JDBC and the PGPoolingDataSource
> >  > (org.postgresql.ds.PGPoolingDataSource).
> >  >
> >  > This is the persistenceAdapter configuration:
> >  >         <persistenceAdapter>
> >  >             <jdbcPersistenceAdapter dataDirectory="activemq-data"
> >  >                                     dataSource="#postgres-ds"
> >  >                                     lockKeepAlivePeriod="0"
> >  >                                     createTablesOnStartup="false" />
> >  >         </persistenceAdapter>
> >  >
> >  > We have 2 destination interceptors set up. And we run the demo code
> >  > (jetty-demo) because we have some applications using the http/rest
> >  > interface it provides. We don't run Camel.
> >  >
> >  > Other than that it's a pretty mundane setup. We also run two
> >  > instances at the same time as a sort of fail-over. Because of the
> >  > JDBC backend, only one of them is active, and we use the failover
> >  > protocol on the client side to use the active one. We use haproxy to
> >  > serve the web interface from the active instance. Both ActiveMQ
> >  > instances run on the same Linux box, with different service IP
> >  > addresses (they use the same binaries; only configuration and data
> >  > directory are separated). The reason we run two instances is that we
> >  > had big stability issues before, with the ActiveMQ process sort of
> >  > hanging itself up. We could move away from that setup, because with
> >  > 5.10 this hasn't happened.
> >  >
> >  > Like the database server, the Linux box that runs the ActiveMQ
> >  > instance is a Debian Wheezy Linux, but with kernel 3.2.60-1+deb7u1.
> >  >
> >  > Problem description: Once in a while we see 100% CPU load on the
> >  > database. We can isolate that to SQL statements of the style:
> >  >
> >  > SELECT ID, PRIORITY FROM ACTIVEMQ_MSGS WHERE
> >  > MSGID_PROD='ID:tomcat10-XXX-41356-1422538681150-1:95156:1:1' AND
> >  > MSGID_SEQ='1' AND CONTAINER='queue://XXX_export'
> >  >
> >  > These SQL statements take more than 500ms. We've had scenarios where
> >  > they took more than 3 seconds to complete. Queue size for 500ms was
> >  > ~1200 messages for all queues (concentrated in one queue), with a
> >  > production rate of about 2-3 messages per second and a consumption
> >  > rate of about 2 messages per second. IMHO the queue size and the
> >  > query time scale linearly.
> >  >
> >  > We were able to "resolve" the issue by restarting both ActiveMQ
> >  > instances. After that, the load on the database drops dramatically:
> >  > instead of 100% CPU usage we see less than 10% on the database and a
> >  > very fast recovery. The ActiveMQ processes look fine too.
> >  >
> >  > My first guess was a missing database index, but they look fine.
> >  > Besides, restarting the ActiveMQ instances resolves the issue, which
> >  > is very, very weird to me. I don't think it's a database lock
> >  > either, because we couldn't see any, and additionally we see 100%
> >  > CPU usage for the process executing the statement (postgres spawns a
> >  > process per statement). IMHO (but I'm no database expert) that
> >  > shouldn't happen in a lock situation either...
> >  >
> >  > We're at a loss. Do you guys have an idea?
> >  >
> >  > And one more thing: once every two or three hours, a lot of (several
> >  > thousand) messages are created. But the problem described above
> >  > happens irregularly, every one or two weeks or so.
> >  >
> >  > Best regards,
> >  > Mark
> >  >
> >
>
> --
> Best regards,
>
> Mark Schmitt
