Hi Tim, Jean-Baptiste,

I've worked out what the issue is:

I have a Camel route which splits groups of 32 elements in a JSON array into 32 individual messages. These are small and persistent, so they are written to disk when there's a durable subscription in place, causing the high I/O load. Even with a consumer connected, the messages still appear to be persisted to disk and then dequeued as they're read - and all of them are under 64 bytes. The same happens if I use a queue rather than a topic with a durable subscription. If I consume the messages straight off the topic without a durable subscription, there's no problem.
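Stripped down, the splitting part of the route is roughly equivalent to this sketch (XML DSL, Jackson for the JSON handling; the endpoint names are placeholders, not the real ones):

  <route>
    <from uri="activemq:queue:incoming.batches"/>
    <!-- JSON array -> java.util.List -->
    <unmarshal>
      <json library="Jackson" useList="true"/>
    </unmarshal>
    <!-- one new exchange per element, so ~32 small messages per batch -->
    <split>
      <simple>${body}</simple>
      <!-- each element back to JSON -->
      <marshal>
        <json library="Jackson"/>
      </marshal>
      <to uri="activemq:topic:outgoing.elements"/>
    </split>
  </route>

One of the strategies I want to benchmark is adding ?deliveryPersistent=false to the outbound endpoint so the split messages skip the store entirely - assuming it's acceptable for them not to survive a broker restart.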
I'm working toward building a test case for this so I can benchmark strategies for dealing with it - I'll make this available when I've finished it.

Peter

On Wed, 23 Oct 2019 at 05:51, Tim Bain <tb...@alumni.duke.edu> wrote:

> Peter,
>
> Three quick points to consider:
>
> 1. It's possible to provision additional IOPS on your EBS volumes to get more write throughput, which might be cheaper than EFS. You'd have to run the numbers for your particular use case, but it's worth evaluating.
>
> 2. There are several classes of EBS volume, some based on SSD and some based on HDD. Be sure you're using the SSD ones: gp2, or maybe io1 if you need the extra performance and the cost numbers work out.
>
> 3. EBS isn't local. Certain EC2 instance families provide physically-attached SSDs, which should be the fastest option available, so consider that if you can't get the performance you need out of EBS. But you're very vulnerable to data loss in hardware-failure situations (or even just scaling the host up to a larger instance type), so to go this route you need a rock-solid plan for how to manage the data, especially since 5.x doesn't provide any means of replicating data between brokers with local storage.
>
> Overall, I'm surprised that the performance is as bad as you're describing, which sounds like a bug. Would you please file a bug in JIRA for your findings, even if you eventually work around the problems?
>
> Thanks,
> Tim
>
> On Tue, Oct 22, 2019, 10:14 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> > Hi,
> >
> > EFS works fine and it's really convenient if you want a master/slave setup. The only drawback is that it can cost a lot ;) (I don't know if you have evaluated the price of using EFS). If you don't need master/slave, EFS is not required (the local filesystem performance can be good enough, depending on the kind of EC2 instance you are using).
> >
> > About KahaDB, with the use case you described, maybe you can try the following configuration:
> >
> > <persistenceAdapter>
> >   <kahaDB directory="/path/to/efs"
> >           indexWriteBatchSize="1000"
> >           indexCacheSize="2000"
> >           journalMaxFileLength="32mb"
> >           checkForCorruptJournalFiles="false"
> >           maxAsyncJobs="5000"
> >           concurrentStoreAndDispatchQueues="true"
> >           concurrentStoreAndDispatchTopics="true"
> >           enableJournalDiskSyncs="true"
> >           enableIndexWriteAsync="true"/>
> > </persistenceAdapter>
> >
> > I'm using PostgreSQL with brokers on EC2 today (in production), but even after adding some indexes (you can take a look at https://issues.apache.org/jira/browse/AMQ-7008), it's a significant bottleneck. I'm now evaluating different alternatives (what I have in mind is to create a network of brokers using dynamicOnly network connectors).
> >
> > However, it depends a lot on your use case (persistent messages or not, message ordering, etc.).
> >
> > Don't hesitate to ping me if you want to discuss that.
> >
> > Regards
> > JB
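(A quick aside on the JDBC store, since I ask about it further down: as I understand it, a PostgreSQL-backed store is just a jdbcPersistenceAdapter in activemq.xml pointing at a DataSource bean - roughly like the untested sketch below, where the bean name and connection details are placeholders.)

  <persistenceAdapter>
    <jdbcPersistenceAdapter dataSource="#postgres-ds"/>
  </persistenceAdapter>

  <!-- hypothetical PostgreSQL DataSource referenced above -->
  <bean id="postgres-ds" class="org.postgresql.ds.PGSimpleDataSource">
    <property name="serverName" value="pg.example.internal"/>
    <property name="databaseName" value="activemq"/>
    <property name="user" value="activemq"/>
    <property name="password" value="changeme"/>
  </bean>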
> >
> > On 22/10/2019 22:25, Peter Hicks wrote:
> > > Hi JB
> > >
> > > I was using an EBS (local) partition when I saw the horrendous I/O throughput. During testing over the last hour or so, I've mounted an EFS (for those who don't grok Amazon AWS, Elastic File System: basically an NFS mount) target and symlinked the kahadb directory over to it.
> > >
> > > My current KahaDB configuration is really nothing special:
> > >
> > > <persistenceAdapter>
> > >   <kahaDB directory="${activemq.data}/kahadb"/>
> > > </persistenceAdapter>
> > >
> > > However, the good news is that initial testing using EFS has reduced the load on the server substantially, and the other tasks - in particular, a Camel route which takes a JSON list, converts it to individual messages and converts each back to JSON - now run within 5ms, whereas before they were taking upwards of 1200ms per message.
> > >
> > > I am building a cluster in development, and I'll look at upgrading to 5.15.9 or .10. Using a JDBC store might be a better fit for me, as I have plenty of spare capacity on a PostgreSQL server. Is this likely to scale up to a few hundred messages a second, or is KahaDB a better way to go?
> > >
> > > Peter
> > >
> > > On Tue, 22 Oct 2019 at 20:20, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > >
> > >> Hi Peter,
> > >>
> > >> The most important thing is the I/O rate/throughput. I'm also using some brokers on EC2 (some using the JDBC store, some using the KahaDB store).
> > >>
> > >> What's the filesystem? EFS or "local" EC2?
> > >>
> > >> What's your current KahaDB configuration in activemq.xml?
> > >>
> > >> Just a note: 5.15.9 got major improvements to KahaDB that could help.
> > >>
> > >> Regards
> > >> JB
> > >>
> > >> On 22/10/2019 19:54, Peter Hicks wrote:
> > >>> All,
> > >>>
> > >>> I have a feed of 110 messages/second of about 150 bytes each, which I'm routing through a default-settings ActiveMQ 5.15.8 server and sending on to a topic. Everything works fine until I set up a durable subscription, at which point iostat (Ubuntu 18.04 LTS) reports about 300 tps and about 2-3 megabytes a second of disk writes, which seems like an awful lot for the message rate and size, and it's slowing down other processing on the server. Is this normal and expected?
> > >>>
> > >>> The server is within Amazon EC2 and I can easily add an additional disk for the KahaDB directory, but can anyone point me at a resource that will help me reduce the I/O requirements of persisting all these messages to disk? I am open to any suggestions.
> > >>>
> > >>> Peter
> > >>>
> > >> --
> > >> Jean-Baptiste Onofré
> > >> jbono...@apache.org
> > >> http://blog.nanthrax.net
> > >> Talend - http://www.talend.com
> > >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>

--
OpenTrainTimes Ltd. registered in England and Wales, company no. 09504022.
Registered office: 13a Davenant Road, Upper Holloway, London N19 3NW