Re: possibly a Clojure question or possibly an AWS question: slow writes to durable-queue

Nathan Fisher Fri, 13 Oct 2017 03:04:03 -0700

It sounds like you have a memory leak. I would look at addressing that
before any performance tricks.
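
One way to test the leak theory, as a minimal sketch: watch how much heap is still in use right after each collection. If that floor keeps rising from sample to sample, something is holding on to references. The namespace and function names below are made up for illustration; the java.lang.management calls are standard JDK.

```clojure
(ns heap-floor
  (:import (java.lang.management ManagementFactory MemoryPoolMXBean
                                 MemoryType MemoryUsage)))

(defn post-gc-heap-bytes
  "Sum of heap bytes still in use after the most recent collection of
  each heap pool. Pools that do not report this value are skipped."
  []
  (->> (ManagementFactory/getMemoryPoolMXBeans)
       (filter (fn [^MemoryPoolMXBean p] (= MemoryType/HEAP (.getType p))))
       (keep (fn [^MemoryPoolMXBean p] (.getCollectionUsage p)))
       (map (fn [^MemoryUsage u] (.getUsed u)))
       (reduce + 0)))

(defn floor-rising?
  "True when every sample is strictly larger than the one before it."
  [samples]
  (and (> (count samples) 1)
       (apply < samples)))
```

Logging (post-gc-heap-bytes) next to the existing 30-second stats and passing the last few values to floor-rising? would give a rough signal. It does not say what is leaking, only whether memory survives collections.
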
On Fri, 13 Oct 2017 at 05:35, <lawrence.krub...@gmail.com> wrote:


> Following Daniel Compton's suggestion, I turned on logging for GC. I don't
> see it happening more often, but the slowdown does seem related to the
> moment when the app hits the maximum memory allowed. It had been running
> with 4G, so I increased that to 7G, so it goes longer now before it hits
> 98% memory usage, but it does hit it eventually and then everything slows
> to a crawl. I'm not sure how much memory I would need to avoid using up
> almost all of it. I suppose I'll figure that out via trial and error. Until
> I can figure that out, nearly all other performance tricks seem a bit
> beside the point.
>
>
>
> On Thursday, October 12, 2017 at 9:01:23 PM UTC-4, Nathan Fisher wrote:
>
>> Hi!
>>
>> Can you change one of the variables? Specifically, can you replicate this
>> on your local machine? If it happens locally then I would focus on
>> something in the JVM ecosystem.
>>
>> If you can't replicate it locally then it's possibly AWS-specific. It
>> sounds like you're using a t2.large or m4.xlarge. If it's the former, you
>> may very well be contending for network bandwidth. EC2's host drive (EBS)
>> is a networked drive, so bandwidth is split between your standard network
>> traffic and the drive volume. If that's the issue then you might need to
>> look at provisioned IOPS. A quick(ish) way to test that hypothesis is to
>> provision a host with high networking performance and provisioned IOPS.
>>
>> Cheers,
>> Nathan
>>
>> On Fri, 13 Oct 2017 at 00:05 <lawrence...@gmail.com> wrote:
>>
>>> Daniel Compton, good suggestion. I've increased the memory to see if I
>>> can postpone the GCs, and I'll log that more carefully.
>>>
>>>
>>> On Wednesday, October 11, 2017 at 8:35:44 PM UTC-4, Daniel Compton wrote:
>>>
>>>> Without more information it's hard to tell, but this looks like it
>>>> could be a garbage collection issue. Can you run your test again and add
>>>> some logging/monitoring to show each garbage collection? If my hunch is
>>>> right, you'll see garbage collections getting more and more frequent until
>>>> they take up nearly all the CPU time, preventing much forward progress
>>>> writing to the queue.
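
A minimal sketch of the kind of monitoring described above, assuming it is acceptable to poll from inside the app rather than parse GC logs. The function names are hypothetical; the MXBean calls are standard JDK. Taking a snapshot at each 30-second report and diffing it against the previous one shows how many collections ran and how many milliseconds they took in that window.

```clojure
(ns gc-stats
  (:import (java.lang.management GarbageCollectorMXBean ManagementFactory)))

(defn gc-snapshot
  "Map of collector name -> cumulative {:count n :time-ms t} since JVM start."
  []
  (into {}
        (map (fn [^GarbageCollectorMXBean gc]
               [(.getName gc) {:count   (.getCollectionCount gc)
                               :time-ms (.getCollectionTime gc)}]))
        (ManagementFactory/getGarbageCollectorMXBeans)))

(defn gc-delta
  "Collections and milliseconds spent collecting between two snapshots.
  Collectors missing from `before` are treated as starting at zero."
  [before after]
  (into {}
        (map (fn [[collector stats]]
               [collector (merge-with - stats
                                      (get before collector
                                           {:count 0 :time-ms 0}))]))
        after))
```

If :time-ms in the delta approaches the length of the reporting window, the JVM is spending most of its time collecting, which would match the hunch above.
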
>>>>
>>>> If it's AWS-based throttling, then CloudWatch monitoring
>>>> (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-volume-status.html#using_cloudwatch_ebs)
>>>> might show you some hints. You could also test with an NVMe drive
>>>> attached, just to see if disk bandwidth is the issue.
>>>>
>>>> On Thu, Oct 12, 2017 at 11:58 AM Justin Smith <noise...@gmail.com>
>>>> wrote:
>>>>
>>>>> A small thing here: if memory usage is important, you should be building
>>>>> and running an uberjar instead of using lein on the server (this also has
>>>>> other benefits). If you are doing that, your project.clj jvm-opts are not
>>>>> used; you have to configure your java command line in AWS instead.
>>>>>
>>>>> On Wed, Oct 11, 2017 at 3:52 PM <lawrence...@gmail.com> wrote:
>>>>>
>>>>>> I can't figure out if this is a Clojure question or an AWS question.
>>>>>> And if it is a Clojure question, I can't figure out if it is more of a
>>>>>> general JVM question, or if it is specific to some library such as
>>>>>> durable-queue. I can redirect my question elsewhere, if people think this
>>>>>> is an AWS question.
>>>>>>
>>>>>> In my project.clj, I try to give my app a lot of memory:
>>>>>>
>>>>>>   :jvm-opts ["-Xms7g" "-Xmx7g" "-XX:-UseCompressedOops"])
>>>>>>
>>>>>> And the app starts off pulling data from MySQL and writing it to
>>>>>> Durable-Queue at a rapid rate. (
>>>>>> https://github.com/Factual/durable-queue )
>>>>>>
>>>>>> I have some logging set up to report every 30 seconds.
>>>>>>
>>>>>> :enqueued 370137,
>>>>>>
>>>>>> 30 seconds later:
>>>>>>
>>>>>> :enqueued 608967,
>>>>>>
>>>>>> 30 seconds later:
>>>>>>
>>>>>> :enqueued 828950,
>>>>>>
>>>>>> It's a dramatic slowdown. The app is initially writing to the queue
>>>>>> at faster than 10,000 documents a second, but it slows steadily, and
>>>>>> after 10 minutes it writes less than 1,000 documents per second. Since
>>>>>> I have to write a few million documents, 10,000 a second is the slowest
>>>>>> speed I can live with.
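
For reference, the per-second rate implied by consecutive :enqueued counters can be worked out as below. This is only a sketch; the function name is made up, and the 30-second interval is taken from the logging description above.

```clojure
(ns enqueue-rate)

(defn rates-per-second
  "Average documents per second between each pair of consecutive
  counter readings taken `interval-seconds` apart."
  [counters interval-seconds]
  (mapv (fn [[earlier later]]
          (/ (- later earlier) (double interval-seconds)))
        (partition 2 1 counters)))

;; The three readings quoted above, 30 seconds apart.
(rates-per-second [370137 608967 828950] 30)
```

For those three readings this comes to roughly 7,961 and then 7,333 documents per second, so the window shown is already under the 10,000 a second target.
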
>>>>>>
>>>>>> The queues are in the /tmp folder of an AWS instance that has plenty
>>>>>> of disk space, 4 CPUs, and 16 gigs of RAM.
>>>>>>
>>>>>> Why does the app slow down so much? I had 4 thoughts:
>>>>>>
>>>>>> 1.) the app struggles as it hits a memory limit
>>>>>>
>>>>>> 2.) memory bandwidth is the problem
>>>>>>
>>>>>> 3.) AWS is enforcing some weird IOPS limit
>>>>>>
>>>>>> 4.) durable-queue is misbehaving
>>>>>>
>>>>>> As to possibility #1, I notice the app starts like this:
>>>>>>
>>>>>> Memory in use (percentage/used/max-heap): (\"66%\" \"2373M\"
>>>>>> \"3568M\")
>>>>>>
>>>>>> but 60 seconds later I see:
>>>>>>
>>>>>> Memory in use (percentage/used/max-heap): (\"94%\" \"3613M\"
>>>>>> \"3819M\")
>>>>>>
>>>>>> So I've run out of allowed memory. But why is that? I thought I gave
>>>>>> this app 7 gigs:
>>>>>>
>>>>>>   :jvm-opts ["-Xms7g" "-Xmx7g" "-XX:-UseCompressedOops"])
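
A minimal sketch for checking whether those options actually reached the running JVM. The namespace and function names are hypothetical; the calls are standard JDK. If "-Xmx7g" is missing from the returned arguments, the project.clj settings were not applied to this process, which is what happens when the app is started from an uberjar with a plain java command.

```clojure
(ns jvm-settings
  (:import (java.lang.management ManagementFactory)))

(defn jvm-input-arguments
  "The flags this JVM was actually started with, e.g. [\"-Xmx7g\" ...]."
  []
  (vec (.getInputArguments (ManagementFactory/getRuntimeMXBean))))

(defn max-heap-mb
  "Upper bound the JVM will try to use for the heap, in mebibytes."
  []
  (quot (.maxMemory (Runtime/getRuntime)) (* 1024 1024)))
```

A reported maximum in the 3.5G to 3.8G range on a 16G machine would be consistent with the JVM falling back to its default of about a quarter of physical memory, though that is an inference and worth confirming with the two calls above.
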
>>>>>>
>>>>>> As to possibility #2, I found this old post on the Clojure mailing list:
>>>>>>
>>>>>> Andy Fingerhut wrote, "one thing I've found in the past on a 2-core
>>>>>> machine that was achieving much less than 2x speedup was memory bandwidth
>>>>>> being the limiting factor."
>>>>>>
>>>>>>
>>>>>> https://groups.google.com/forum/#!searchin/clojure/xmx$20xms$20maximum%7Csort:relevance/clojure/48W2eff3caU/HS6u547gtrAJ
>>>>>>
>>>>>> But I am not sure how to test this.
>>>>>>
>>>>>> As to possibility #3, I'm not sure how AWS enforces its IOPS limits.
>>>>>> If people think this is the most likely possibility, then I will repost
>>>>>> my question in an AWS forum.
>>>>>>
>>>>>> As to possibility #4, durable-queue is well-tested and used in a lot
>>>>>> of projects, and Zach Tellman is smart and makes few mistakes, so I'm
>>>>>> doubtful that it is to blame, but I do notice that it starts off with 4
>>>>>> active slabs and then after 120 seconds, it is only using 1 slab. Is that
>>>>>> expected? If people think this is possibly the problem then I'll ask
>>>>>> somewhere specific to durable-queue.
>>>>>>
>>>>>> Overall, my log information looks like this:
>>>>>>
>>>>>>     ("\nStats about from-mysql-to-tables-queue: " {"message"
>>>>>> {:num-slabs 3, :num-active-slabs 2, :enqueued 370137, :retried 0,
>>>>>> :completed 369934, :in-progress 10}})
>>>>>>
>>>>>>     ("\nResource usage: " "Memory in use (percentage/used/max-heap):
>>>>>> (\"66%\" \"2373M\" \"3568M\")\n\nCPU usage (how-many-cpu's/load-average):
>>>>>>  [4 5.05]\n\nFree memory in jvm: [1171310752]")
>>>>>>
>>>>>> 30 seconds later
>>>>>>
>>>>>>     ("\nStats about from-mysql-to-tables-queue: " {"message"
>>>>>> {:num-slabs 4, :num-active-slabs 4, :enqueued 608967, :retried 0,
>>>>>> :completed 608511, :in-progress 10}})
>>>>>>
>>>>>>     ("\nResource usage: " "Memory in use (percentage/used/max-heap):
>>>>>> (\"76%\" \"2752M\" \"3611M\")\n\nCPU usage (how-many-cpu's/load-average):
>>>>>>  [4 5.87]\n\nFree memory in jvm: [901122456]")
>>>>>>
>>>>>> 30 seconds later
>>>>>>
>>>>>>     ("\nStats about from-mysql-to-tables-queue: " {"message"
>>>>>> {:num-slabs 4, :num-active-slabs 3, :enqueued 828950, :retried 0,
>>>>>> :completed 828470, :in-progress 10}})
>>>>>>
>>>>>>     ("\nResource usage: " "Memory in use (percentage/used/max-heap):
>>>>>> (\"94%\" \"3613M\" \"3819M\")\n\nCPU usage (how-many-cpu's/load-average):
>>>>>>  [4 6.5]\n\nFree memory in jvm: [216459664]")
>>>>>>
>>>>>> 30 seconds later
>>>>>>
>>>>>>     ("\nStats about from-mysql-to-tables-queue: " {"message"
>>>>>> {:num-slabs 1, :num-active-slabs 1, :enqueued 1051974, :retried 0,
>>>>>> :completed 1051974, :in-progress 0}})
>>>>>>
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Clojure" group.
>>>>>> To post to this group, send email to clo...@googlegroups.com
>>>>>> Note that posts from new members are moderated - please be patient
>>>>>> with your first post.
>>>>>> To unsubscribe from this group, send email to
>>>>>> clojure+u...@googlegroups.com
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/clojure?hl=en
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Clojure" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>> an email to clojure+u...@googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>> --
>> - sent from my mobile
>
-- 
- sent from my mobile

