Daniel Compton, good suggestion. I've increased the memory to see if I can 
postpone the GCs, and I'll log that more carefully. 


On Wednesday, October 11, 2017 at 8:35:44 PM UTC-4, Daniel Compton wrote:
>
> Without more information it's hard to tell, but this looks like it could 
> be a garbage collection issue. Can you run your test again and add some 
> logging/monitoring to show each garbage collection? If my hunch is right, 
> you'll see garbage collections getting more and more frequent until they 
> take up nearly all the CPU time, preventing much forward progress writing 
> to the queue.
>
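One way to get the per-interval GC logging suggested above is to read the JVM's standard garbage collector MXBeans from Clojure. This is only a sketch: the function names gc-stats and gc-time-delta are made up for illustration, and it assumes the numbers would be printed next to the existing 30-second queue stats.

```clojure
(import '(java.lang.management ManagementFactory GarbageCollectorMXBean))

(defn gc-stats
  "Returns one map per collector with the cumulative collection count
  and cumulative collection time in milliseconds since JVM start.
  The JVM reports -1 when a value is undefined, so clamp to 0."
  []
  (mapv (fn [^GarbageCollectorMXBean gc]
          {:collector   (.getName gc)
           :collections (max 0 (.getCollectionCount gc))
           :time-ms     (max 0 (.getCollectionTime gc))})
        (ManagementFactory/getGarbageCollectorMXBeans)))

(defn gc-time-delta
  "Milliseconds of GC time accumulated between two gc-stats snapshots."
  [before after]
  (- (reduce + (map :time-ms after))
     (reduce + (map :time-ms before))))
```

Taking a snapshot at each report and passing consecutive snapshots to gc-time-delta gives the GC time spent in that interval. If that figure climbs toward the 30,000 ms length of the interval, the GC hypothesis looks more likely.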
> If it's AWS-based throttling, then CloudWatch monitoring 
> (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-volume-status.html#using_cloudwatch_ebs) 
> might give you some hints. You could also test with an NVMe drive attached, 
> just to see if disk bandwidth is the issue.
>
> On Thu, Oct 12, 2017 at 11:58 AM Justin Smith <noise...@gmail.com> wrote:
>
>> A small thing here: if memory usage is important, you should build and 
>> run an uberjar instead of using lein on the server (this also has other 
>> benefits). If you do that, your project.clj :jvm-opts are not used; you 
>> have to configure your java command line on AWS instead.
>>
>> On Wed, Oct 11, 2017 at 3:52 PM <lawrence...@gmail.com> wrote:
>>
>>> I can't figure out if this is a Clojure question or an AWS question. And 
>>> if it is a Clojure question, I can't figure out if it is more of a general 
>>> JVM question, or if it is specific to some library such as durable-queue. I 
>>> can redirect my question elsewhere, if people think this is an AWS 
>>> question. 
>>>
>>> In my project.clj, I try to give my app a lot of memory:
>>>
>>>   :jvm-opts ["-Xms7g" "-Xmx7g" "-XX:-UseCompressedOops"])
>>>
>>> And the app starts off pulling data from MySQL and writing it to 
>>> Durable-Queue at a rapid rate. ( 
>>> https://github.com/Factual/durable-queue )
>>>
>>> I have some logging set up to report every 30 seconds.
>>>
>>> :enqueued 370137,
>>>
>>> 30 seconds later:
>>>
>>> :enqueued 608967,
>>>
>>> 30 seconds later: 
>>>
>>> :enqueued 828950,
>>>
>>> It's a dramatic slowdown. The app initially writes to the queue at more 
>>> than 10,000 documents a second, but it slows steadily, and after 10 
>>> minutes it writes fewer than 1,000 documents per second. Since I have to 
>>> write a few million documents, 10,000 a second is the slowest speed I can 
>>> live with. 
>>>
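To put numbers on the slowdown, the cumulative :enqueued counters can be turned into per-second rates. The sketch below uses a hypothetical helper, rates-per-second, and assumes the reports are exactly 30 seconds apart, as the logging interval suggests.

```clojure
(defn rates-per-second
  "Given cumulative counters sampled every interval-seconds, returns the
  whole-number rate per second between each consecutive pair of samples."
  [counters interval-seconds]
  (mapv (fn [[earlier later]]
          (quot (- later earlier) interval-seconds))
        (partition 2 1 counters)))

;; The three :enqueued values reported above, 30 seconds apart:
(rates-per-second [370137 608967 828950] 30)
;; => [7961 7332]
```

For these three samples the rate works out to about 7,961 and then 7,332 documents per second. Running the same calculation on later samples would show where the rate falls off.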
>>> The queues are in the /tmp folder of an AWS instance that has plenty of 
>>> disk space, 4 CPUs, and 16 gigs of RAM. 
>>>
>>> Why does the app slow down so much? I had 4 thoughts:
>>>
>>> 1.) the app struggles as it hits a memory limit
>>>
>>> 2.) memory bandwidth is the problem
>>>
>>> 3.) AWS is enforcing some weird IOPS limit
>>>
>>> 4.) durable-queue is misbehaving
>>>
>>> As to possibility #1, I notice the app starts like this:
>>>
>>> Memory in use (percentage/used/max-heap): (\"66%\" \"2373M\" \"3568M\")
>>>
>>> but 60 seconds later I see: 
>>>
>>> Memory in use (percentage/used/max-heap): (\"94%\" \"3613M\" \"3819M\")
>>>
>>> So I've run out of allowed memory. But why is that? I thought I gave 
>>> this app 7 gigs: 
>>>
>>>   :jvm-opts ["-Xms7g" "-Xmx7g" "-XX:-UseCompressedOops"])
>>>
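The reported max-heap of roughly 3.5 to 3.8 GB is well below 7 GB, so one possibility is that the :jvm-opts never reach the running JVM. A hedged way to check is to ask the JVM itself which arguments it was started with and how large it thinks the heap can grow. The function names below are illustrative, not part of the app. For comparison, HotSpot's default maximum heap is commonly a quarter of physical RAM, which would be about 4 GB on a 16 GB machine.

```clojure
(import '(java.lang.management ManagementFactory))

(defn jvm-input-args
  "Arguments the JVM was actually started with, e.g. any -Xms/-Xmx flags."
  []
  (vec (.getInputArguments (ManagementFactory/getRuntimeMXBean))))

(defn heap-mb
  "Heap sizes in megabytes as seen by the running JVM:
  :max-mb is the ceiling the heap may grow to, :committed-mb is what is
  currently reserved, and :used-mb is committed minus free."
  []
  (let [rt    (Runtime/getRuntime)
        total (.totalMemory rt)
        free  (.freeMemory rt)
        mb    (fn [bytes] (quot bytes (* 1024 1024)))]
    {:max-mb       (mb (.maxMemory rt))
     :committed-mb (mb total)
     :used-mb      (mb (- total free))}))
```

If (jvm-input-args) does not contain "-Xmx7g", or :max-mb is far below 7168, the flags are not being applied to this process, and the launch command is the place to look.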
>>> As to possibility #2, I found this old post on the Clojure mailing list:
>>>
>>> Andy Fingerhut wrote, "one thing I've found in the past on a 2-core 
>>> machine that was achieving much less than 2x speedup was memory bandwidth 
>>> being the limiting factor."
>>>
>>>
>>> https://groups.google.com/forum/#!searchin/clojure/xmx$20xms$20maximum%7Csort:relevance/clojure/48W2eff3caU/HS6u547gtrAJ
>>>
>>> But I am not sure how to test this. 
>>>
>>> As to possibility #3, I'm not sure how AWS enforces its IOPS limits. If 
>>> people think this is the most likely possibility, then I will repost my 
>>> question in an AWS forum. 
>>>
>>> As to possibility #4, durable-queue is well-tested and used in a lot of 
>>> projects, and Zach Tellman is smart and makes few mistakes, so I'm doubtful 
>>> that it is to blame. But I do notice that it starts off with 4 active slabs 
>>> and then, after 120 seconds, it is using only 1 slab. Is that expected? If 
>>> people think this is the likely problem, then I'll ask somewhere specific 
>>> to durable-queue.
>>>
>>> Overall, my log information looks like this: 
>>>
>>>     ("\nStats about from-mysql-to-tables-queue: " {"message" {:num-slabs 
>>> 3, :num-active-slabs 2, :enqueued 370137, :retried 0, :completed 369934, 
>>> :in-progress 10}})
>>>
>>>     ("\nResource usage: " "Memory in use (percentage/used/max-heap): 
>>> (\"66%\" \"2373M\" \"3568M\")\n\nCPU usage (how-many-cpu's/load-average): 
>>>  [4 5.05]\n\nFree memory in jvm: [1171310752]")
>>>
>>> 30 seconds later
>>>
>>>     ("\nStats about from-mysql-to-tables-queue: " {"message" {:num-slabs 
>>> 4, :num-active-slabs 4, :enqueued 608967, :retried 0, :completed 608511, 
>>> :in-progress 10}})
>>>     
>>>     ("\nResource usage: " "Memory in use (percentage/used/max-heap): 
>>> (\"76%\" \"2752M\" \"3611M\")\n\nCPU usage (how-many-cpu's/load-average): 
>>>  [4 5.87]\n\nFree memory in jvm: [901122456]")
>>>
>>> 30 seconds later
>>>     
>>>     ("\nStats about from-mysql-to-tables-queue: " {"message" {:num-slabs 
>>> 4, :num-active-slabs 3, :enqueued 828950, :retried 0, :completed 828470, 
>>> :in-progress 10}})
>>>     
>>>     ("\nResource usage: " "Memory in use (percentage/used/max-heap): 
>>> (\"94%\" \"3613M\" \"3819M\")\n\nCPU usage (how-many-cpu's/load-average): 
>>>  [4 6.5]\n\nFree memory in jvm: [216459664]")
>>>     
>>> 30 seconds later
>>>
>>>     ("\nStats about from-mysql-to-tables-queue: " {"message" {:num-slabs 
>>> 1, :num-active-slabs 1, :enqueued 1051974, :retried 0, :completed 1051974, 
>>> :in-progress 0}})
>>>
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with 
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to clojure+u...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
