It sounds like you have a memory leak. I would look at addressing that before
any performance tricks.

On Fri, 13 Oct 2017 at 05:35, <lawrence.krub...@gmail.com> wrote:
> Following Daniel Compton's suggestion, I turned on logging for GC. I don't
> see it happening more often, but the slowdown does seem related to the
> moment when the app hits the maximum memory allowed. It had been running
> with 4G, so I increased that to 7G, so it goes longer now before it hits
> 98% memory usage, but it does hit it eventually and then everything crawls
> to a very slow speed. Not sure how much memory I would have to use to avoid
> using up almost all of the memory. I suppose I'll figure that out via trial
> and error. Until I can figure that out, nearly all other performance tricks
> seem a bit beside the point.
>
> On Thursday, October 12, 2017 at 9:01:23 PM UTC-4, Nathan Fisher wrote:
>
>> Hi!
>>
>> Can you change one of the variables? Specifically, can you replicate this
>> on your local machine? If it happens locally then I would focus on
>> something in the JVM ecosystem.
>>
>> If you can't replicate it locally then it's possibly AWS specific. It
>> sounds like you're using a t2.large or m4.xlarge. If it's the former you
>> may very well be contending with your network bandwidth. EC2's host
>> drive (EBS) is a networked drive, so your bandwidth is split between your
>> standard network traffic and the drive volume. If that's the issue then
>> you might need to look at provisioned IOPS. A quick(ish) way to test that
>> hypothesis is to provision a host with high networking performance and
>> provisioned IOPS.
>>
>> Cheers,
>> Nathan
>>
>> On Fri, 13 Oct 2017 at 00:05 <lawrence...@gmail.com> wrote:
>>
>>> Daniel Compton, good suggestion. I've increased the memory to see if I
>>> can postpone the GCs, and I'll log that more carefully.
>>>
>>> On Wednesday, October 11, 2017 at 8:35:44 PM UTC-4, Daniel Compton wrote:
>>>
>>>> Without more information it's hard to tell, but this looks like it
>>>> could be a garbage collection issue. Can you run your test again and add
>>>> some logging/monitoring to show each garbage collection?
>>>> If my hunch is right, you'll see garbage collections getting more and
>>>> more frequent until they take up nearly all the CPU time, preventing
>>>> much forward progress writing to the queue.
>>>>
>>>> If it's AWS-based throttling, then CloudWatch monitoring
>>>> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-volume-status.html#using_cloudwatch_ebs
>>>> might show you some hints. You could also test with an NVMe drive
>>>> attached, just to see if disk bandwidth is the issue.
>>>>
>>>> On Thu, Oct 12, 2017 at 11:58 AM Justin Smith <noise...@gmail.com> wrote:
>>>>
>>>>> A small thing here: if memory usage is important, you should be
>>>>> building and running an uberjar instead of using lein on the server
>>>>> (this also has other benefits). If you are doing that, your project.clj
>>>>> jvm-opts are not used; you have to configure your java command line in
>>>>> AWS instead.
>>>>>
>>>>> On Wed, Oct 11, 2017 at 3:52 PM <lawrence...@gmail.com> wrote:
>>>>>
>>>>>> I can't figure out if this is a Clojure question or an AWS question.
>>>>>> And if it is a Clojure question, I can't figure out if it is more of a
>>>>>> general JVM question, or if it is specific to some library such as
>>>>>> durable-queue. I can redirect my question elsewhere, if people think
>>>>>> this is an AWS question.
>>>>>>
>>>>>> In my project.clj, I try to give my app a lot of memory:
>>>>>>
>>>>>> :jvm-opts ["-Xms7g" "-Xmx7g" "-XX:-UseCompressedOops"])
>>>>>>
>>>>>> And the app starts off pulling data from MySQL and writing it to
>>>>>> Durable-Queue at a rapid rate. ( https://github.com/Factual/durable-queue )
>>>>>>
>>>>>> I have some logging set up to report every 30 seconds.
>>>>>>
>>>>>> :enqueued 370137,
>>>>>>
>>>>>> 30 seconds later:
>>>>>>
>>>>>> :enqueued 608967,
>>>>>>
>>>>>> 30 seconds later:
>>>>>>
>>>>>> :enqueued 828950,
>>>>>>
>>>>>> It's a dramatic slowdown.
>>>>>> The app is initially writing to the queue at faster than 10,000
>>>>>> documents a second, but it slows steadily, and after 10 minutes it
>>>>>> writes fewer than 1,000 documents per second. Since I have to write a
>>>>>> few million documents, 10,000 a second is the slowest speed I can live
>>>>>> with.
>>>>>>
>>>>>> The queues are in the /tmp folder of an AWS instance that has plenty
>>>>>> of disk space, 4 CPUs, and 16 gigs of RAM.
>>>>>>
>>>>>> Why does the app slow down so much? I had 4 thoughts:
>>>>>>
>>>>>> 1.) the app struggles as it hits a memory limit
>>>>>>
>>>>>> 2.) memory bandwidth is the problem
>>>>>>
>>>>>> 3.) AWS is enforcing some weird IOPS limit
>>>>>>
>>>>>> 4.) durable-queue is misbehaving
>>>>>>
>>>>>> As to possibility #1, I notice the app starts like this:
>>>>>>
>>>>>> Memory in use (percentage/used/max-heap): (\"66%\" \"2373M\" \"3568M\")
>>>>>>
>>>>>> but 60 seconds later I see:
>>>>>>
>>>>>> Memory in use (percentage/used/max-heap): (\"94%\" \"3613M\" \"3819M\")
>>>>>>
>>>>>> So I've run out of allowed memory. But why is that? I thought I gave
>>>>>> this app 7 gigs:
>>>>>>
>>>>>> :jvm-opts ["-Xms7g" "-Xmx7g" "-XX:-UseCompressedOops"])
>>>>>>
>>>>>> As to possibility #2, I found this old post on the Clojure mailing list:
>>>>>>
>>>>>> Andy Fingerhut wrote, "one thing I've found in the past on a 2-core
>>>>>> machine that was achieving much less than 2x speedup was memory
>>>>>> bandwidth being the limiting factor."
>>>>>>
>>>>>> https://groups.google.com/forum/#!searchin/clojure/xmx$20xms$20maximum%7Csort:relevance/clojure/48W2eff3caU/HS6u547gtrAJ
>>>>>>
>>>>>> But I am not sure how to test this.
>>>>>>
>>>>>> As to possibility #3, I'm not sure how AWS enforces its IOPS limits.
>>>>>> If people think this is the most likely possibility, then I will
>>>>>> repost my question in an AWS forum.
>>>>>>
>>>>>> As to possibility #4, durable-queue is well-tested and used in a lot
>>>>>> of projects, and Zach Tellman is smart and makes few mistakes, so I'm
>>>>>> doubtful that it is to blame, but I do notice that it starts off with 4
>>>>>> active slabs and then after 120 seconds, it is only using 1 slab. Is
>>>>>> that expected? If people think this is the possible problem then I'll
>>>>>> ask somewhere specific to durable-queue.
>>>>>>
>>>>>> Overall, my log information looks like this:
>>>>>>
>>>>>> ("\nStats about from-mysql-to-tables-queue: " {"message"
>>>>>> {:num-slabs 3, :num-active-slabs 2, :enqueued 370137, :retried 0,
>>>>>> :completed 369934, :in-progress 10}})
>>>>>>
>>>>>> ("\nResource usage: " "Memory in use (percentage/used/max-heap):
>>>>>> (\"66%\" \"2373M\" \"3568M\")\n\nCPU usage (how-many-cpu's/load-average):
>>>>>> [4 5.05]\n\nFree memory in jvm: [1171310752]")
>>>>>>
>>>>>> 30 seconds later
>>>>>>
>>>>>> ("\nStats about from-mysql-to-tables-queue: " {"message"
>>>>>> {:num-slabs 4, :num-active-slabs 4, :enqueued 608967, :retried 0,
>>>>>> :completed 608511, :in-progress 10}})
>>>>>>
>>>>>> ("\nResource usage: " "Memory in use (percentage/used/max-heap):
>>>>>> (\"76%\" \"2752M\" \"3611M\")\n\nCPU usage (how-many-cpu's/load-average):
>>>>>> [4 5.87]\n\nFree memory in jvm: [901122456]")
>>>>>>
>>>>>> 30 seconds later
>>>>>>
>>>>>> ("\nStats about from-mysql-to-tables-queue: " {"message"
>>>>>> {:num-slabs 4, :num-active-slabs 3, :enqueued 828950, :retried 0,
>>>>>> :completed 828470, :in-progress 10}})
>>>>>>
>>>>>> ("\nResource usage: " "Memory in use (percentage/used/max-heap):
>>>>>> (\"94%\" \"3613M\" \"3819M\")\n\nCPU usage (how-many-cpu's/load-average):
>>>>>> [4 6.5]\n\nFree memory in jvm: [216459664]")
>>>>>>
>>>>>> 30 seconds later
>>>>>>
>>>>>> ("\nStats about from-mysql-to-tables-queue: " {"message"
>>>>>> {:num-slabs 1, :num-active-slabs 1, :enqueued 1051974, :retried 0,
>>>>>> :completed 1051974, :in-progress 0}})
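One way to look into possibility #1 from inside the running process is to ask the JVM which heap limit and startup flags it actually received, since the logs above report a max-heap of roughly 3.5G even though project.clj asks for -Xmx7g. What follows is only a minimal sketch using the standard java.lang.management classes through interop. The namespace diag.heap and the function names heap-settings and gc-stats are hypothetical; they are not part of the original app or of durable-queue.

```clojure
(ns diag.heap
  (:import (java.lang.management ManagementFactory GarbageCollectorMXBean)))

(defn- ->mb
  "Converts a byte count to whole megabytes."
  [n]
  (quot n (* 1024 1024)))

(defn heap-settings
  "Returns the heap figures the running JVM reports, plus the flags it was
  started with. Hypothetical helper, for diagnosis only."
  []
  (let [rt (Runtime/getRuntime)]
    {:max-heap-mb   (->mb (.maxMemory rt))    ; effective upper limit for the heap
     :total-heap-mb (->mb (.totalMemory rt))  ; heap currently reserved by the JVM
     :free-heap-mb  (->mb (.freeMemory rt))   ; unused part of the reserved heap
     :jvm-args      (vec (.getInputArguments (ManagementFactory/getRuntimeMXBean)))}))

(defn gc-stats
  "Returns cumulative collection counts and times (ms) per garbage collector."
  []
  (into {}
        (for [^GarbageCollectorMXBean gc (ManagementFactory/getGarbageCollectorMXBeans)]
          [(.getName gc) {:collections (.getCollectionCount gc)
                          :time-ms     (.getCollectionTime gc)}])))
```

If :jvm-args does not contain "-Xmx7g", the options in project.clj probably never reached the JVM, which is what would be expected when the app is started from an uberjar rather than through lein. Logging (gc-stats) in the same 30-second report as the queue stats would show whether collection counts and times climb as the heap fills.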
--
- sent from my mobile

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.