Hi all,

It seems that setting ReservedCodeCacheSize has indeed had a positive
impact on performance. Since Monday, catalog compilation times have
been steady at around 10 seconds on average. Still not as good as the
secondary server, but acceptable!

For the record, we are not managing metaspace size or
max-requests-per-instance.

I'll try to lower the number of workers and see if that has a positive
effect on memory / CPU usage while keeping the compilation times down.
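
In case it helps anyone else: the flag simply gets added to the
JAVA_ARGS line of the puppetserver defaults file (typically
/etc/sysconfig/puppetserver on RHEL-based systems or
/etc/default/puppetserver on Debian-based ones), so in our case
roughly:

    JAVA_ARGS="-Xms6g -Xmx6g -XX:ReservedCodeCacheSize=1g"

followed by a restart of the puppetserver service.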

Thanks again for all your suggestions!!

Best regards,
Martijn.



On 10-2-2020 at 19:02, Justin Stoller wrote:
>
>
> On Mon, Feb 10, 2020 at 1:44 AM Martijn Grendelman
> <mart...@grendelman.net <mailto:mart...@grendelman.net>> wrote:
>
>     Hi Kevin and others who have responded,
>
>     Thanks all for your tips. Unfortunately, no breakthroughs yet.
>
>     The current state is this:
>
>       * Both Puppetservers typically run at the latest version,
>         currently both 6.8.0.
>       * The primary server has 8 virtual cores and 12 GB of physical
>         (virtualized) RAM, Java is running with -Xms6g -Xmx6g.
>       * Max-active-instances is currently set to 7.
>       * This morning, I added -XX:ReservedCodeCacheSize=1g to the JVM
>         startup config.
>       * The size of our 'environments' directory is 131 MB. We
>         currently have 3 environments.
>
>     I've been looking at JVM stats with 'jstat', and the server
>     doesn't appear to spend any significant amount of time doing GC
>     (seems to be about 1%).
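>
>     (For reference, something like
>
>         jstat -gcutil <puppetserver PID> 5s
>
>     prints GC stats every 5 seconds; comparing the GCT column, the
>     cumulative time spent in GC, against the JVM's uptime gives that
>     percentage.)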
>
>
> fwiw, gceasy.io <http://gceasy.io> and their family of JVM analysis
> reports can be helpful if you have the GC logs etc. available. You
> might want to check whether you're managing Metaspace (e.g. with
> -XX:MaxMetaspaceSize=1G). Most likely you're not, and if you were
> having issues there I think it'd cause full GCs, so it's not likely
> the problem, but it's a thing to check. If you do have to manage it,
> it should have a similar value to the CodeCache.
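>
> If GC logging isn't enabled yet, it can usually be turned on with a
> couple of extra JAVA_ARGS. A sketch for a Java 8 based puppetserver
> (the log path is just a suggestion):
>
>     -Xloggc:/var/log/puppetlabs/puppetserver/puppetserver_gc.log
>     -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>
> On Java 9+ the equivalent would be the unified -Xlog:gc* option.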
>  
>
>
>     After a server restart, compilation times typically drop to 9
>     seconds on average (on the secondary server, it's 5 seconds
>     consistently), but after a while, they go back to 30 or 40 seconds. 
>
>
>     As I noted in my first post, our server has an average of less
>     than 2 concurrent agents talking to it, so I can't imagine this
>     happening due to lack of resources. The fact that our secondary
>     server handles a bigger load than the primary, with a third of the
>     memory and only 2 cores, seems to confirm this.
>
>
> One thing that gets folks is that each worker instance is pretty
> heavyweight (heap, non-heap, and CPU, even when relatively idle). If
> you only need 2 or 3 instances, you should try lowering your max
> active instances to that number and see what happens.
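>
> For example (assuming the default file layout), in
> /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf:
>
>     jruby-puppet: {
>         max-active-instances: 3
>     }
>
> followed by a restart of the puppetserver service.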
>
>
>     So:
>     - enough CPU power (I would think)
>     - enough memory
>     - no significant garbage collection
>     - Puppetserver causing a load of 5
>
>
> I also asked about max-requests-per-instance; ideally it should be 0
> (i.e. off) or some very high number (like 1000000).
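>
> That one lives in the same jruby-puppet section of puppetserver.conf,
> just as a sketch:
>
>     jruby-puppet: {
>         max-requests-per-instance: 0
>     }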
>
>
>     Any more tips? Would it make sense to run PuppetDB and PostgreSQL
>     on a different VM?
>
>
> If that's the biggest difference, you might want to go in that
> direction. I've seen PSQL tuned to the point where its different
> child processes would consume way more memory than intended. You'd
> probably want to confirm that with top, et al.
>
>
>     Thanks,
>     Martijn Grendelman.
>
>
>
>
>
>
>
>
>
>
>     On 6-2-2020 at 17:43, KevinR wrote:
>>     Hi Martijn,
>>
>>     It sounds like you have a sub-optimal combination of:
>>
>>       * The number of JRubies
>>       * The total amount of Java heap memory for puppetserver
>>       * The size of your code base
>>
>>     This typically causes the kind of problems you're experiencing.
>>     What's happening, in a nutshell, is that Puppet loads so much
>>     code into memory that it starts running out of it and performs
>>     garbage collection more and more aggressively. In the end, 95%
>>     of all CPU cycles are spent on garbage collection and there are
>>     no CPU cycles left over to actually do work, like compiling
>>     catalogs...
>>
>>     To understand how Puppet loads code into memory:
>>
>>     Your code base is:  ( [ size of your control repo ] + [ size of
>>     all the modules from the Puppetfile ] )  x  [ the number of
>>     Puppet code environments ]
>>
>>     So let's say:
>>
>>       * your control repo is 5MB in size
>>       * all modules together are 95MB in size
>>       * you have 4 code environments: development, testing,
>>         acceptance and production
>>
>>     That's 100MB of code to load in memory, per environment. For 4
>>     environments, that's 400MB.
>>     A different way to get this amount directly is to run *du -h
>>     /etc/puppetlabs/code/environments* on the puppet master and look
>>     at the size reported for */etc/puppetlabs/code/environments*
>>
>>     Now every JRuby will load that entire code base into memory. So
>>     if you have 4 JRubies, that's 1600MB of Java heap memory that's
>>     actually needed. You can imagine what problems will happen if
>>     there isn't that much heap memory configured...
>>
>>     If you're using the defaults, Puppet will create the same number
>>     of JRubies as the number of CPU cores on your master, minus 1,
>>     with a maximum of 4 JRubies for the system.
>>     If you override the defaults, you can specify any number of
>>     JRubies you want with the max-active-instances setting.
>>
>>     So by default a 2-CPU puppet master will create 1 JRuby, a 4-CPU
>>     puppet master will create 3 JRubies, and an 8-CPU puppet master
>>     will create 4 JRubies.
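>>
>>     In other words, restating the above as a formula:
>>
>>         default number of JRubies = min( [CPU cores] - 1, 4 )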
>>
>>     So now you know how to determine the amount of Java heap memory
>>     you need to configure, which you can do by setting the -Xmx and
>>     -Xms options in the JAVA_ARGS section of the puppetserver
>>     startup configuration.
>>     Then, finally, make sure the host has enough physical memory
>>     available to provide this increased amount of Java heap memory.
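>>
>>     For example, in the JAVA_ARGS line of the puppetserver defaults
>>     file (the exact path depends on your platform), keeping any
>>     flags that are already there:
>>
>>         JAVA_ARGS="-Xms2g -Xmx2g"
>>
>>     (2g is just a starting point for the 4-JRuby / 400MB example
>>     above; adjust based on what you observe.)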
>>
>>     Once enough Java heap memory is provided, you'll see the CPU
>>     usage stay stable.
>>
>>     Kind regards,
>>
>>     Kevin Reeuwijk
>>
>>     Principal Sales Engineer @ Puppet
>>
>>
>>     On Thursday, February 6, 2020 at 11:51:42 AM UTC+1, Martijn
>>     Grendelman wrote:
>>
>>         Hi,
>>
>>         A question about Puppetserver performance.
>>
>>         For quite a while now, our primary Puppet server has been
>>         suffering from severe slowness and high CPU usage. We have
>>         tried to tweak its settings, giving it more memory (Xmx = 6
>>         GB at the moment) and toying with the 'max-active-instances'
>>         setting, to no avail. The server has 8 virtual cores and 12
>>         GB memory in total, to run Puppetserver, PuppetDB and
>>         PostgreSQL.
>>
>>         Notably, after a restart, the performance is acceptable for
>>         a while (several hours, up to almost a day), but then it
>>         plummets again.
>>
>>         We figured that the server was just unable to cope with the
>>         load (we had over 270 nodes talking to it in 30 min
>>         intervals), so we added a second master that now takes more
>>         than half of that load (150 nodes). That did not make any
>>         difference at all for the primary server. The secondary
>>         server, however, has no trouble at all dealing with the load
>>         we gave it.
>>
>>         In the graph below, which displays catalog compilation times
>>         for both servers, you can see the new master in green. Its
>>         performance is consistently high. The old master is in
>>         yellow. After a restart, the compile times are good (not
>>         great) for a while. The first dip represents ca. 4 hours,
>>         the second dip was 18 hours. At some point, the catalog
>>         compilation times sky-rocket, as does the server load. 10
>>         seconds in the graph below corresponds to a server load of
>>         around 2, while 40 seconds corresponds to a server load of
>>         around 5. It's the Puppetserver process using the CPU.
>>
>>         The second server, the green line, has a consistent server
>>         load of around 1, with 4 GB memory (2 GB for the Puppetserver
>>         JVM) and 2 cores (it's an EC2 t3.medium).
>>
>>         [Graph: catalog compilation times for both servers]
>>
>>         If I have 110 nodes, doing two runs per hour, that each take
>>         30 seconds to run, I would still have a concurrency of less
>>         than 2, so Puppet causing a consistent load of 5 seems
>>         strange. My first thought would be that it's garbage
>>         collection or something like that, but the server has plenty
>>         of memory (the OS cache has 2GB).
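>>
>>         (Roughly: 110 nodes x 2 runs/hour x 30 seconds = 6600
>>         compile-seconds per hour, which is on average about 1.8
>>         catalogs being compiled at any given moment.)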
>>
>>         Any ideas on what makes the Puppetserver start using so much
>>         CPU? What can we try to keep it down?
>>
>>         Thanks,
>>         Martijn Grendelman
>>