Hi Eric,

your dedication to making Puppet faster is really appreciated. My post is absolutely not in favor of XPP, but please don't get me wrong: it is meant as a constructive contribution to the current design process.
In my personal opinion we have a sad history of optimizations that focus a lot on blaming different languages and tools. Puppet often created fancy new tools with new languages and components, but we rarely tackled the root causes of our problems. This would be off topic here, so I'll add a few examples at the end of this mail to show what I mean.

So let me start with the stated "Problems":

* Performance: I didn't do any measurements, but I'd guess the compiler spends more time resolving dependencies and traversing graphs than it does parsing and validating .pp files, not to mention all the compat hacks, alias-handling voodoo, expensive Hiera lookups, type validation for those lookups and legacy support hacks. Do you have any related numbers? Where is most of the time spent when building and shipping (real-world) catalogs? Are you really sure an AST cache (per manifest?!) would be worth the effort and solve the "performance problem"? The C++ parser itself should not be so slow that it already needs an AST cache; if it were, something would be wrong with it. I'll sketch the kind of measurement I have in mind at the end of this section.

* Cross-language support: You wrote that the C++ parser needs to provide the compiled AST to the Ruby runtime. Makes sense to me. But parsing .pp files with C++, serializing them to a custom, not yet designed format, parsing that format with Ruby again and then re-resolving all (most? some?) dependency graphs across the whole catalog with Ruby does not sound like something that gets things faster. Sure, it would allow the C++ parser to hand over its AST, or to store it to disk. But would this speed up the whole process? I have serious doubts. IMHO it wouldn't help much, at least not unless "drop all Ruby interfaces in the long run" is the final goal on your agenda. In that case please let us know: those who want to support that goal could unite their forces to get it accomplished as fast as possible, and the others would at least know what to expect.

In the current Puppet ecosystem, a C++ parser able to generate an AST from a .pp file still seems far from anything that could replace the current Ruby-based parser in a helpful way any time soon. At least not in a real-world environment with lots of modules, custom functions and external data sources, often provided by custom lookup functions, and at least not in a way that brings any benefit to the average Puppet user. So to me this remains the key question for the performance benefit we could expect from all this. As long as the Ruby runtime is supported, I do not really see how this could work out. But this is just a blind guess, please prove me wrong.

Obviously the C++ Puppet will be faster as soon as you drop the Ruby runtime. But then we should add something else to the big picture: how are we supposed to build custom extensions and interfaces to custom data in the future? Forking plugins? Talking to web services? Adding a C++ compiler to a (dev)ops deployment pipeline will not convince many people, I guess. Everything that comes to my mind has its very own performance impact, so we should know what to expect in that direction to understand what needs to be added to our (performance) calculation. As of this writing, and from what I know from mailing lists, PuppetConf and Camps, the C++ parser is still an academic construct able to generate an AST quickly. Nice for sure, but of no benefit (yet) in a real-world Puppet scenario.
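To illustrate the kind of measurement I'm asking for, here is a rough sketch in plain Ruby. I'm assuming the future parser API (Puppet::Pops::Parser::EvaluatingParser, as shipped with recent 3.x and 4.x releases), which parses and validates a manifest without evaluating it:

    require 'benchmark'
    require 'puppet'

    manifest = File.read(ARGV[0] || 'manifests/site.pp')

    # Parse and validate only: this is the one part an XPP cache could save.
    parser = Puppet::Pops::Parser::EvaluatingParser.new
    seconds = Benchmark.realtime do
      100.times { parser.parse_string(manifest) }
    end

    printf "parse+validate: %.2f ms per run\n", seconds / 100 * 1000

If that number turns out to be a tiny fraction of a full catalog compilation, then an AST cache can never shave off more than that tiny fraction.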
Of course I might be missing some parts of your big picture, involving strategic product-related features not yet known to the public. But please do not forget that extensibility is one of the key features of any open source software. Ops people didn't choose good old Nagios because of its "beautiful" frontend and its "well-designed" plugin API. They are using it because everyone from students to 60-year-old UNIX veterans is able to write something they like to call a "plugin". Mostly awful snippets of Bash or Perl, not worth being called software, but doing customized crazy shit, running on millions of systems, and available for nearly 20 years without breaking compatibility. Of course there is Icinga right now ;) New core, C++, shiny new web... but still running those ugly old plugins. They are awful, they are terrible, we all hate them. But lots of people invested a lot of time in them, so breaking them is a no-go.

No one I know currently understands how the existing "interfaces" (being plain Ruby) fit in with your C++ plans. There is a lot of uncertainty among (skilled) Puppet users regarding this right now, and some public clarification would definitely help to calm the waters. If your plans include dropping that part in favor of restricted EPP and DSL-based "functions", please let us know. It will be faster, for sure. But it will be a different product with different (restricted) possibilities. In that case I would prefer to be among the first ones leaving the ship instead of being treated like the famous slowly boiled frog.

But let's get back to the next point in your proposal, the "Requirements":

* Publishing modules as XPP: I guess building the AST for a module takes less time than checking out the very same module with r10k from your local Git repository, even with "slow Ruby code". So IMO there are no real benefits here, but lots of potential pitfalls, insecurities and bugs. If you need this to provide obfuscated Enterprise-only modules in the future... well, it's your choice.

* Longevity of file formats: What makes you think that Puppet will change more slowly in the near future? Today there is no way to run many Puppet 3.x manifests with Puppet 4.x, and those are plain .pp files. An AST is by definition a lot more fragile. Why should we believe that those cache files would survive longer?

* Efficient serialization is key to the success of XPP: You name it. And please do not forget that efficient deserialization is far more important. It will not take zero time, and it happens as often as a .pp file is parsed today.

"Non-goals":

* If XPP ends up being plain text it will obviously not be that fast, but that's still fine with me.

* I also have no problem with a serialized format that is not readable by human beings. I will happily live with any binary format as long as you keep YAML and similar diseases far away from me ;-)

"Proposal":

* XPP file handling in general sounds good to me.

* I have some doubts when it comes to checking whether such a file is "up to date". Race conditions and issues with people manually copying files come to mind.

* A safe way to solve this could be XPP files carrying their source file's checksum in their name (see the sketch right after this list). Of course that would be more costly, as it involves generating and validating checksums all the time, and outdated XPP files must still be removed.

* You know that people use r10k or custom tools to check out specific tags or commit IDs again and again, sometimes directly into their module path?
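To make the checksum idea concrete, here is a rough sketch. The helper names are made up for illustration, not a proposed API:

    require 'digest'

    # Hypothetical scheme: the cache file name embeds the source checksum,
    # so a stale cache can never be mistaken for a fresh one.
    def xpp_path_for(pp_file)
      digest = Digest::SHA256.file(pp_file).hexdigest[0, 16]
      File.join(File.dirname(pp_file), ".#{File.basename(pp_file)}.#{digest}.xpp")
    end

    # Freshness becomes a simple existence check, no mtimes involved ...
    def xpp_fresh?(pp_file)
      File.exist?(xpp_path_for(pp_file))
    end

    # ... but someone still has to garbage-collect outdated siblings.
    def sweep_stale_xpp(pp_file)
      keep = xpp_path_for(pp_file)
      glob = File.join(File.dirname(pp_file), ".#{File.basename(pp_file)}.*.xpp")
      Dir.glob(glob).reject { |f| f == keep }.each { |f| File.delete(f) }
    end

This avoids the mtime trust problem, but it adds hashing cost to every single check, and the sweeping part is exactly where caches tend to go wrong.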
I work with customers where someone pushes a new Puppetfile in an automated way every 2-5 minutes, the whole day long. How would that fit with your XPP model? Should Puppet (r10k, whoever) re-check and regenerate all those files with every deployment? Every few minutes? Also, please do not underestimate the potential pitfalls for users when trusting file modification times. We could run into a support nightmare. We all know that writing a cache is not an easy task.

"API, switches, protocols":

* Looks good to me.

"Interfaces modified or extended":

* I see there is some discussion about whether XPP files should reside in the module directories or in a mirrored structure. Well, caught between a rock and a hard place, good luck :D

"Diagnostics of XPP":

* msgpack: well... mmmmhok

* Shebang: there are enough comments, nothing to add.

* pcore part, shebang line, mime type: you already define three different kinds of version/subformat headers in a draft for a new format. Not good.

* Mime type: a container for a bunch of different formats doesn't make a good format to me. Are you really sure that implementing AST serialization for C++ and Ruby (and others?) with different formats for all of those is a good idea? MsgPack AND JSON (easy) AND YAML (which version?)...

* Regarding YAML: how do you protect against code injection? A slow Ruby-based parser, once again? I'll give a small example after this list.

* You mention JAR files as an example, but they exist for deployment reasons, not for speed. XPP borrows some ideas from a JAR, and a JAR is a container for convenience: it makes it easy to ship multiple files, but it makes reading them slower, which is why JARs are deployed (and extracted) on the target system. The resulting class file is what XPP should try to look like if it wants to bring any benefit. At least as long as you do not plan to store all .pp files of a module in a single .xpp file, but that would be even slower for many use cases. And please note that class files are binary for a good reason: speed.

* You mention .pyc files. They contain marshalled code objects, once again binary and native to Python. Same story as above. There IS a reason why they are fast, and it doesn't fit our current XPP scenario with nested text formats.

Next point, the "Alternatives":

* Byte-code-oriented format: absolutely. If you want a fast AST cache, this would help. Still, please add the time needed for evaluating the (already parsed) AST with Ruby to the calculation.

* Wait until the C++ compiler is implemented: also a very good idea. And not only until it is implemented, but also until we know where the whole ecosystem (Ruby functions, interfaces, Ruby-based "indirections") should move. Once we know what they will look like, we will know better how to tune all this. Parsing and validating plain .pp files probably accounts for a fraction of the computing resources a Puppet master spends today. Ruby is far from being our main problem here.

* Embedding the C++ parser in Ruby would be a good and efficient approach. Good luck with Clojure and JRuby ;)

* Produce the .xpp also with Ruby: IMO a must. You will definitely run into compatibility issues between your different parsers, and without this feature there is no easy way to discover them in an automated fashion.

* Fork the C++ parser: now it is getting scary. Sure, why not. But the (de)serialization cost remains in one way or the other, doesn't it?
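As promised, the YAML example. This is the well-known Psych deserialization problem, stock Ruby and nothing Puppet-specific; note that on the Ruby versions in use today, YAML.load is the unsafe variant:

    require 'yaml'
    require 'ostruct'

    # Untrusted YAML may instantiate arbitrary Ruby objects via !ruby/object
    # tags; every class loaded in the process becomes attack surface.
    payload = "--- !ruby/object:OpenStruct\ntable:\n  :owned: true\n"

    puts YAML.load(payload).class   # a real OpenStruct, built from the wire

    begin
      YAML.safe_load(payload)       # the whitelisting loader refuses it
    rescue Psych::DisallowedClass => e
      puts "rejected: #{e.message}"
    end

safe_load helps, but every consumer of XPP files, in every language, would have to get this right. One more reason why a multi-format container worries me.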
"Additional Concerns": * "Compatibility": when you really allow different shebang lines, different serialization formats, XPP shipped with forge modules, auto-generated in your code deployment procedure, some people using other deployment mechanism, timestamp issues... all this together could result in a big mess. * "Security": you are right with "no extra impact", but I would add the possibility for new attack vectors eventually hidden to validation tools as soon as you add YAML (as mentioned in the draft) to the mix * "Documentation": I do not agree that this would not be necessary. XPP (when implemented) will be a key component of all deployments. People WILL build custom tools around it. It's better to state clearly how things are designed instead of letting everybody figure out by themselves how to do black magic. * Spin offs: wooo... this adds a lot of new players to the mix, while still being pretty vague. JavaScript? Then better stay with JSON ond forget about MsgPack. And how should a browser handle YAML? * C++ parser as a linter: makes sense to me * Encrypt XPP files: would not make them faster. While I'm an absolute fan of signed packages, I do not see a use for this on an XPP file level That was a lot of text, sorry :) And thank you for reading all this. My conclusion: the XPP file draft is an early optimization of something fitting in an ecosystem still very vaguely defined. If ever implemented, it should be postponed. I'm sure the C++ parser is a lot faster than the Ruby-based one. But hey, if I start up IRB, require 'puppet' (still 3.4 on my local Ubuntu desktop) - then it takes Puppet 0,03s to load and validate a random 5KB .pp file. This is not very fast, but I see no urgent problem with this. And as initially mentioned, this leads me to my last point - a few examples of similar findings and "optimizations" we enjoyed in the past: "Databases are slow" We had active-records hammering our databases. The conclusion wasn't that someone with SQL knowledge should design a good schema. The publicly stated reasoning was "well, databases are slow, so we need more cores to hammer the database, Ruby has not threading, Clojure is cool". It still was slow by the end, so we added a message queue, a dead letter office and more to the mix. Just to give you some related numbers to compare: a year or two ago I wrote a prototype for (fast) catalog-diffs. My test DB still carries 2400 random catalogs with an average of 1400 resources per catalog, 18+ million single resource parameters in total. Of course far less rows in the DB as of checksum-based "de-duplication". But this is "real" data. The largest catalog has nearly 19,000 resources, the smallest one 450. Once again, no fake data, real catalogs collected over time from real environments. Storing an average catalog (1400 resources, cached JSON is 0,5-1MB) takes as far as I remember less than half a second all the times. For most environments something similar should perfectly be doable to persist catalogs as soon as compiled. Even in a blocking mode with no queue and a directly attached database in plain Ruby. "Facter is slow" Reasoning: Ruby is slow, we need C++ for a faster Facter. But Facter itself never was the real problem. When loaded from Puppet you can neglect also it's loading time. The problem were a few silly and some more not-so-good single fact implementations. cFacter is mostly faster because those facts been rewritten when they were implemented in C++. Still, as long as we have custom facts cFacter still needs to fork Ruby. 
"Facter is slow": The reasoning went "Ruby is slow, we need C++ for a faster Facter". But Facter itself never was the real problem; when loaded from Puppet, even its loading time is negligible. The real problem was a few silly and some more not-so-good implementations of single facts. cFacter is mostly faster because those facts were rewritten when they were ported to C++. Still, as long as we have custom facts, cFacter needs to fork Ruby. And there it loses the startup time it initially saved. I guess the Ruby-based Puppet requires 'facter' instead of forking it; I could be wrong here. Still, the optimization was largely useless, and as a result it is now harder for people to quickly fix facts behaving wrongly on their systems. Combined with a C++-based agent, cFacter could still make sense, as Puppet Labs wants to support more platforms. But even this argument isn't really valid: I'm pretty sure there are far more platforms with Ruby support than platforms with a supported Puppet AIO package.

"Puppet Master is slow": Once again, Ruby is slow, we learned. We got Puppet Server. I've met (and helped) a lot of people who had severe issues with this stack. I'm still telling everyone not to migrate unless there is an immediate need to do so. Most average admins are perfectly able to manage and scale a Ruby-based web application. To them, Puppet Server is a black box: hard to manage, hard to scale. For many of them it is the only Java-based application server they run, so they have no clue about JVM memory management, JMX and so on. And I have yet to see a Puppet Server that runs faster than a Puppet Master in the same environment, preferably with equal resource consumption. Should I go on with PCP/PXP? I guess that's enough so far; I think you understand what I mean.

With what I know so far, C++ Puppet and XPP would make perfect next candidates for this hall of "fame". But as mentioned above, I'd love to be proven wrong on all of this. I'm neither a Ruby fanboy nor do I have objections against C++. All I'm interested in is running my beloved Puppet hassle-free in production, not wasting my time caring about the platform itself. I'd prefer to dedicate that time to lots of small, ugly, self-written modules breaking all of the latest best practices I can find on the web ;-)

Cheers,
Thomas

On 30.03.2016 at 18:24, Eric Sorenson wrote:
> Hi, I've just posted a new Puppet RFC that describes pre-parsed and
> pre-validated Puppet files, akin to '.pyc' files for Python. It's called
> XPP and the doc is open for comments here:
>
> https://docs.google.com/document/d/17SFn_2PJYcO5HjgA4R65a5ynR6_bng_Ak5W53KjM4F8/edit?usp=sharing
>
> Please comment inline on the doc, or come back to this thread if the
> conversation gets too involved (more than about 4-5 replies in a comment
> box on google docs becomes unwieldy)
>
> Once the commenting tapers off we'll incorporate changes into the spec
> and post it as markdown in the puppet-rfc repo:
> https://github.com/puppetlabs/puppet-rfc
>
> --eric0