Hello people,

as the discussion about glep55 has gone in circles long enough, I decided to collect the various ideas presented around that topic and compare them in a way that might allow us to reach a sane decision.

Since it has become quite a lot of text I've kept some parts as sentence fragments and bullet points. If anyone feels the need to change that into more verbose wording, feel free to do so; I hope the idea is clear enough. I feel it is still a draft and could use some massaging. If I have forgotten any approach or misrepresented one, I'd appreciate a new or rephrased section so the text can be easily updated.

For anyone not interested in reading the whole thing: the conclusion is that we want to have the eapi in an easily parsed form in the ebuilds. The versioning rule change discussion (mostly glep54) should happen in an independent discussion.

hth,
Patrick

GLEP: xxx
Title: Ebuild format and metadata handling
Version: $Revision: 1.0 $
Last-Modified: $Date $
Author: Patrick Lauer <patr...@gentoo.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Obsoletes: GLEP55
Created: 31-May-2009
Post-History: 31-May-2009

Problem statement
=================

As ebuild formats evolve there are potentially disruptive changes that are technically easy to implement, but may break backwards compatibility. To mitigate these issues in the future multiple proposals have been brought forward. Their common goal is to reduce the negative impact of changes on users, especially in terms of upgrade paths and error reporting.

The issues mentioned in GLEP55 are eapi discovery and backwards-incompatible structural changes (global scope functions, per-package eclasses etc).

A completely independent issue is the change in versioning rules. It is debatable if such a change is even wanted or needed and as such should be discussed in an independent GLEP.

EAPI discovery - proposals
==========================

There are currently at least four "big" proposals with various small variations being discussed:

"haubi"

For lack of a better name we have labeled this one after the person who brought it up the last few times, haubi. He proposes to use an eapi.eclass and define eapi as a function. If the eclass discovers an older package manager it aborts cleanly with a nice error message.

To quote from the initial email:

"""
To fulfill this requirement, and to make it easy for the PM to query the EAPI without sourcing, we could specify to have the EAPI definition be the first non-comment line, and to look like:

    inherit eapi 4

Now when the PM is capable of pre-source EAPI detection, it will set EAPI before sourcing, eapi.eclass can see EAPI already being set and not do the 'exit' in global scope.
Or even the PM's inherit-implementation expects to be first called with arguments "eapi 4", and not reading the eapi.eclass at all, so the 'eapi.eclass' does not need to check for anything, just needs to 'exit' when inherited.

After that 'inherit eapi X' line, we can specify EAPI X to do whatever we want. It even does not need to be bash-sourceable.

Yes, it is a compromise, but it looks acceptable to me.
"""

nihilist proposal:

Keep things as they are. Things have worked acceptably well in the past. The need to change things is overstated and assumes that the current state is broken. New EAPIs will be reasonably close to current EAPIs so that compatibility (forward as well as backwards) can be sustained without losing too many potential features. Any disruptive changes can either be kept out of the main tree or be implemented by shifting the main tree to a different location so that old package managers won't see the incompatibilities (elaborated in detail further below).

parsers:

The current practice of putting the eapi definition near the top of the ebuild, combined with the need to state it for all non-EAPI0 ebuilds, suggests that it can be parsed without having to source the ebuild. It enforces some minor limitations, for example EAPI needs to be unique and cannot be overridden by eclasses. These limitations only enforce current behaviour and make QA easier.

Specific suggestion:

"""
The EAPI value shall be the righthand side of the first expression starting with a string matching "^EAPI=".
"""

This definition does not allow multiple redefinitions of the eapi value and ignores comments and malformed lines. It also disallows setting eapi in eclasses or through other indirect methods.

glep55:

See GLEP55. To summarize: The eapi is put into the file name so that the package manager knows the EAPI (and thus how to handle this file format).
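
To make the difference between the parser rule and the filename scheme concrete, here is a minimal sketch of both lookups. It is only an illustration: the function names are made up, the quote and trailing-comment handling is an assumption layered on top of the quoted "^EAPI=" rule, and the fallback to EAPI 0 reflects the remark that only non-EAPI0 ebuilds need to state their EAPI. The ".ebuild-EAPI" suffix is taken from the title of GLEP 55.

```python
import re

# Assumption: the quoted rule is read literally, so the first line that
# starts with "EAPI=" wins. Accepting quoted values and cutting at
# whitespace or '#' are illustrative choices, not part of the rule.
_EAPI_LINE = re.compile(r"""^EAPI=(?:"([^"]*)"|'([^']*)'|([^\s#]*))""")


def eapi_from_content(text):
    """Return the EAPI declared in ebuild text without sourcing it."""
    for line in text.splitlines():
        match = _EAPI_LINE.match(line)
        if match:
            # Exactly one of the three alternatives captured a value
            return next(g for g in match.groups() if g is not None)
    # No declaration found: treat the file as EAPI 0
    return "0"


def eapi_from_filename(name):
    """Return the EAPI from a '.ebuild-EAPI' suffix, or None without one."""
    base, sep, suffix = name.rpartition(".ebuild-")
    if sep and base and suffix:
        return suffix
    return None


if __name__ == "__main__":
    sample = '# Copyright header\nEAPI="2"\ninherit eutils\nEAPI=3\n'
    print(eapi_from_content(sample))
    print(eapi_from_filename("foo-1.0.ebuild-4"))
```

The content lookup needs the file to be opened and read, while the filename lookup works on a directory listing alone; that is the open() versus stat() trade-off in a nutshell.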
While putting the eapi into the filename simplifies eapi discovery, this comes at a high price as there is no reliable way to find and validate all ebuilds. Some people also see it as bad design as it exposes file internals in the filename.

EAPI discovery - Discussion
===========================

nihilist:

+ no compatibility problems, things stay as they are
+ smallest impact
- nothing changes
(+-) prevents some forms of disruptive changes, but do we need them?
- doesn't fix some of the perceived issues

haubi:

+ simple and clean solution
+ easy to extend
(- needs package manager support to expose which eapis are supported)
- format change, meaning of inherit changes

parsers:

+ small impact, only codifies current practice
+ good backwards compatibility
(-) needs some support tools written
(+-) enforces some restrictions on the possible changes in future EAPIs

glep55:

+ allows changing everything (file format, versioning rules)
- allows changing everything (makes QA impossible, allows adding non-ebuild formats, makes version sorting potentially impossible)
- cannot be reversed in the near future: if for any reason we decide eapi-in-filename is a bad idea, we're stuck with it unless we are willing to break backwards compatibility in the same way glep55 tries to avoid
- has not been accepted after over a year of discussion
- exposes extra metadata in the filename; this has been considered to be bad design by many

EAPI discovery - Performance
============================

Mostly irrelevant anyway. On a reasonably fast machine with ~1200 packages, emerge -upNDv world takes:

- with hot cache: ~10 seconds
- with cold cache: ~75 seconds
- without cache: ~15 minutes

Sourcing ebuilds is exquisitely slow. The biggest slowdowns are IO, caused by (i) a lack of metadata cache (which requires sourcing the ebuild) and (ii) an inefficient metadata cache (one file per ebuild is easy to work with, but inherently inefficient).

- nihilist: known "bad" performance. It's slow, but we've come to accept it somehow.
- haubi: slightly better performance on early abort, still needs to partially source the ebuild and start a full bash. The expected speedup is negligible.
- parsers: one open() per file. Only IO-heavy, parsing is cheap.
- glep55: one stat() per file. Saves one open() compared to the parsers, but that open happens later anyway when a valid ebuild is sourced.

performance discussion - caching
--------------------------------

- best case: Full valid cache. Moderately fast. Changing the metadata cache format to improve performance is possible. None of the 4 proposals has an impact on the base performance. GLEP55 has the potential to save a few metadata cache open()s because the eapi can be determined from the filename.
- worst case: No metadata cache. Very slow. GLEP55 saves time on EAPI discovery, but the relative impact of sourcing ebuilds later must be taken into account. Parser is slower than glep55 as it needs to open the file.
- average case: ??? We lack information to decide that.

Metadata extraction: GLEP55 saves time on eapi extraction, but metadata extraction is still slow.

Rough calculation: a disk seek is 10ms. 10 ebuild versions are available.

worst case: only the last ebuild is readable by the package manager

- g55: 10ms readdir to find versions, 10ms seek for ebuild content, 100ms sourcing, x ms eclass seek --> 120ms+
- nihil: 10ms readdir to find versions, 10x10ms seek for ebuild content, no extra seek for sourcing, 10x100ms sourcing --> 1110ms+
- haubi: same as nihil plus one eclass read (eapi eclass) --> 1120ms+
- parser: 10ms readdir, 10x10ms seek for ebuild content, 100ms source for one ebuild, x ms eclasses --> 210ms+

best case: the first ebuild is readable

- g55: readdir, seek, source --> 120ms+
- nihil: readdir, seek, source --> 120ms+
- haubi: readdir, seek, source, one eclass --> 130ms+
- parser: readdir, seek, source --> 120ms+

This ignores visibility checks (package.mask etc.); these should have a constant overhead that does not change the basic figures much.

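
The rough calculation can be reproduced with a few lines of Python. This is only a sketch of the arithmetic given above: the function and constant names are made up, the unknown "x ms" eclass seeks are left out, and the single eapi.eclass read for haubi is assumed to cost one seek.

```python
# Figures from the rough calculation; names are invented for this sketch.
SEEK_MS = 10      # one disk seek (also used as the cost of the readdir)
SOURCE_MS = 100   # sourcing one ebuild
VERSIONS = 10     # ebuild versions available for the package


def discovery_cost_ms(proposal, readable_index):
    """Lower bound in ms until the first readable ebuild has been sourced.

    readable_index is the 0-based position of the first ebuild the package
    manager can handle. The "x ms" eclass seeks are not included, except
    for the one eapi.eclass read of the haubi proposal.
    """
    tried = readable_index + 1  # ebuilds touched up to the readable one
    readdir = SEEK_MS
    if proposal == "g55":
        # EAPI comes from the filename: only the readable ebuild is opened
        return readdir + SEEK_MS + SOURCE_MS
    if proposal == "parser":
        # every candidate is opened and parsed, only one is sourced
        return readdir + tried * SEEK_MS + SOURCE_MS
    if proposal == "nihil":
        # every candidate is opened and sourced
        return readdir + tried * (SEEK_MS + SOURCE_MS)
    if proposal == "haubi":
        # same as nihil plus one read of eapi.eclass
        return discovery_cost_ms("nihil", readable_index) + SEEK_MS
    raise ValueError("unknown proposal: %s" % proposal)


if __name__ == "__main__":
    for name in ("g55", "nihil", "haubi", "parser"):
        worst = discovery_cost_ms(name, VERSIONS - 1)
        best = discovery_cost_ms(name, 0)
        print("%-6s worst %4dms+  best %3dms+" % (name, worst, best))
```

With these inputs the worst-case ratio of nihil to g55 is roughly 9x and of nihil to parser roughly 5x. Changing SEEK_MS, SOURCE_MS or VERSIONS shows how sensitive the comparison is to the assumed figures.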

performance discussion - conclusion
-----------------------------------

Glep55 and parser give a speedup of 5-10x for the worst case. This brings the time for such a scenario down from 15 minutes to 90-180 seconds. In the best case (valid complete cache) there is no performance difference. The average case depends on too many assumptions.

If performance is considered important, the parser proposal would be the preferred non-glep55 solution. The decision then is limited to parser or g55.

Conclusion
==========

There are multiple options for how the ebuild format can evolve in the future. The easiest path would be to collect ideas and not to change anything now (the nihilist approach).

In terms of backwards compatibility and error reporting the haubi method is the most elegant as it can provide clean notices independent of the package manager.

If performance is valued more, the parser approach is an excellent compromise in terms of compatibility and efficiency. It does not change the semantics of inherit while still allowing quite large changes.

GLEP55 might offer the largest flexibility, but that comes at a high price. It has been very controversial, so implementing it also has a high social price. Many points in GLEP55 have not been defined well enough, so it is not a good base to start from.

As it seems to cover the widest range of issues (performance, compatibility, subjective aesthetics, simplicity) we suggest implementing a parser-based approach.

Versioning changes - ideas
==========================

One of the potential issues mentioned in GLEP55 is the option to extend the version syntax. One example for such extensions is GLEP54.

Advantages:

- More expressive syntax for special cases (live ebuilds)
- Adding more syntactic sugar (-rc instead of _rc)

Disadvantages:

- more rules, more complex
- per-eapi versioning can potentially be inconsistent (either force strict subset/superset between EAPIs or risk incomparable versions)

GLEP55 is the only proposal that allows easy per-eapi changes of versioning rules. One downside is that a priori no limitations of eapis exist, so adding random formats that are unreadable by the official package manager(s) is possible and cannot be detected by QA tools. Also the versioning rules per EAPI can be changed to arbitrary things, so keeping the rules consistent still needs a defined process that could be used to change the global rules instead.

All other proposals disallow extending the versioning rules as it would potentially break older package managers that are not aware of that versioning scheme yet.

One way to still change the versioning would be a freeze of the current repository and migrating to a new repository location on each incompatible change. This would allow an upgrade path to the last state before that change (with package managers that know how to handle the new format available) for all users without ever exposing "new-format" ebuilds to older package managers.

Versioning changes should be discussed independently of the EAPI / ebuild sourcing problem complex. See glep54 and lu_zero's proposal.

References
==========

.. [#GLEP55] GLEP 55, Use EAPI-suffixed ebuilds (.ebuild-EAPI)
   (http://glep.gentoo.org/glep-0055.html)

.. [#GLEP54] GLEP 54, scm package version suffix
   (http://glep.gentoo.org/glep-0054.html)

.. [#lu_zero] glep54 counterproposal
   (http://dev.gentoo.org/~lu_zero/glep/liveebuild.rst)

.. [#haubi] glep55 counterproposal
   (http://archives.gentoo.org/gentoo-dev/msg_348f84690f03e597fb14d6602337c45f.xml)