commit: 1b0bc669207e1d14d3754eec79b940c7f494b748 Author: Brian Harring <ferringb <AT> gmail <DOT> com> AuthorDate: Thu Nov 27 12:47:49 2025 +0000 Commit: Brian Harring <ferringb <AT> gmail <DOT> com> CommitDate: Thu Nov 27 19:53:09 2025 +0000 URL: https://gitweb.gentoo.org/proj/pkgcore/pkgcore.git/commit/?id=1b0bc669
wipe historical notes: how we created EAPI, the reasons behind the design, and the things it shaped

The pkgcore ecosystem is now on its ~4th generation of developers and owners. This commit wipes the last real traces of the first gen; however, I will commit into history the importance of their contributions- they were involved in getting this pig off the ground, and giving it wings. If you're reading this, it's due in no small part to their contributions, and I hope that 20y later, if you put their level of effort into the work, someone does similar for you.

It's probably only of historical interest now- I highly doubt anyone reading these files would understand the context, why things were designed this way, why EBD mattered, etc. EBD, for example, much like pkgcheck's pipeline, was criticized as overly complex, overly engineered, "astronautical architectural design", etc. I mention this as the context of what the early devs pushed through. Definitely some of the designs flopped, and the code execution that hasn't been rebuilt into py3k is frankly god awful, since it was written to be compatible with py2k old style, py2k new style, and py3k. Snakeoil got its name for a reason.

EBD was built to separate the ebuild execution state from the package manager so that we could *strictly* run against the versioned ebuild format the ebuild adhered to. The nasty part is the state saving and reloading, a mechanism we built and then retrofitted into portage; this is how we created EAPI, formalized into EAPI0.

There are a lot of other inflection points pkgcore dev work created in gentoo; for example, the last 2 cache formats everyone uses are ours (the git repo cache in particular), versioning mechanisms, etc. What was built was generally donated into portage where possible- at one point, 25% of pkgcore got retrofitted into portage to replace multiple subsystems, for example. That was both helpful in itself and created standardization to move things forward.
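The state saving and reloading mentioned above is the piece worth understanding. A rough illustrative sketch follows — this is *not* pkgcore's actual EBD code, and the phase/variable names (`src_compile`, `MY_STATE`) are invented stand-ins; it only shows the core bash idea of serializing shell state after one phase and restoring it in a later, fresh shell:

```shell
#!/usr/bin/env bash
# Illustrative sketch only: save ebuild-phase shell state, then restore it
# as a later phase would see it. Names here are hypothetical, not EBD's.
state_file=$(mktemp)

# A "phase" runs and mutates shell state.
src_compile() { MY_STATE="configured"; }
src_compile

# Serialize surviving state: variables via declare -p, functions via
# declare -f. A real implementation must filter host-only variables.
declare -p MY_STATE > "$state_file"
declare -f src_compile >> "$state_file"

# Simulate a fresh shell for the next phase: wipe state, then restore it.
unset MY_STATE
unset -f src_compile
. "$state_file"

echo "$MY_STATE"   # prints: configured
rm -f "$state_file"
```

The design point is that the restored environment, not the package manager's own process state, is what the next phase runs against — which is what let execution be pinned strictly to the ebuild's declared format version.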
On to it:

* Jason Stubbs- jstubbs- resolver design theory, leading to the restriction subsystem being built around CNF/DNF.
* Marien Zwart- marienz. He turned my pkgcheck pipeline design into what got people caring about pkgcheck. He did a helluva lot more than that across the entire ecosystem. Folks wouldn't have cared about pkgcheck without his work to get the snowball rolling down the hill.
* Marius Mauch- genone- deeply critical reviewer of pkgcore designs, specifically catching multiple intrinsic aspects of the architectural design that emergently would lead to repeating aspects of *why* I wrote this in the first place.

Their work is still appreciated.

Signed-off-by: Brian Harring <ferringb <AT> gmail.com>

 doc/dev-notes.rst                     |  19 -
 doc/dev-notes/TODO.rst                | 106 -----
 doc/dev-notes/changes.rst             |  21 -
 doc/dev-notes/config.rst              | 389 ------------------
 doc/dev-notes/developing.rst          |  77 ----
 doc/dev-notes/eapi.rst                |  81 ----
 doc/dev-notes/feature-breakdown.rst   |  86 ----
 doc/dev-notes/formats/cpan.rst        |   9 -
 doc/dev-notes/formats/dpkg.rst        |  47 ---
 doc/dev-notes/framework/intro.rst     | 732 ----------------------------------
 doc/dev-notes/fs-ops.rst              |  30 --
 doc/dev-notes/hacking.rst             | 621 ----------------------
 doc/dev-notes/harring-notes.rst       |  32 --
 doc/dev-notes/heapy.rst               | 497 -----------------------
 doc/dev-notes/plugins.rst             | 118 ------
 doc/dev-notes/portage-differences.rst |  88 ----
 doc/dev-notes/tackling-domain.rst     |  45 ---
 doc/dev-notes/tests.rst               |  34 --
 18 files changed, 3032 deletions(-)

diff --git a/doc/dev-notes.rst b/doc/dev-notes.rst
deleted file mode 100644
index 1cd6bc4e..00000000
--- a/doc/dev-notes.rst
+++ /dev/null
@@ -1,19 +0,0 @@
-Developer Notes
-===============
-
-These are the original docs written for pkgcore, detailing some of it's
-architecture, intentions, and reasons behind certain designs.
-
-Currently, the docs aren't accurate- this will be corrected moving forward.
-
-Right now they're primarily useful from a background-info standpoint.
- -Content -------- - -.. toctree:: - :glob: - :maxdepth: 2 - - dev-notes/* - dev-notes/*/* diff --git a/doc/dev-notes/TODO.rst b/doc/dev-notes/TODO.rst deleted file mode 100644 index 43803f07..00000000 --- a/doc/dev-notes/TODO.rst +++ /dev/null @@ -1,106 +0,0 @@ -========== -Rough TODO -========== - -- rip out use.* code from pkgcheck.addons.UseAddon.__init__, and - generalize it into pkgcore.ebuild.repository - -- not hugely important, but... make a cpython version of SlottedDict from - pkgcore.util.obj; 3% reduction for full repo walk, thus not a real huge - concern atm. - -- userpriv for pebuild misbehaves.. - -- http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/491285 - check into, probably better then my crufty itersort; need to see how - well heapqu's nlargest pop behaves (looks funky) - -- look into converting MULTILIB_STRICT* crap over to a trigger - -- install-sources trigger - -- recreate verify-rdepends also - -- observer objects for reporting back events from merging/unmerging - cpython 'tee' is needed, contact harring for details. - basic form of it is in now, but need something more powerful for - parallelization - elog is bound to this also - -- Possibly convert to cpython: - - - flat_hash.database._parse_data - - metadata.database._parse_data - - posixpath (os.path) - -- get the tree clean of direct /var/db/pkg access - -- vdb2 format (ask harring for details). - -- pkgcore.fs.ops.merge_contents; doesn't rewrite the contents set when a file - it's merging is relying on symlinked directories for the full path; eg, - /usr/share/X11/xkb/compiled -> /var/blah, it records the former instead of - recording the true absolute path. 
- -- pmerge mods; [ --skip-set SET ] , [ --skip atom ], use similar restriction - to --replace to prefer vdb for matching atoms - -- refactor pkgcore.ebuild.cpv.ver_cmp usage to avoid full cpv parsing when - _cpv is in use; - 'nuff said, look in pkgcore.ebuild.cpv.cpy_ver_cmp - -- modify repository.prototype.tree.match to take an optional comparison - - reasoning being that if we're just going to do a max, pass in the max so it - has the option of doing the initial sorting without passing through - visibility filters (which will trigger metadata lookups) - -- 'app bundles'. Reliant on serious overhauling of deps to do 'locked deps', - but think of it as rpath based app stacks, a full apache stack compiled to - run from /opt/blah for example. - -- pkgcore.ebuild.gpgtree - - derivative of pkgcore.ebuild.ebuild_repository, this overloads - ebuild_factory and eclass_cache so that gpg checks are done. - This requires some hackery, partially dependent on config.central changes - (see above). Need a way to specify the trust ring to use, 'severity' level - (different class targets works for me). - Anyone who implements this deserves massive cookies. - -- pkgcore.ebuild.gpgprofile: - Same as above. - -- reintroduce locking of certain high level components using read/write; - mainly, use it as a way to block sync'ing a repo that's being used to build, - lock the vdb for updates, etc. 
- -- preserve xattrs when merging files to properly support hardened profiles - -- support standard emerge.log output so tools such as qlop work properly - -- add FEATURES=parallel-fetch support for downloading distfiles in the - background while building pkgs, possibly extend to support parallel downloads - -- apply repo masks to related binpkgs (or handle masks somehow) - -- remove deprecated PROVIDE and old style virtuals handling - -- add argparse support for checking the inputted phase name with pebuild to - make sure it exists, currently nonexistent input cause unhandled exceptions - -- support repos.conf (SYNC is now deprecated) - -- make profile defaults (LDFLAGS) override global settings from - /usr/share/portage/config/make.globals or similar then apply user settings on - top, currently LDFLAGS is explicitly set to an empty string in make.globals - but the profile settings aren't overriding that - -- support /etc/portage/mirrors - -- support ACCEPT_PROPERTIES and /etc/portage/package.properties - -- support ACCEPT_RESTRICT and /etc/portage/package.accept_restrict - -- support pmerge --info (emerge --info workalike), requires support for - info_vars and info_pkgs files from profiles diff --git a/doc/dev-notes/changes.rst b/doc/dev-notes/changes.rst deleted file mode 100644 index ced45c57..00000000 --- a/doc/dev-notes/changes.rst +++ /dev/null @@ -1,21 +0,0 @@ -========= - Changes -========= - -(Note that this is not a complete list) - -* Proper env saving/reloading. The ebuild is sourced once, and run from the env. -* DISTDIR has indirection now. It points at a directory, ie, symlinks. - to the files. The reason for this is to prevent builds from lying about their - sources, leading to less bugs. -* PORTAGE_TMPDIR is no longer in the ebuild env. -* (PORTAGE_|)BUILDDIR is no longer in the ebuild env. -* BUILDPREFIX is no longer in the ebuild env. -* AA is no longer in the ebuild env. -* inherit is an error in phases except for setup, prerm, and postrm. 
- pre/post rm are allowed only in order to deal with broken envs. Running - config with a broken env isn't allowed, because config won't work; - installing with a broken env is not allowed because preinst/postinst - won't be executed. -* binpkg building now gets the unmodified contents- thus when merging a - binpkg, all files are there unmodified. diff --git a/doc/dev-notes/config.rst b/doc/dev-notes/config.rst deleted file mode 100644 index 5d7f2af0..00000000 --- a/doc/dev-notes/config.rst +++ /dev/null @@ -1,389 +0,0 @@ -===================================== - Config use and implementation notes -===================================== - -Using the manager -================= - -Normal use ----------- - -To get at the user's configuration:: - - from pkgcore.config import load_config - config = load_config() - main_repo = config.get_default('repo') - spork_repo = config.repo['spork'] - -Usually this is everything you need to know about the manager. Some -things to be aware of: - -- Some of the managed sources of configuration data may be slow, so - accessing a source is delayed for as long as possible. Some things - require accessing all sources though and should therefore be - avoided. The easiest one to trigger is config.repo.keys() or the - equivalent list(config.sections('repo')). This has to get the - "class" setting for every available config section, which might be - slow. -- For the same reason the manager does not know what type names exist - (there is no hardcoded list of them, so the only way to get that - information would be loading all config sections). This is why you - can get this:: - - >>> load_config().section('repo') # typo, should be "sections" - Traceback (most recent call last): - File "<stdin>", line 1, in ? - TypeError: '_ConfigMapping' object is not callable - - This constructed a dictlike object for accessing all config sections - of the type "section", then tried to call it. 
- -Testcase use ------------- - -For testing of high-level scripts it can be useful to construct a -config manager containing hardcoded values:: - - from pkgcore.config import basics, central - - config = central.ConfigManager([{ - 'repo' = basics.HardCodedConfigSection({'class': my_repo, - 'data': ['1', '2']}), - 'cont' = basics.ConfigSectionFromStringDict({'class': 'pkgcore.my.cont', - 'ref': 'repo'}), - }]) - -What this does should be fairly obvious. Be careful you do not use the -same ConfigSection object in more than one place: caching will not -behave the way you want. See `Adding a config source`_ for details. - -Adding a configurable -===================== - -You often do not really *have* to do anything to make something a -valid "class" value, but it is clearer and it is necessary in certain -cases. - -Adding a class --------------- - -To make a class available, do this:: - - from pkgcore.config import ConfigHint, errors - - class MyRepo: - - pkgcore_config_type = ConfigHint({'cache': 'section_ref'}, - typename='repo') - - def __init__(self, repo): - try: - self.initialize(repo) - except SomeRandomException: - raise errors.InstantiationError('eep!') - -The first ConfigHint arg tells the config system what kind of -arguments you take. Without it it assumes arguments with no default -are strings and guesses for the other args based on the type of the -default value. So if you have no default values or they are just None -you should tell the system about your args. - -The second one tells it you fulfill the repo "protocol", meaning your -instances will show up in load_config().repo. - -ConfigHint takes some more arguments, see the api docs for details. - -Adding a callable ------------------ - -To make a callable available you can do this:: - - from pkgcore.config.hint import configurable - - @configurable({'cache': 'section_ref'}, typename=repo) - def my_repo(repo): - # do stuff - -configurable is just a convenience function that applies a ConfigHint. 
- -Exception handling ------------------- - -If you raise an exception when the config system calls you it will -catch the exception and wrap it in an InstantiationError. This is good -for calling code since catching and printing those provides the user -with a readable description of what happened. It is less good for -developers since the raising of a new exception kills the traceback -printed in debug mode. You will have a traceback that "ends" in the -config code handling instantiation. - -You can improve this by raising an InstantiationError yourself. If you -do this the config system will be able to add the extra information -needed for a user-friendly error message to it without raising a new -exception, meaning debug mode will give a traceback leading right back -to your code raising the InstantiationError. - -Adding a config source -====================== - -Config sources are pretty straightforward: they are mappings from a -section name to a ConfigSection subclass. The only tricky thing is the -combination of section references and caching. The general rule is "do -not expose the same ConfigSection in more than one way". If you do it -will be collapsed and instantiated once for every way it is exposed, -which is usually not what you want. An example:: - - from pkgcore.config import basics - from pkgcore.config.hint import configurable - - def example(): - return object() - - @configurable({'ref': 'section_ref'}) - def nested(ref): - return ref - - multi = basics.HardCodedConfigSection({'class': example}) - - myconf = { - 'multi': multi, - 'bad': basics.HardCodedConfigSection({'class': nested, 'ref': multi}) - 'good': basics.ConfigSectionFromStringDict({'class': 'nested', - 'ref': 'multi'}) - -If you feed this to the ConfigManager and instantiate everything -"multi" and "good" will be identical but "bad" will be a different -object. For an explanation of why this happens see the implementation -notes in the next section. 
- -You trigger a similar problem if you create a custom ConfigSection -subclass that bypasses central's collapse_named_section for named -section refs. If you somehow get at the referenced ConfigSection and -hand it to collapse_section you will most likely circumvent caching. -Only use collapse_section for unnamed sections. - -ConfigManager tries not to extract more things from this mapping than -it has to. Specifically, it will not call __getitem__ before it needs -to instantiate the section or needs to know its type. However it -*will* iterate over the keys (section names) immediately to find -autoloads. If this is a problem (getting those names is slow) then -make sure the manager knows your config is "remote". - -Implementation notes -==================== - -This code has evolved quite a bit over time. The current code/design -tries among other things to: - -- Allow sections to contain both named and nameless/inline references - to other sections. -- Allow serialization of the loaded config. -- Not do unnecessary work (if possibly not recollapse configs, - definitely not trigger unnecessary imports, access configs - unnecessarily, reinstantiate configs) -- Provide both end-user error messages that are complete enough to - track down a problem in a complex nested config and tracebacks that - reach back to actual buggy code for developers. - -Overview from load_config() to instantiated repo ------------------------------------------------- - -When you call load_config() it looks up what config files are available -(/etc/pkgcore/pkgcore.conf, ~/.config/pkgcore/pkgcore.conf, -/etc/portage/make.conf) and loads them. This produces a dict mapping section -names to ConfigSection instances. For the ini-format pkgcore.conf files this is -straightforward, for make.conf this is a lot of work done in -pkgcore.ebuild.portage_conf. I'm not going to describe that module here, read -the source for details. 
- -The ConfigSections have a pretty straightforward api: they work like -dicts but get passed a string describing what "type" the value should -be and a central.ConfigManager instance for reasons described later. -Passing in this "type" string when getting the value is necessary -because the way things like lists of strings are stored depends on the -format of the configuration file but the parser does not have enough -information to know it should parse as a list instead of a string. For -example, an ini-format pkgcore.conf could contain:: - - [my-overlay-cache] - class=pkgcore.cache.flat_hash.database - auxdbkeys=DEPEND RDEPEND - -We want to turn that auxdbkeys value into a list of strings in the ini -file parser code instead of in the central.ConfigManager or even -higher up because more exotic config sections may want to store this -in a different way (perhaps as a comma-separated list, or even as -"<el>DEPEND</el><el>RDEPEND</el>". But there is obviously not enough -information in the ini file for the parser to know this is meant as a -list instead of a string with a space in it. - -central.ConfigManager gets instantiated with one or more of those -dicts mapping section names to ConfigSections. They're split up into -normal and "remote" configs which I'll describe later, let's assume -they're all "remote" for now. In that case no work is done when the -ConfigManager is instantiated. - -Getting an actual configured object out of the ConfigManager is split -in two phases. First the involved config sections are "collapsed": -inherits are processed, values are converted to the right type, -presence of required arguments is checked, etc. Everything up to -actually instantiating the target class and actually instantiating any -section references it needs. The result of this work is bundled in a -CollapsedConfig instance. Actual instantiation is handled by the -CollapsedConfig instance. - -The ConfigManager manages CollapsedConfig instances. 
It creates new -ones if required and makes sure that if a cached instance is available -it is used. - -For the remainder of the example let's assume our config looks like -this:: - - [spork] - inherit=cache - auxdbkeys=DEPEND RDEPEND - - [cache] - class=pkgcore.cache.flat_hash.database - -Running config.repo['spork'] runs -config.collapse_named_section('spork'). This first checks if this -section was already collapsed and returns the CollapsedConfig if it is -available. If it is not in the cache it looks up the ConfigSection -with that name in the dicts handed to the ConfigManager on -instantiation and calls collapse_section on it. - -collapse_section first recursively finds any inherited sections (just -the "cache" section in this case). It then grabs the 'class' setting -(which is always of type 'callable'). In this case that's -"pkgcore.cache.flat_hash.database", which the ConfigSection imports -and returns. This is then wrapped in a config.basics.ConfigType. A -ConfigType contains the information necessary to validate arguments -passed to the callable. It uses the magic pkgcore_config_type -attribute if the callable has it and introspection for everything -else. In this case -pkgcore.cache.flat_hash.database.pkgcore_config_type is a ConfigHint -stating the "auxdbkeys" argument is of type "list". - -Now that collapse_section has a ConfigType it uses it to retrieve the -arguments from the ConfigSections and passes the ConfigType and -arguments to CollapsedConfig's __init__. Then it returns the -CollapsedConfig instance to collapse_named_section. -collapse_named_section caches it and returns it. - -Now we're back in the __getattr__ triggered by config.repo['spork']. -This checks if the ConfigType on the CollapsedConfig is actually -'repo', and returns collapsedConfig.instantiate() if this matches. - -Lazy section references ------------------------ - -The main reason the above is so complicated is to support various -kinds of references to other sections. 
Example:: - - [spork] - class=pkgcore.Spork - ref=foon - - [foon] - class=pkgcore.Foon - -Let's say pkgcore.Spork has a ConfigHint stating the type of the "ref" -argument is "lazy_ref:foon" (lazy reference to a foon) and its typename is -"repo", and pkgcore.Foon has a ConfigHint stating its typename is -"foon". a "lazy reference" is an instance of basics.LazySectionRef, -which is an object containing just enough information to produce a -CollapsedConfig instance. This is not the most common kind of -reference, but it is simpler from the config point of view so I'm -describing this one first. - -When collapse_section runs on the "spork" section it calls -section.get_value(self, 'ref:repo', 'section_ref'). "lazy_ref" in the -type hint is converted to just "ref" here because the ConfigSections -do not have to distinguish between lazy and "normal" references. -Because this particular ConfigSection only supports named -references it returns a LazyNamedSectionRef(central, 'ref:repo', -'foon'). This just gets handed to Spork's __init__. If the Spork -decides to call instantiate() on the LazyNamedSectionRef it calls -central.collapse_named_section('foon'), checks if the result is of -type foon, instantiates it and returns it. - -The same thing using a dhcp-style config:: - - spork { - class pkgcore.Spork; - ref { - class pkgcore.Foon; - }; - } - -In this format the reference is an inline unnamed section. When -get_value(central, 'ref:repo', 'foon') is called it returns a -LazyUnnamedSectionRef(central, 'ref:repo', section) where section is a -ConfigSection instance for the nested section (knowing just that -"class" is "pkgcore.Foon" in this case). This is handed to -Spork.__init__. If Spork calls instantiate() on it it calls -central.collapse_section(self.section) and does the same type checking -and instantiating LazyNamedSectionRef did. - -Notice neither Spork nor ConfigManager care if the reference is inline -or named. 
get_value just has to return a LazySectionRef instance -(LazyUnnamedSectionRef and LazyNamedSectionRef are subclasses of -this). How this actually gets a referenced config section is up to the -ConfigSection whose get_value gets called. - -Normal section references -------------------------- - -If Spork's ConfigHint defines the type of its "ref" argument as -"ref:foon" instead of "lazy_ref:foon" it gets handed an actual Foon -instance instead of a LazySectionRef to one. This is built on top of -the lazy reference code. For the ConfigSections nothing changes (the -same get_value call is made). But the ConfigManager now immediately -calls collapse() on the LazySectionRef, retrieving a CollapsedConfig -instance (for the "foon"). This is handed to the CollapsedConfig for -"spork", and when this one is instantiated the referenced -CollapsedConfig is also instantiated. - -Miscellaneous details ---------------------- - -The support for nameless sections means neither ConfigSection nor -CollapsedConfig have a name attribute. This makes the error handling -code a bit tricky as it has to tag in the name at various points, but -this works better than enforcing names where it does not make sense -(means lots of unnecessary duplication of names when dealing with -dicts of HardCoded/StringBasedConfigSections). - -The suppport for serialization of the loaded config means section_refs -cannot be instantiated straight away. The object used for -serialization is the CollapsedConfig which gives you both the actual -values and the type they have. If the CollapsedConfig contained -arbitrary instantiated objects serializing them would be impossible. -So it contains nested CollapsedConfigs instead. - -Not doing unnecessary work is done by caching in two places. The -simple one is CollapsedConfig caching its instantiated value. This is -pretty straightforward. The more subtle one is ConfigManager caching -CollapsedConfigs by name. 
It is obviously a good idea to cache these -(if we didn't we would have to cache the instantiated value in the -ConfigManager). An alternative would be caching them by their -ConfigSection. This has the minor disadvantage of keeping the -ConfigSection in memory, and the larger one that it may break caching -for weird config sources that generate ConfigSections on demand. The -downside of caching by name is we have to make sure nothing generates -a CollapsedConfig for a named section in a way other than -collapse_named_section (handing the ConfigSection to collapse_section -bypasses caching). - -This means a ConfigSection cannot return a raw ConfigSection from a -section_ref get_value call. If it was a ConfigSection that central -then collapsed and the reference was actually to a named section -caching is bypassed. - -The need for a section name starting with "autoload" is also there to -avoid unnecessary work. Without this we would have to figure out the -typename of every section. While we can do that without entirely -collapsing the config we cannot avoid importing the "class", which -means load_config() would import most of pkgcore. That should -definitely be avoided. diff --git a/doc/dev-notes/developing.rst b/doc/dev-notes/developing.rst deleted file mode 100644 index 370b9f09..00000000 --- a/doc/dev-notes/developing.rst +++ /dev/null @@ -1,77 +0,0 @@ -========================= - Checking the source out -========================= - -If you're just installing pkgcore from a released tarball, skip this section. - -To get the current (development) code with history, install git -(``emerge git`` on gentoo) and run:: - - git clone git://github.com/pkgcore/pkgcore.git - -==================== - Installing pkgcore -==================== - -Set PYTHONPATH -============== - -If you only want to run scripts from pkgcore itself (the ones in its -"bin" directory) you do not have to do anything with PYTHONPATH. 
If -you want to use pkgcore from an interactive python interpreter session -you do not have to do anything if you start the interpreter from the -"root" of the pkgcore source tree. For other uses you probably want to -set PYTHONPATH to include your pkgcore directory, so that python can -find the pkgcore code. For example:: - - $ export PYTHONPATH="${PYTHONPATH}:/home/user/pkgcore/" - -Now test to see if it works:: - - $ python -c 'import pkgcore' - -Python will scan pkgcore, see the pkgcore directory in it (and that it has -__init__.py), and use that. - - -Registering plugins -=================== - -Pkgcore uses plugins for some basic functionality. You do not really -have to do anything to get this working, but things are a bit faster -if the plugin cache is up to date. This happens automatically if the -cache is stale and the user running pkgcore may write there, but if -pkgcore is installed somewhere system-wide and you only run it as user -you can force a regeneration with:: - - # pplugincache - -If you want to update plugin caches for something other than pkgcore's -core plugin registry, pass the package name as an argument. - -Test pkgcore -============ - -Drop back to normal user, and try:: - - $ python - >>> import pkgcore.config - >>> from pkgcore.ebuild.atom import atom - >>> conf=pkgcore.config.load_config() - >>> tree=conf.get_default('domain').repos[1] - >>> pkg=max(tree.itermatch(atom("dev-util/diffball"))) - >>> print pkg - >>> print pkg.depends - >=dev-libs/openssl-0.9.6j >=sys-libs/zlib-1.1.4 >=app-arch/bzip2-1.0.2 - - -At the time of writing the domain interface is in flux, so this example might -fail for you. 
- -Build extensions -================ - -If you want to run pkgcore from its source directory but also want the -extra speed from the compiled extension modules, compile them in place:: - - $ python setup.py build_ext -i diff --git a/doc/dev-notes/eapi.rst b/doc/dev-notes/eapi.rst deleted file mode 100644 index c6024e06..00000000 --- a/doc/dev-notes/eapi.rst +++ /dev/null @@ -1,81 +0,0 @@ -=========== -Ebuild EAPI -=========== - - -This should hold the proposed (with a chance of making it in), accepted, and -implemented changes for ebuild format version 1. A version 0 doc would also -be a good idea ( no one has volunteered thus far ). - -Version 0 (or undefined eapi, <=portage-2.0.52*)] -************************************************* - -Version 1 -********* - -This should be fairly easy stuff to implement for the package manager, -so this can actually happen in a fairly short timeframe. - -- EAPI = 1 required -- src_configure phase is run before src_compile. If the ebuild or - eclass does not override there is a default that does nothing. - Things like econf should be run in this phase, allowing rerunning - the build phase without rerunning configure during development. -- Make the default implementation of phases/functions available under - a second name (possibly using EXPORT_FUNCTIONS) so you can call - base_src_compile from your src_compile. -- default src_install. Exactly what goes in needs to be figured out, - see bug 33544. -- RDEPEND="${RDEPEND-${DEPEND}}" is no longer set by portage, same for eclass. -- (proposed) BDEPEND metadata addition, maybe. These are the - dependencies that are run on the build system (toolchain, autotools - etc). Useful for ROOT != "/". Probably hard to get right for ebuild - devs who always have ROOT="/". -- default IUSE support, IUSE="+gcj" == USE="gcj" unless the user disables it. -- GLEP 37 ("Virtuals Deprecation"), maybe. The glep is "deferred". How - much of this actually needs to be done? package.preferred? 
-- test depend, test src_uri (or represent test in the use namespace - somehow). Possibilities: TEST_{SRC_URI,{B,R,}DEPEND}, test "USE" - flag getting set by FEATURES=test. -- drop AA (unused). -- represent in metadata if the pkg needs pkg_preinst to have access to - ${D} or not. If this is not required a binpkg can be unpacked - straight to root after pkg_preinst. If pkg_preinst needs access to - ${D} the binpkg is unpacked there as usual. -- use groups in some form (kill use_expand off). -- ebuilds can no longer use PORTDIR and ECLASSDIR(s); they break any - potential remote, and are dodgey as all hell for multiple repos - combined together. -- disallow direct access to /var/db/pkg -- deprecate ebuild access/awareness of PORTAGE_* vars; perl ebuilds - security fix for PORTAGE_TMPDIR (rpath stripping in a way) might - make this harder. -- use/slot deps, optionally repository deps. -- hard one to slide in, but change versioning rules; no longer allow - 1.006, require it to be 1.6 -- pkg_setup must be sandboxable. -- allowed USE conditional configurations; new metadata key, extend - depset syntax to include xor, represent allowed configurations. -- true incremental stacking support for metadata keys between - eclasses/ebuilds; RESTRICT=-strip for example in the ebuild. -- drop -* from keywords; it's package.masking, use that instead (-arch - is acceptable although daft) -- blockers aren't allowed in PDEPEND (the result of that is serious - insanity for resolving) - -Version 1+ -********** - -Not sure about these. Maybe some can go into version 1, maybe they -will happen later. - -- Elibs -- some way to 'bind' a rdep/pdep so that it's explicit "I'm locked - against the version I was compiled against" -- some form of optional metadata specifying that a binpkg works on - multiple arches, iow it doesn't rely on compiled components. -- A way to move svn/cvs/etc source fetching over to the package - manager. 
The current way of doing this through an eclass is a bit - ugly since it requires write access to the distdir. Moving it to the - package manager fixes that and allows integrating it with things - like parallel fetch. This needs to be fleshed out. diff --git a/doc/dev-notes/feature-breakdown.rst b/doc/dev-notes/feature-breakdown.rst deleted file mode 100644 index 080a0fc5..00000000 --- a/doc/dev-notes/feature-breakdown.rst +++ /dev/null @@ -1,86 +0,0 @@ -=============================== - Feature (FEATURES) categories -=============================== - -relevant list of features -========================= - -* autoaddcvs -* buildpkg -* ccache -* collision-protect -* confcache -* cvs -* digest -* distcc -* distlocks -* fixpackages -* getbinpkg -* gpg -* keeptemp -* keepwork -* mirror -* noclean (keeptemp, keepwork) -* nodoc -* noinfo -* noman -* nostrip -* notitles -* sandbox -* severe -* severer (dumb spanky) -* sfperms -* sign -* strict -* suidctl -* test -* userpriv -* usersandbox - -Undefined ---------- - -fixpackages - -Dead ----- - -* usersandbox -* noclean -* getbinpkg (it's a repo type, not a global feature) -* buildpkg (again, repo thing. moreso ui/buildplan execution) - -Build ------ - -* keeptemp, keepwork, noclean, ccache, distcc -* sandbox, userpriv -* confcache -* noauto (fun one) -* test - -repos or wrappers ------------------ - -Mutables -~~~~~~~~ - -* autoaddcvs -* cvs -* digest -* gpg -* no{doc,info,man,strip} -* sign -* sfperms -* collision-protect (vdb only) - -Immutables -~~~~~~~~~~ - -* strict -* severe ; these two are repository opts on gpg repo class - -Fetchers -~~~~~~~~ - -* distlocks, sort of. 
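The FEATURES categorization above is pure prose; as an illustrative sketch (the groupings mirror the lists above, but none of these names or structures exist in pkgcore itself), the same breakdown can be expressed as data so a config layer could route each FEATURES flag to the subsystem that owns it:

```python
# Illustrative only: the scope groupings below mirror the categorized
# lists in feature-breakdown.rst; nothing here is actual pkgcore API.
FEATURE_SCOPES = {
    "build": frozenset({"keeptemp", "keepwork", "noclean", "ccache",
                        "distcc", "sandbox", "userpriv", "confcache",
                        "noauto", "test"}),
    "repo-mutable": frozenset({"autoaddcvs", "cvs", "digest", "gpg",
                               "nodoc", "noinfo", "noman", "nostrip",
                               "sign", "sfperms", "collision-protect"}),
    "repo-immutable": frozenset({"strict", "severe"}),
    "fetcher": frozenset({"distlocks"}),
    "dead": frozenset({"usersandbox", "noclean", "getbinpkg", "buildpkg"}),
}

def scopes_for(flag):
    """Return the scope names a FEATURES flag belongs to; a flag such
    as 'noclean' can legitimately appear in more than one grouping."""
    return sorted(scope for scope, flags in FEATURE_SCOPES.items()
                  if flag in flags)
```

A flag that appears in several groupings (e.g. ``noclean``, both a build knob and declared dead above) simply reports all of them, which is the point: the categorization is a routing table, not a partition.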
diff --git a/doc/dev-notes/formats/cpan.rst b/doc/dev-notes/formats/cpan.rst deleted file mode 100644 index fdf73505..00000000 --- a/doc/dev-notes/formats/cpan.rst +++ /dev/null @@ -1,9 +0,0 @@ -=========== - perl CPAN -=========== - -* makeCPANstub in Gentoo/CPAN.pm , dumps cpan config -* screen scraping to get deps, example page http://kobesearch.cpan.org/, - use getCPANInfo from CPAN -* use FindDeps for this -* use unmemoize(func) to back out the memoizing of a func; do this on FindDeps diff --git a/doc/dev-notes/formats/dpkg.rst b/doc/dev-notes/formats/dpkg.rst deleted file mode 100644 index 6f36fe99..00000000 --- a/doc/dev-notes/formats/dpkg.rst +++ /dev/null @@ -1,47 +0,0 @@ -====== - dpkg -====== - -this is just *basic* notes, nothing more. If you know details, fill in -the gaps kindly - -repos are combined. - -Sources.gz - (list of source based deb's) holds name, version, and build deps. - -Packages.gz - (binary debs, dpkgs) - name, version, size, short and long description, bindeps. - -repository layout:: - - dists - stable - main - arch #binary-arm fex - source #? - contrib #? - arch # binary-arm fex - source - non-free # guess. - arch - source - testing... - unstable... - -arch/binary-* dirs hold Packages.gz, and Release (potentially) -source dirs hold Sources.gz and Release (optionally) - -has preinst, postinst, prerm, postrm -Same semantics as ebuilds in terms of when to run (coincidence? 
:) - -============== ========================================== -in dpkg in ebuild -============== ========================================== -Build-Depends our DEPEND -Depends our RDEPEND -Pre-Depends configure time DEPEND -Conflicts blockers, affected by Essential - (read up on this in debian policy guide) -============== ========================================== \ No newline at end of file diff --git a/doc/dev-notes/framework/intro.rst b/doc/dev-notes/framework/intro.rst deleted file mode 100644 index d829181b..00000000 --- a/doc/dev-notes/framework/intro.rst +++ /dev/null @@ -1,732 +0,0 @@ -======= -WARNING -======= - -This is the original brain dump from harring; it *is* not guaranteed to -be accurate to the current design, it's kept around to give an idea -of where things came from to contrast to what is in place now. - - -============== - Introduction -============== - -e'yo. General description of layout/goals/info/etc, and semi sortta api. - -That and aggregator of random ass crazy quotes should people get bored. - -**DISCLAIMER** - -This ain't the code. - -In other words, the actual design/code may be radically different, and -this document probably will trail any major overhauls of the -design/code (speaking from past experience). - -Updates welcome, as are suggestions and questions- please dig through -all documentations in the dir this doc is in however, since there is a -lot of info (both current and historical) related to it. Collapsing -info into this doc is attempted, but explanation of the full -restriction protocol (fex) is a *lot* of info, and original idea is -from previous redesign err... designs. Short version, historical, but -still relevant info for restriction is in layout.txt. 
Other -subsystems/design choices have their basis quite likely from other -docs in this directory, so do your homework please :) - -Terminology -=========== - -cp - category/package - -cpv - category/package-version - -ROOT - livefs merge point, fex /home/bharring/embedded/arm-target or - more commonly, root=/ - -vdb - /var/db/pkg, installed packages database. - -domain - combination of repositories, root, and build information (use - flags, cflags, etc). config data + repositories effectively. - -repository - trees. ebuild tree, binpkg tree, vdb tree, etc. - -protocol - python name for design/api. iter() fex, is a protocol; for iter(o) - it does i=o.__iter__(); the returned object is expected to yield an - element when i.next() is called, till it runs out of elements (then - throwing a StopIteration). - hesitate to call it defined hook on a class/instance, but this - (crappy) description should suffice. - -seq - sequence, lists/tuples - -set - list without order (think dict.keys()) - -General design/idea/approach/requirements -========================================= - -All pythonic components installed by pkgcore *must* be within -pkgcore.* namespace. No more polluting python's namespace, plain and -simple. Third party plugins to pkgcore aren't bound by this however -(their mess, not ours). - -API flows from the config definitions, *everything* internal is -effectively the same. Basically, config data gives you your starter -objects which from there, you dig deeper into the innards as needed -action wise. - -The general design is intended to heavily abuse OOP. -Further, delegation of actions down to components *must* be abided by, -example being repo + cache interaction. repo does what it can, but for -searching the cache, let the cache do it. Assume what you're -delegating to knows the best way to handle the request, and probably -can do its job better then some external caller (essentially). - -Actual configuration is pretty heavily redesigned. 
Classes and -functions that should be constructed based on data from the user's -configuration have a "hint" describing their arguments. The global -config class uses these hints to convert and typecheck the values in -the user's configuration. Actual configuration file reading and type -conversion is done by a separate class, meaning the global manager is -not tied to a single format, or even to configuration read from a file -on disk. - -Encapsulation, extensibility/modularity, delegation, and allowing -parallelizing of development should be key focuses in -implementing/refining this high level design doc. Realize -parallelizing is a funky statement, but it's apt; work on the repo -implementations can proceed without being held up by cache work, and -vice versa. - -Final comment re: design goals, defining chunks of callable code and -plugging it into the framework is another bit of a goal. Think -twisted, just not quite as prevalent (their needs/focus is much -different from ours, twisted is the app, your code is the lib, vice -versa for pkgcore). - -Back to config. Here's the general notion of config 'chunks' of the -subsystem (these map out to run time objects unless otherwise stated):: - - domain - +-- profile (optional) - +-- fetcher (default) - +-- repositories - +-- resolver (default) - +-- build env data? - | (never actually instantiated, no object) - \-- livefs_repo (merge target, non optional) - - repository - +-- cache (optional) - +-- fetcher (optional) - +-- sync (optional, may change) - \-- sync cache (optional, may change) - - profile - +-- build env? - +-- sets (system mainly). - \-- visibility wrappers - -domain is configuration data, accept_(license|keywords), use, cflags, -chost, features, etc. profile, dependent on the profile class you -choose, is either bound to a repository, or to a user defined location on -disk (/etc/portage/profile fex). 
Domain knows to do incremental crap -upon profile settings, lifting package.* crap for visibility wrappers -for repositories also. - -repositories is pretty straightforward. portdir, binpkg, vdb, etc. - -Back to domain. Domains are your definition of pretty much what can -be done. Can't do jack without a domain, period. Can have multiple -domains also, and domains do *not* have to be local (remote domains -being a different class type). Clarifying, think of 500 desktop boxes, -and a master box that's responsible for managing them. Define an -appropriate domain class, and appropriate repository classes, and have -a config that holds the 500 domains (representing each box), and you -can push updates out via standard api trickery. In other words, the -magic is hidden away, just define remote classes that match defined -class rules (preferably inheriting from the base class, since -isinstance sanity checks will become the norm), and you could do -emerge --domain some-remote-domain -u glsa on the master box. Emerge -won't know it's doing remote crap. Pkgcore won't either. It'll just load -what you define in the config. - -Ambitious? Yeah, a bit. Thing to note, the remote class additions will -exist outside of pkgcore proper most likely. Develop the code needed -in parallel to fleshing pkgcore proper out. - -Meanwhile, the remote bit + multiple domains + class overrides in -config definition is _explicitly_ for the reasons above. That and -x-compile/embedded target building, which is a bit funkier. - -Currently, portage has DEPEND and RDEPEND. How do you know what needs -to be native to that box to build the package, what must be chost atoms? -Literally, how do you know which atoms, say the toolchain, must be -native vs what package's headers/libs must exist to build it? We need -an additional metadata key, BDEPEND (build depends). - -If you have BDEPEND, you know what is actually run locally in building -a package, vs what headers/libs are required. 
Subtle difference, but -BDEPEND would allow (with a sophisticated depresolver) toolchain to be -represented in deps, rather than the current unstated dep approach -profiles allow. - -Aside from that, BDEPEND could be used for x-compile via inter-domain -deps; a ppc target domain on an x86 box would require BDEPEND from the -default domain (x86). So... that's useful. - -So far, no one has shot this down, or moreso come up with reasons as to -why it wouldn't work; the consensus thus far is mainly "err, don't -want to add the deps, too much work". Regarding work, use indirection. - -virtual/toolchain-c - metapkg (glep37) that expands out (dependent on arch) into whatever - is required to do building of c sources -virtual/toolchain-c++ - same thing, just c++ -virtual/autotools - take a guess. -virtual/libc - this should be tagged into rdepends where applicable, packages that - directly require it (compiled crap mainly) - -Yes it's extra work, but the metapkgs above should cover a large chunk -of the tree, say >90%. - -Config design -============= - -Portage thus far (<=2.0.51*) has had a variable ROOT (livefs merge -point), but no way to vary configuration data aside from via a -buttload of env vars. Further, there has been only one repository -allowed (overlays are just that, extensions of the 'master' -repository). Addition of support of any new format is mildly insane -due to hardcoding up the wing wang in the code, and -extension/modification of existing formats (ebuild) has some issues -(namely the doebuild block of code). - -Goal is to address all of this crap. Format agnosticism at the -repository level is via an abstracted repository design that should -supply generic inspection attributes to match other formats. -Specialized searching is possible via match, thus extending the -extensibility of the prototype repository design. 
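The match-based specialized searching described above (and the repo.match → cache.match delegation argued for later in this doc) can be sketched minimally. These classes are hypothetical stand-ins for the design, not actual pkgcore code: the repository hands a metadata-only restriction straight to its cache instead of pulling every package itself:

```python
class DescriptionRestriction:
    """Hypothetical metadata-only restriction: substring match on DESCRIPTION."""
    def __init__(self, text):
        self.text = text.lower()

    def match(self, metadata):
        return self.text in metadata.get("DESCRIPTION", "").lower()


class FlatCache:
    """Stand-in cache: it knows its own storage, so it owns the search."""
    def __init__(self, entries):
        self._entries = entries  # cpv -> metadata dict

    def match(self, restriction):
        return [cpv for cpv, md in self._entries.items()
                if restriction.match(md)]


class Repository:
    """Format-agnostic repo sketch: delegates metadata searches downward."""
    def __init__(self, cache):
        self.cache = cache

    def match(self, restriction):
        # Delegation: the cache is the authority on searching its own data;
        # a sql-backed cache could translate this into a SELECT instead.
        return sorted(self.cache.match(restriction))


repo = Repository(FlatCache({
    "app-editors/vim-7.0": {"DESCRIPTION": "Vim, an improved vi-style editor"},
    "app-editors/nano-1.3": {"DESCRIPTION": "GNU GPL'd Pico clone"},
}))
hits = repo.match(DescriptionRestriction("editor"))
```

The point of the indirection is that swapping ``FlatCache`` for an rdbms-backed class changes how the search runs without the repository, or callers of ``repo.match``, noticing.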
- -Format agnosticism for building/merging is somewhat reliant on the -repo, namely package abstraction, and abstraction of building/merging -operations. - -On disk configurations for alternatives formats is extensible via -changing section types, and plugging them into the domain definition. - -Note alt. formats quite likely will never be implemented in pkgcore -proper, that's kind of the domain of pkgcore addons. In other words, -dpkg/rpm/whatever quite likely won't be worked on by pkgcore -developers, at least not in the near future (too many other things to -do). - -The intention is to generalize the framework so it's possible for -others to do so if they choose however. - -Why is this good? Ebuild format has issues, as does our profile -implementation. At some point, alternative formats/non-backwards -compatible tweaks to the formats (ebuild or profile) will occur, and -then people will be quite happy that the framework is generalized -(seriously, nothing is lost from a proper abstracted design, and -flexibility/power is gained). - - -config's actions/operation -========================== - -pkgcore.config.load_config() is the entrance point, returns to you a -config object (pkgcore.config.central). This object gives you access -to the user defined configs, although only interest/poking at it -should be to get a domain object from it. - -domain object is instantiated by config object via user defined -configuration. domains hold instantiated repositories, bind profile + -user prefs (use/accept_keywords) together, and _should_ simplify this -data into somewhat user friendly methods. (define this better). - -Normal/default domain doesn't know about other domains, nor give a -damn. Embedded targets are domains, and _will_ need to know about the -livefs domain (root=/), so buildplan creation/handling may need to be -bound into domains. - - -Objects/subsystems/stuff -======================== - -So... 
this is general naming of a pretty much top level view of things, -stuff emerge would be interested in (and would fool with). hesitate to -call it a general api, but it probably will be as such, exempting any -abstraction layer/api over all of this (good luck on that one }:] ). - - -IndexableSequence ------------------ - -functions as a set and dict, with caching and on the fly querying of -info. mentioned due to use in repository and other places... (it's a -useful lil sucker) - -This actually is misnamed. the order of iteration isn't necessarily -reproducible, although it's usually constant. IOW, it's normally a -sequence, but the class doesn't implicitly force it - - -LazyValDict ------------ - -similar to ixseq, late loading of keys, on fly pulling of values as -requested. - -global config object (from pkgcore.config.load_config()) --------------------------------------------------------- - -see config.rst. - -domain object -------------- - -bit of debate on this one I expect. any package.{mask,unmask,keywords} -mangling is instantiated as a wrapper around repository instances upon -domain instantiation. code *should* be smart and lift any -package.{mask,unmask,keywords} wrappers from repository instances and -collapse them, pointing at the raw repo (basically don't have N -wrappers, collapse them into a single wrapper). Not worth implementing -until the wrapper is a faster implementation than the current -pkgcore.repository.visibility hack though (currently O(N) for each pkg -instance, N being visibility restrictions/atoms). Once it's O(1), -collapsing makes a bit more sense (can be done in parallel however). - -a word on inter repository dependencies... simply put, if the -repository only allows satisfying deps from the same repository, the -package instance's \*DEPEND atom conversions should include that -restriction. Same trickery for keeping ebuilds from depping on -rpm/dpkg (and vice versa). - -.repositories - in the air somewhat on this one. 
either indexablesequence, or a - repositorySet. Nice aspect of the latter is you can just use .match - with appropriate restrictions. very simple interface imo, although - should provide a way to pull individual repositories/labels of said - repos from the set though. basically, mangle a .raw_repo - indexablesequence type trick (hackish, but nail it down when we reach - that bridge) - - -build plan creation ------------------- - -<TODO insert details as they're fleshed out> - -sets ---- - -TODO chuck in some details here. probably defined via user config -and/or profile, although what's it define? atoms/restrictions? -itermatch might be useful for a true set. - - -build/setup operation --------------------- - -(need a good name for this; dpkg/rpm/binpkg/ebuild's 'prepping' for -livefs merge should all fall under this, with varying use of the -hooks) - -.build() - do everything, calling all steps as needed -.setup() - whatever tmp dirs required, create 'em. -.req_files() - (fetchables, although not necessarily with url (restrict="fetch"...)) -.unpack() - guess. -.configure() - unused till ebuild format version two (ya know, that overhaul we've - been kicking around? :) -.compile() - guess. -.test() - guess. -.install() - install to tmp location. may not be used dependent on the format. -.finalize() - good to go. generate (jit?) contents/metadata attributes, or - return a finalized instance; should generate an immutable package instance. - -repo change operation --------------------- - -base class. - -.package - package instance of what the action is centering around. -.start() - notify repo we're starting (locking mainly, although prerm/preinst - hook also) -.finish() - notify repo we're done. -.run() - high level, calls whatever funcs needed. individual methods are - mainly for ui, this is if you don't display "doing install now... - done... doing remove now... done" stuff. - - -remove operation ----------------- - -derivative of repo change operation. 
- -.remove() - guess. -.package - package instance of what's being yanked. - -install operation ------------------ - -derivative of repo change operation - -.package - what's being installed. -.install() - install it baby. - -merge operation ---------------- - -derivative of repo remove and install (so it has .remove and .install, -which must be called in .install and .remove order) - -.replacing - package instance of what's being replaced. -.package - what's being installed - -fetchables ----------- - -basically a dict of stuff jammed together, just via attribute access -(think c struct equiv) - -.filename - .. -.url - tuple/list of url's. -.chksums - dict of chksum:val - - -fetcher -------- - -hey hey. take a guess. - -worth noting, if fetchable lacks ``.chksums["size"]``, it'll wipe any -existing file. if size exists, and existing file is bigger, wipe file, -and start anew, otherwise resume. mirror expansion occurs here, also. - -.fetch(fetchable, verifier=None) # if verifier handed in, does -verification. - -verifier --------- - -note this is basically lifted conceptually from mirror_dist. if -wondering about the need/use of it, look at that source. - -verify() - handed a fetchable, either False or True - - -repository ----------- - -this should be format agnostic, and hide any remote bits of it. this -is general info for using it, not designing a repository class - -.mergable() - true/false. pass a pkg to it, and it reports whether it can merge - that or not. -.livefs - boolean, indicative of whether or not it's a livefs target- this is - useful for resolver, shop it to other repos, binpkg fex prior to - shopping it to the vdb for merging to the fs. Or merge to livefs, - then binpkg it while continuing further building dependent on that - package (ui app's choice really). -.raw_repo - either it weakref's self, or non-weakref refs another repo. why is - this useful? visibility wrappers... this gives ya a way to see if - p.mask is blocking usable packages fex. 
useful for the UI, not too - much for pkgcore innards. -.frozen - boolean. basically, does it account for things changing without - its knowledge, or does it not. frozen=True is faster for ebuild - trees for example, single check for cache staleness. frozen=False - is slower, and is what portage does now (meaning every lookup of a - package, and instantiation of a package instance requires mtime - checks for staleness). -.categories - IndexableSequence, if iterated over, gives ya all categories, if - getitem lookup, sub-category lookups. think - media/video/mplayer -.packages - IndexableSequence, if iterated over, all package names. if getitem - (with category as key), packages of that category. -.versions - IndexableSequence, if iterated over, all cpvs. if getitem (with - cat/pkg as key), versions for that cp -.itermatch() - iterable, given an atom/restriction, yields matching package - instances. -.match() - ``def match(self, atom): return list(self.itermatch(atom))`` - voila. -.__iter__() - in other words, repository is iterable. yields package instances. -.sync() - sync, if the repo swings that way. flesh it out a bit, possibly - handing in/back ui object for getting updates... - -digressing for a moment... - -note you can group repositories together, think portdir + -portdir_overlay1 + portdir_overlay2. Creation of a repositoryset -basically would involve passing in multiple instantiated repos, and -depending on that class's semantics, it internally handles the -stacking (right most positional arg repo overrides 2nd right most, ... -overriding left most). So... stating it again/clearly if it ain't -obvious, everything is configuration/instantiating of objects, chucked -around/mangled by the pkgcore framework. - -What *isn't* obvious is that since a repository set gets handed -instantiated repositories, each repo, *including* the set instance, -should be able to have its own cache (this is assuming it's -ebuild repos through and through). Why? 
Cache data doesn't change for -the most part exempting which repo a cpv is from, and the eclass -stacking. Handled individually, a cache bound to portdir *should* be -valid for portdir alone, it shouldn't carry data that is a result of -eclass stacking from another overlay + that portdir. That's the -business of the repositoryset. Consequence of this is that the -repositoryset needs to basically reach down into the repository it's -wrapping, get the pkg data, *then* rerequest the keys from that ebuild -with a different eclass stack. This would be a bit expensive, although -once inherit is converted to a pythonic implementation (basically -handing the path to the requested eclass down the pipes to -ebuild*.sh), it should be possible to trigger a fork in the inherit, -and note python side that multiple sets of metadata are going to be -coming down the pipe. That should alleviate the cost a bit, but it -also makes multiple levels of cache reflecting each repository -instance a bit nastier to pull off till it's implemented. - -So... short version. Harring is a perfectionist, and says it should be -this way. reality of the situation makes it a bit trickier. Anyone -interested in attempting the mod, feel free, otherwise harring will -take a crack at it since he's being anal about having it work in such -a fashion. - -Or... could do thus. repo + cache as a layer, wrapped with a 'regen' -layer that handles cache regeneration as required. Via that, would -give the repositoryset a way to override and use its own specialized -class that ensures each repo gets what's proper for its layer. Think -raw_repo type trick. - -continuing on... - - -cache ------ - -ebuild centric, although who knows (binpkg cache ain't insane ya -know). short version, it's functionally a dict, with sequence -properties (iterating over all keys). - -.keys() - return every cpv/package in the db. -.readonly - boolean. Is it modifiable? -.match() - Flesh this out. 
Either handed a metadata restriction (or set of - 'em), or handed dict with equiv info (like the former). ebuild - caches most likely *should* return mtime information alongside, - although maybe dependent on readonly. purpose of this? Gives you a - way to hand off metadata searching to the cache db, rather than the - repo having to resort to pulling each cpv from the cache and doing - the check itself. This is what will make rdbms cache backends - finally stop sucking and start seriously rocking, properly implemented at - least. :) Clarification: you don't call this directly, repo.match - delegates off to this for metadata only restrictions - - -package -------- - -this is a wrapped, *constant* package. configured ebuild src, binpkg, -vdb pkg, etc. ebuild repositories don't exactly return this- they -return unconfigured pkgs, which I'm not going to go into right now -(domains only see this protocol, visibility wrappers see different) - -.depends - usual meaning. ctarget depends -.rdepends - usual meaning. ctarget run time depends. seq. -.bdepends - see ml discussion. chost depends, what's executed in building this - (toolchain fex). seq. -.files - get a better name for this. doesn't encompass ``files/*``, but could be - slipped in that way for remote. encompasses restrict fetch (files - with urls), and chksum data. seq. -.description - usual meaning, although remember probably need a way to merge - metadata.xml long desc into the more mundane description key. -.license - usual meaning, depset -.homepage - usual. Needed? -.setup() - Name sucks. gets ya the setup operation, which does building/whatever. -.data - Raw data. may not exist, don't screw with it unless you know what - it is, and know the instance's .data layout. -.build() - if this package is buildable, return a build operation, else return None - -restriction ----------- - -see layout.txt for more fleshed out examples of the idea. note, match -and pmatch have been reversed namewise. 
.match() - handed package instance, will return bool of whether or not this - restriction matches. -.cmatch() - try to force the changes; this is dependent on the package being - configurable. -.itermatch() - new one, debatable. short version, given a sequence of package - instances, yields true/false for them. why might this be desirable? - if setup of matching is expensive, this gives you a way to amortize - the cost. might have use for glsa set target. define a restriction - that limits to installed pkgs, yay/nay if update is avail... - -restrictionSet -------------- - -mentioning it merely because it's a grouping (boolean and/or) of -individual restrictions; an atom, which is in reality a category -restriction, package restriction, and/or version restriction, is a -boolean and set of restrictions - -ContentsRestriction ------------------- - -what's this you say? a restriction for searching the vdb's contents db? -Perish the thought! ;) - -metadataRestriction ------------------- - -Mentioning this for the sake of pointing out a subclass of it, -DescriptionRestriction- this will be a class representing matching -against description data. See repo.match and cache.match above. The -short version is that it encapsulates the description search (a *very* -slow search right now) so that repo.match can hand off to the cache -(delegation), and the cache can do the search itself, however it sees -fit. - -So... for the default cache, flat_list (19500 ebuilds == 19500 files to -read for a full searchDesc), still is slow unless flat_list gets some -desc. cache added to it internally. If it's a sql based cache, the -sql_template should translate the query into the appropriate select -statement, which should make it *much* faster. - -Restating that, delegation is *absolutely* required. 
There have been -requests to add intermediate caches to the tree, or move data (whether -collapsing metadata.xml or moving data out of ebuilds) so that the -form it is stored in is quicker to search. These approaches are wrong. -Should be clear from above that a repository can, and likely will be -remote on some boxes. Such a shift of metadata does nothing but make -repository implementations that much harder, and shift power away from what -knows best how to use it. Delegation is a massively more powerful -approach, allowing for more extensibility, flexibility and *speed*. - -Final restating- searchDesc is matching against cache data. The cache -(whether flat_list, anydbm, sqlite, or a remote sql based cache) is -the *authority* about the fastest way to do searches of its data. -Programmers get pissed off when users try and tell them how something -internally should be implemented- it's fundamentally the same -scenario. The cache class the user chooses knows how to do its job -the best, provide methods of handing control down to it, and let it do -its job (delegation). Otherwise you've got a backseat driver -situation, which doesn't let those in the know, do the deciding (cache -knows, repo doesn't). - -Mind you, not trying to be harsh here. If in reading through the full -doc you disagree, question it; if you're after speeding up the current cache -implementation, note that any such change must be backwards -compatible, and not screw up the possibilities of -encapsulation/delegation this design aims for. - -logging -------- - -flesh this out (define this basically). short version, no more -writemsg type trickery, use a proper logging framework. - -ebuild-daemon.sh ----------------- - -Hardcoded paths *have* to go. /usr/lib/portage/bin == kill it. Upon -initial loadup of ebuild.sh, dump the default/base path down to the -daemon, *including* a setting for /usr/lib/portage/bin . Likely -declare -xr it, then load the actual ebuild*.sh libs. 
Backwards -compatibility for that is thus, ebuild.sh defines the var itself in -global scope if it's undefined. Semblance of backwards compatibility -(which is actually somewhat pointless since I'm about to blow it out -of the water). - -Ebuild-daemon.sh needs a function for dumping a _large_ amount of data -into bash, more than just a line or two. - -For the ultra paranoid, we load up eclasses, ebuilds, profile.bashrc's -into python side, pipe that to gpg for verification, then pipe that -data straight into bash. No race condition possible for files -used/transferred in this manner. - -A thought. The screw around speed up hack preload_eclasses added in -ebd's heyday of making it as fast as possible would be one route; -basically, after verification of an elib/eclass, preload the eclass -into a func in the bash env. and declare -r the func after the fork. -This protects the func from being screwed with, and gives a way to (at -least per ebd instance) cache the verified bash code in memory. - -It could work surprisingly enough (the preload_eclass command already -works), and probably be fairly fast versus the alternative. So... the -race condition probably can be flat out killed off without massive -issues. Still leaves a race for perms on any ``files/*``, but neh. A) -That stuff shouldn't be executed, B) security is good, but we can't -cover every possibility (we can try, but diminishing returns) - -A lesser, but still tough version of this is to use the indirection -for actual sourcing to get paths instead. No EBUILD_PATH, query python -side for the path, which returns either '' (which ebd interprets as -"err, something is whacked, time to scream"), or the actual path. - -In terms of timing, gpg verification of ebuilds probably should occur -prior to even spawning ebd.sh. profile, eclass, and elib sourcing -should use this technique to do on the fly verification though. 
Object interaction for that one is going to be *really* fun, as will
be mapping config settings to instantiation of objs.

diff --git a/doc/dev-notes/fs-ops.rst b/doc/dev-notes/fs-ops.rst
deleted file mode 100644
index 61d16e75..00000000
--- a/doc/dev-notes/fs-ops.rst
+++ /dev/null
@@ -1,30 +0,0 @@

=====================
Filesystem Operations
=====================

Here we define the types of operations that pkgcore will support, as
well as the stages where these operations occur.

---------------------------
File Deletion ( Removal )
---------------------------

- prerm
- unmerge files
- postrm

--------------------------------
File Addition ( Installation )
--------------------------------

- preinst
- merge files
- postinst

----------------------------------
File Replacement ( Overwriting )
----------------------------------

- preinst
- merge
- postinst
- prerm
- unmerge
- postrm

diff --git a/doc/dev-notes/hacking.rst b/doc/dev-notes/hacking.rst
deleted file mode 100644
index ccd413c4..00000000
--- a/doc/dev-notes/hacking.rst
+++ /dev/null
@@ -1,621 +0,0 @@

========================
 Python Code Guidelines
========================

Note that not all of the existing code follows this style guide.
This doesn't mean the existing code is correct.

Stats are all from a sempron 1.6Ghz with python 2.4.2.

Finally, code _should_ be documented, following epydoc/epytext
guidelines.

Follow pep8, with the following exemptions
==========================================

- The <80 char limit is only applicable where it doesn't make the
  logic ugly. This is not an excuse to have a 200 char if statement
  (fix your logic). Use common sense.
- Combining imports is ok.
- Use absolute imports.
- _Simple_ try/except combined lines are acceptable, but not forced
  (this is your call). Example::

    try: l.remove(blah)
    except IndexError: pass

- For comments, two trailing spaces before them is pointless- not
  needed.
- Classes should be named SomeClass; functions/methods should be named
  some_func.
- Exceptions are classes. Don't raise strings.
- Avoid __var 'private' attributes unless you absolutely have a reason
  to hide it, and the class won't be inherited (or that attribute must
  _not_ be accessed).
- Using string module functions when you could use a string method is
  evil. Don't do it.
- Use isinstance(str_instance, basestring) unless you _really_ need to
  know if it's utf8/ascii.

Throw self with a NotImplementedError
=====================================

The reason for this is simple: if you just throw a
NotImplementedError, you can't tell how the path was hit if derivative
classes are involved; thus throw
NotImplementedError(self, string_name_of_attr).

This gives far better tracebacks.

Be aware of what the interpreter is actually doing
==================================================

Don't use len(list_instance) when you just want to know if it's
nonempty/empty::

    l = [1]
    if l: blah
    # instead of
    if len(l): blah

Python looks for __nonzero__, then __len__. It's far faster than if
you try to be explicit there::

    python -m timeit -s 'l=[]' 'if len(l) > 0: pass'
    1000000 loops, best of 3: 0.705 usec per loop

    python -m timeit -s 'l=[]' 'if len(l): pass'
    1000000 loops, best of 3: 0.689 usec per loop

    python -m timeit -s 'l=[]' 'if l: pass'
    1000000 loops, best of 3: 0.302 usec per loop

Don't explicitly use has_key. Rely on the 'in' operator
=======================================================

::

    python -m 'timeit' -s 'd=dict(zip(range(1000), range(1000)))' 'd.has_key(1999999)'
    1000000 loops, best of 3: 0.512 usec per loop

    python -m 'timeit' -s 'd=dict(zip(range(1000), range(1000)))' '1999999 in d'
    1000000 loops, best of 3: 0.279 usec per loop

Python interprets the 'in' operator by using __contains__ on the
instance.
The interpreter is faster at doing getattrs than actual python code
is: for example, the code above uses d.__contains__; if you call
d.has_key or d.__contains__ yourself, it's the same speed. Using 'in'
is faster because it has the interpreter do the lookup.

So be aware of how the interpreter will execute that code. Attribute
access specified in python code is slower than the interpreter doing
it on its own.

If you're in doubt, python -m timeit is your friend. ;-)

Do not use [] or {} as default args in function/method definitions
==================================================================

::

    >>> def f(default=[]):
    ...     default.append(1)
    ...     return default
    >>> print f()
    [1]
    >>> print f()
    [1, 1]

When the function/class/method is defined, the default args are
instantiated _then_, not per call. The end result of this is that if
it's a mutable default arg, you should use None and test for it being
None; this is exempted if you _know_ the code doesn't mangle the
default.

Visible curried functions should have documentation
===================================================

When using the currying methods (pkgcore.util.currying) for function
mangling, preserve the documentation via pretty_docs.

If this is exempted, pydoc output for the objects isn't incredibly
useful.

Unit testing
============

All code _should_ have test case functionality. We use twisted.trial -
you should be running >=2.2 (<2.2 results in false positives in the
spawn tests). Regressions should be test cased, exempting idiot
mistakes (e.g., typos).

We are more than willing to look at code that lacks tests, but
actually merging the code to integration requires that it has tests.

One area that is (at the moment) exempted from this is the ebuild
interaction; testing that interface is extremely hard, although it
_does_ need to be implemented.

If tests are missing from code (due to tests not being written
initially), new tests are always desired.
If it's FS related code, it's _usually_ cheaper to try than to ask, then try
============================================================================

...but you should verify it ;)

Existing file (but empty, to avoid reading overhead)::

    echo > dar

    python -m 'timeit' -s 'import os' 'os.path.exists("dar") and open("dar").read()'
    10000 loops, best of 3: 36.4 usec per loop

    python -m 'timeit' -s 'import os' $'try:open("dar").read()\nexcept IOError: pass'
    10000 loops, best of 3: 22 usec per loop

Nonexistent file::

    rm foo

    python -m 'timeit' -s 'import os' 'os.path.exists("foo") and open("foo").read()'
    10000 loops, best of 3: 29.8 usec per loop

    python -m 'timeit' -s 'import os' $'try:open("foo").read()\nexcept IOError: pass'
    10000 loops, best of 3: 27.7 usec per loop

As you can see, there is a bit of a difference. :)

Note that this was qualified with "If it's FS related code"; syscalls
are not cheap- if the code isn't triggering syscalls, the next section
is relevant.
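The try-first pattern being timed above, written out as a small helper (``read_if_exists`` is an illustrative name, not part of the codebase):

```python
def read_if_exists(path):
    # EAFP: attempt the read and handle the failure, rather than
    # stat()ing first and racing the filesystem between the check
    # and the open.
    try:
        with open(path) as f:
            return f.read()
    except IOError:
        return None
```

Besides the speed, this also closes the TOCTOU window: the file can't vanish between an exists() check and the read, because there is no separate check.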
Catching exceptions in python code (rather than in cpython) isn't cheap
=======================================================================

Stats from python-2.4.2.

When an exception is caught::

    python -m 'timeit' -s 'd=dict(zip(range(1000), range(1000)))' $'try: d[1999]\nexcept KeyError: pass'
    100000 loops, best of 3: 8.7 usec per loop

    python -m 'timeit' -s 'd=dict(zip(range(1000), range(1000)))' '1999 in d and d[1999]'
    1000000 loops, best of 3: 0.492 usec per loop

When no exception is caught (overhead of the try/except setup)::

    python -m 'timeit' -s 'd=dict(zip(range(1000), range(1000)))' $'try: d[0]\nexcept KeyError: pass'
    1000000 loops, best of 3: 0.532 usec per loop

    python -m 'timeit' -s 'd=dict(zip(range(1000), range(1000)))' 'd[0]'
    1000000 loops, best of 3: 0.407 usec per loop

This doesn't advocate writing code that doesn't protect itself- just
be aware of what the code is actually doing, and that exceptions in
python code are costly due to the machinery involved.

Another example is when to use (or not use) dict's setdefault or get
methods:

Key exists::

    # Through exception handling
    python -m timeit -s 'd=dict.fromkeys(range(100))' 'try: x=d[1]' 'except KeyError: x=42'
    1000000 loops, best of 3: 0.548 usec per loop

    # d.get
    python -m timeit -s 'd=dict.fromkeys(range(100))' 'x=d.get(1, 42)'
    1000000 loops, best of 3: 1.01 usec per loop

Key doesn't exist::

    # Through exception handling
    python -m timeit -s 'd=dict.fromkeys(range(100))' 'try: x=d[101]' 'except KeyError: x=42'
    100000 loops, best of 3: 8.8 usec per loop

    # d.get
    python -m timeit -s 'd=dict.fromkeys(range(100))' 'x=d.get(101, 42)'
    1000000 loops, best of 3: 1.05 usec per loop

The short version: if you know the key is there, dict.get() is slower.
If you don't, get is your friend. In other words, use it instead of
doing a containment test and then accessing the key.
Of course this only considers the case where the default value is
simple. If it's something more costly, "except" will do relatively
better, since it doesn't construct the default value unless it's
needed. So if in doubt in a performance-critical piece of code:
benchmark parts of it with timeit instead of assuming "exceptions are
slow" or "[] is fast".

cpython 'leaks' vars into the local namespace for certain constructs
====================================================================

::

    def f(s):
        while True:
            try:
                some_func_that_throws_exception()
            except Exception, e:
                # e exists in this namespace now.
                pass
            # some other code here...

From the code above, e bled into the f namespace- that's referenced
memory that isn't used, and it will linger until the while loop exits.

Python _does_ bleed variables into the local namespace- be aware of
this, and explicitly delete references you don't need when dealing in
large objs, especially when dealing with exceptions::

    class c:
        d = {}
        for x in range(1000):
            d[x] = x

While the class above is contrived, the thing to note is that c.x is
now valid- the x from the for loop bleeds into the class namespace and
stays put.

Don't leave unneeded vars lingering in class namespace.

Variables that leak from for loops _normally_ aren't an issue; just be
aware it does occur- especially if the var references a large object
(thus keeping it in memory).

So... for loops leak, list comps leak, and depending on your except
clause, it can also leak.

Do not go overboard with this though. If your function will exit soon,
do not bother cleaning up variables by hand. If the "leaking" things
are small, do not bother either.

The current code deletes exception instances explicitly much more
often than it should, since this was believed to clean up the
traceback object. It does not: the only thing "del e" frees up is the
exception instance and the arguments passed to its constructor. "del
e" also takes a small amount of time to run (clearing up all locals
when the function exits is faster).

Unless you need to generate (and save) a range result, use xrange
=================================================================

::

    python -m timeit 'for x in range(10000): pass'
    100 loops, best of 3: 2.01 msec per loop

    python -m timeit 'for x in xrange(10000): pass'
    1000 loops, best of 3: 1.69 msec per loop

Removals from a list aren't cheap, especially leftmost
======================================================

If you _do_ need to do leftmost removals, the deque module is your
friend.

Rightmost removals aren't too cheap either, depending on what idiocy
people come up with to try and 'help' the interpreter::

    python -m timeit $'l=range(1000);i=0;\nwhile i < len(l):\n\tif l[i]!="asdf":del l[i]\n\telse:i+=1'
    100 loops, best of 3: 4.12 msec per loop

    python -m timeit $'l=range(1000);\nfor i in xrange(len(l)-1,-1,-1):\n\tif l[i]!="asdf":del l[i]'
    100 loops, best of 3: 3 msec per loop

    python -m timeit 'l=range(1000);l=[x for x in l if x == "asdf"]'
    1000 loops, best of 3: 1 msec per loop

Granted, that's the worst case, but the worst case is usually where
people get bitten (note that the best case is still faster for the
list comprehension).

On a related note, don't pop() unless you have a reason to.
If you're testing for None specifically, be aware of the 'is' operator
======================================================================

'is' avoids the equality protocol, and does a straight pointer
comparison::

    python -m timeit '10000000 != None'
    1000000 loops, best of 3: 0.721 usec per loop

    python -m timeit '10000000 is not None'
    1000000 loops, best of 3: 0.343 usec per loop

Note that we're specifically forcing a large int; using 1 under 2.5
has the same runtime. The reason is that comparison defaults to an
identity check first, then the comparison proper; for small ints,
python uses singletons, thus identity kicks in.

Deprecated/crappy modules
=========================

- Don't use the types module. Use isinstance (this isn't a speed
  reason; types sucks).
- Don't use the string module. There are exceptions, but use string
  methods when available.
- Don't use the stat module just to get a stat attribute- e.g.::

    import stat
    l = os.stat("asdf")[stat.ST_MODE]

    # can be done as (and a bit cleaner)
    l = os.stat("asdf").st_mode

Know the exceptions that are thrown, and catch just those you're interested in
==============================================================================

::

    try:
        blah
    except Exception:
        blah2

There is a major issue here: under python 2.4, this also catches
SystemExit and KeyboardInterrupt (the latter triggered by keyboard
interrupts), meaning this code, which was just bad exception handling,
now swallows Ctrl+c (and thus screws with UI code).

Catch what you're interested in *only*.

tuples versus lists
===================

The former is immutable, while the latter is mutable.

Lists over-allocate (a cpython thing), meaning a list takes up more
memory than is used (this is actually a good thing, usually).

If you're generating/storing a lot of sequences that shouldn't be
modified, use tuples. They're cheaper in memory, and people can
reference the tuple directly without being concerned about it being
mutated elsewhere.
However, using lists there would require each consumer to copy the
list to protect themselves from mutation. So... over-allocation +
allocating a new list for each consumer.

Bad, mm'kay.

Don't try to copy immutable instances (e.g. tuples/strings)
===========================================================

Example: copy.copy((1,2,3)) is dumb; nobody makes a mistake that
obvious, but in larger code people do (people even try using [:] to
copy a string; it returns the same string, since it's immutable).

You can't modify them, therefore there is no point in making copies of
them.

__del__ methods mess with garbage collection
============================================

__del__ methods have the annoying side effect of blocking garbage
collection when the instance is involved in a cycle- basically, the
interpreter doesn't know what __del__ is going to reference, so it's
unknowable (in the general case) how to break the cycle.

So... if you're using __del__ methods, make sure the instance doesn't
wind up in a cycle (whether via careful data structs or weakref
usage).

A general point: python isn't slow, your algorithm is
=====================================================

::

    l = []
    for x in data_generator():
        if x not in l:
            l.append(x)

That code is _best_ case O(1) per lookup (e.g., yielding all 0's). The
worst case is O(N^2) overall.

::

    l = set()
    for x in data_generator():
        if x not in l:
            l.add(x)

Best/worst lookups are now constant (this isn't strictly true due to
the potential expansion of the set internally, but that's ignorable in
this case).

Furthermore, the first loop actually invokes the __eq__ protocol for x
against each element, which can potentially be *quite* slow when
dealing in complex objs.

The second loop invokes __hash__ once on x instead (technically the
set implementation may invoke __eq__ if a collision occurs, but that's
an implementation detail).
Technically, the second loop is still a bit inefficient::

    l = set(data_generator())

is simpler and faster.

Some example data for people who don't see how _bad_ this can get::

    python -m timeit $'l=[]\nfor x in xrange(1000):\n\tif x not in l:l.append(x)'
    10 loops, best of 3: 74.4 msec per loop

    python -m timeit $'l=set()\nfor x in xrange(1000):\n\tif x not in l:l.add(x)'
    1000 loops, best of 3: 1.24 msec per loop

    python -m timeit 'l=set(xrange(1000))'
    1000 loops, best of 3: 278 usec per loop

The difference here is obvious.

This does _not_ mean that sets are automatically better everywhere;
just be aware of what you're doing- for a single search of a range,
the setup overhead is far slower than a linear search. It's the nature
of sets: while the implementation may be able to guess the proper
size, it still has to add each item in; if it *cannot* guess the size
(i.e., no size hint- a generator, iterator, etc), it has to just keep
adding items in, expanding the set as needed (which requires linear
walks for each expansion). While this may seem obvious, people
sometimes do effectively the following::

    python -m timeit -s 'l=range(50)' 'if 1001 in set(l): pass'
    100000 loops, best of 3: 12.2 usec per loop

    python -m timeit -s 'l=range(50)' 'if 1001 in l: pass'
    10000 loops, best of 3: 7.68 usec per loop

What's up with __hash__ and dicts
=================================

A bunch of things (too many, most likely) in the codebase define
__hash__. The rule for __hash__ is (quoted from
http://docs.python.org/ref/customization.html):

    Should return a 32-bit integer usable as a hash value for
    dictionary operations. The only required property is that objects
    which compare equal have the same hash value.

Here's a quick rough explanation, for people who do not know, of how a
"dict" works internally:

- Things added to it are dumped in a "bucket" depending on their hash
  value.
- To check if something is in the dict, it first determines the bucket
  to check (based on the hash value), then does equality checks
  (__cmp__ or __eq__ if there is one, otherwise object identity
  comparison) for everything in the bucket (if there is anything).

So what does this mean?

- There's no reason at all to define your own __hash__ unless you also
  define __eq__ or __cmp__. The behaviour of your object in dicts/sets
  will not change; it will just be slower (since your own __hash__ is
  almost certainly slower than the default one).
- If you define __eq__ or __cmp__ and want your object to be usable in
  a dict, you have to define __hash__. If you don't, the default
  __hash__ is used, which means your objects act in dicts as if only
  object identity matters- *until* you hit a hash collision and your
  own __eq__ or __cmp__ kicks in.
- If you do define your own __hash__, it has to produce the same value
  for objects that compare equal, or you get *really* weird behaviour
  in dicts/sets ("thing in dict" returning False because the hash
  values differ, while "thing in dict.keys()" returns True because
  that does not use the hash value, only equality checks).
- If the hash value changes after the object was put in a dict, you
  get weird behaviour too ("s=set([thing]); thing.change_hash();
  thing in s" is False, but "thing in list(s)" is True). So if your
  objects are mutable, they can usually provide __eq__/__cmp__ but not
  __hash__.
- Not having many hash "collisions" (the same hash value for objects
  that compare nonequal) is good, but collisions are not illegal. Too
  many of them just slow down dict/set operations (in the worst case
  scenario of the same hash for every object, dict/set operations
  become linear searches through the single hash bucket everything
  ends up in).
- If you use the hash value directly, keep in mind that collisions are
  legal. Do not use comparisons of hash values as a substitute for
  comparing objects (implementing __eq__ / __cmp__). Probably the only
  legitimate use of hash() is to determine an object's hash value
  based on the things used for comparison.


__eq__ and __ne__
=================

From http://docs.python.org/ref/customization.html:

    There are no implied relationships among the comparison operators.
    The truth of x==y does not imply that x!=y is false. Accordingly,
    when defining __eq__(), one should also define __ne__() so that
    the operators will behave as expected.

They really mean that. If you define __eq__ but not __ne__, doing "!="
on instances compares them by identity. This is surprisingly easy to
miss, especially since the natural way to write unit tests for classes
with custom comparisons goes like this::

    self.assertEqual(YourClass(1), YourClass(1))
    # Repeat for more possible values. Uses == and therefore __eq__,
    # behaves as expected.
    self.assertNotEqual(YourClass(1), YourClass(2))
    # Repeat for more possible values. Uses != and therefore object
    # identity, so they all pass (all different instances)!

So you end up only testing __eq__ on equal values (it can say
"identical" for different values without you noticing).

Adding a __ne__ that just does "return not self == other" fixes this.


__eq__/__hash__ and subclassing
===============================

If your class has a custom __eq__ and it might be subclassed, you have
to be very careful about how you "compare" to instances of a subclass.
Usually you will want to be "different" from those unconditionally::

    def __eq__(self, other):
        if self.__class__ is not YourClass or other.__class__ is not YourClass:
            return False
        # Your actual code goes here

This might seem like overkill, but it is necessary to avoid problems
if you are subclassed and the subclass does not have a new __eq__.
If you just do an "isinstance(other, self.__class__)" check, you will
compare equal to instances of a subclass, which is usually not what
you want. If you just check "self.__class__ is other.__class__", then
subclasses that add a new attribute without overriding __eq__ will
compare equal when they should not (because the new attribute
differs).

If you subclass something that has an __eq__, you should most likely
override it (you might get away with not doing so if the class does
not do the type check demonstrated above). If you add a new attribute,
don't forget to override __hash__ too (that is not critical, but you
will have unnecessary hash collisions if you forget it).

This is especially important for pkgcore because of
pkgcore.util.caching. If an instance of a class with a broken __eq__
is used as an argument to the __init__ of a class that uses
caching.WeakInstMeta, it will cause a cached instance to be used when
it should not be. Notice that the class with the broken __eq__ does
not have to be cached itself to trigger this! Getting this wrong can
cause fun behaviour like atoms showing up in the list of fetchables
because the restrictions they're in compare equal independent of their
"payload".


Exception subclassing
=====================

It is pretty common for an Exception subclass to want to customize the
return value of str() on an instance. The easiest way to do that is::

    class MyException(Exception):

        """Describe when it is raised here."""

        def __init__(self, stuff):
            Exception.__init__(self, 'MyException because of %s' % (stuff,))

This is usually easier than defining a custom __str__ (since you do
not have to store the value of "stuff" as an attribute), and you
should be calling the base class __init__ anyway.

(This does not mean you should never store things like "stuff" as
attrs: it can be very useful for code catching the exception to have
access to it. Use common sense.)
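A sketch of the "store it as an attr too" variant from the parenthetical above, so that handlers can inspect the value; the class name and attribute are illustrative, not actual pkgcore API:

```python
class FetchFailed(Exception):

    """Illustrative example: raised when a download fails."""

    def __init__(self, url):
        Exception.__init__(self, 'fetch failed for %s' % (url,))
        self.url = url  # kept as an attr so catching code can use it
```

Catching code can then branch on the payload rather than parsing str(e):

```python
try:
    raise FetchFailed('http://example.invalid/dist.tar')
except FetchFailed as e:
    retry_url = e.url
```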
Memory debugging
================

heapy and dowser are the two currently recommended tools.

To use dowser, insert the following into the code wherever you'd like
to check the heap- note that this is blocking, as well::

    import cherrypy
    import dowser

    cherrypy.config.update({'engine.autoreload_on': False})
    try:
        cherrypy.quickstart(dowser.Root())
    except AttributeError:
        cherrypy.root = dowser.Root()
        cherrypy.server.start()

For using heapy, see the heapy documentation in pkgcore/dev-notes.

diff --git a/doc/dev-notes/harring-notes.rst b/doc/dev-notes/harring-notes.rst
deleted file mode 100644
index 2138816e..00000000
--- a/doc/dev-notes/harring-notes.rst
+++ /dev/null
@@ -1,32 +0,0 @@

resolver
========

The current design doesn't coalesce- it expects that each atom, as
it's passed in, specifies the dbs, which is how it does its
update/empty-tree trickery.

This isn't optimal. We need to flag specific atoms/matches as "upgrade
if possible" or "empty tree if possible", etc; via this, we get
coalescing behaviour. Specifically, if the targets are git[subversion]
and subversion, we want both upgraded. So when resolving
git[subversion] and encountering dev-util/subversion, we should aim
for upgrading it per the commandline request.

An additional question: should we apply this coalescing awareness to
intermediate atoms along the way, resolution wise? Specifically, the
cnf/dnf solutions- grabbing those and stating "yeah, collapse to these
if possible, since they're likely required"?


resolver redesign
=================

Hate to say it, but we should go back to a specific 'resolve' method,
with the resolver plan object holding targets- reason being, we may
have to backtrack the whole way.
config/use issues
=================

We need to find a way to clone a stack, getting a standalone config
stack if possible for the resolver- specifically so it can do resets
as needed and track what is involved (use dep forcing), without
influencing preexisting access to that tree, nor being affected by
said usage.

diff --git a/doc/dev-notes/heapy.rst b/doc/dev-notes/heapy.rst
deleted file mode 100644
index 06073beb..00000000
--- a/doc/dev-notes/heapy.rst
+++ /dev/null
@@ -1,497 +0,0 @@

=======================================================
 How to use guppy/heapy for tracking down memory usage
=======================================================

This is a work in progress. It will grow a bit, and it may not be
entirely accurate everywhere.

Tutorial of sorts
=================

All this was done on a checkout of [email protected]; you should be able
to check that out and follow along using something like::

    bzr revert -rrevid:[email protected]

in a pkgcore branch.

Heapy is powerful but has a learning curve. The problems are that the
documentation (http://guppy-pe.sourceforge.net/heapy_Use.html among
others) is a bit unusual, and there are various dynamic importing and
other tricks in use that mean things like dir() are less helpful than
they are on more "normal" python objects. This document's main purpose
is to show you how to ask heapy various kinds of questions. It may or
may not show a few cases where pkgcore uses more memory than it should,
too.

First, get an x86. Heapy currently does not like 64 bit archs much.
Emerge it::

    emerge guppy

Fire up an interactive python prompt, and set stuff up::

    >>> from guppy import hpy
    >>> from pkgcore.config import load_config
    >>> c = load_config()
    >>> hp = hpy()

Just to show how annoying heapy's internal tricks are::

    >>> dir(hp)
    ['__doc__', '__getattr__', '__init__', '__module__', '__setattr__', '_hiding_tag_', '_import', '_name', '_owner', '_share']
    >>> help(hp)
    Help on class _GLUECLAMP_ in module guppy.etc.Glue:

    _GLUECLAMP_ = <guppy.heapy.Use interface at 0x-484b8554>

This object is your "starting point", but as you can see, the
underlying machinery is not giving away any useful usage instructions.

Do everything that allocates some memory but is not the problem you
are tracking down now. Then do::

    >>> hp.setrelheap()

Everything allocated before this call will not be in the data sets you
get later.

Now do your memory-intensive thing::

    >>> l = list(x for x in c.repo["gentoo"] if x.data)

Keep an eye on system memory consumption. You want to use up a lot,
but not all, of your system ram, for nicer statistics. The python
process was eating about 109M res in top when the above stuff
finished, which is pretty good (for my 512mb ram box).

::

    >>> h = hp.heap()

The fun one. This object is basically a snapshot of what's reachable
in ram (minus the stuff excluded through setrelheap earlier), which
you can do various fun tricks with. Its str() is a summary::

    >>> h
    Partition of a set of 1449133 objects. Total size = 102766644 bytes.
    Index Count % Size % Cumulative % Kind (class / dict of class)
    0 985931 68 46300932 45 46300932 45 str
    1 24681 2 22311624 22 68612556 67 dict of pkgcore.ebuild.ebuild_src.package
    2 49391 3 21311864 21 89924420 88 dict (no owner)
    3 115974 8 3776948 4 93701368 91 tuple
    4 152181 11 3043616 3 96744984 94 long
    5 36009 2 1584396 2 98329380 96 weakref.KeyedRef
    6 11328 1 1540608 1 99869988 97 dict of pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
    7 24702 2 889272 1 100759260 98 types.MethodType
    8 11424 1 851840 1 101611100 99 list
    9 24681 2 691068 1 102302168 100 pkgcore.ebuild.ebuild_src.package
    <54 more rows. Type e.g. '_.more' to view.>

(You might want to keep an eye on ram usage: heapy made the process
grow another dozen mb here. It gets painfully slow if it starts
swapping, so if that happens, reduce your data set).

Notice the "Total size" at the top: about 100M. That's what we need to
compare later numbers with.

So here we can see that (surprise!) we have a ton of strings in
memory. We also have various kinds of dicts. Dicts are treated a bit
specially: the "dict of pkgcore.ebuild.ebuild_src.package" simply
means "all the dicts that are __dict__ attributes of instances of that
class". "dict (no owner)" are all the dicts that are not used as a
__dict__ attribute.

You probably guessed what you can use "index" for::

    >>> h[0]
    Partition of a set of 985931 objects. Total size = 46300932 bytes.
    Index Count % Size % Cumulative % Kind (class / dict of class)
    0 985931 100 46300932 100 46300932 100 str

Ok, that looks pretty useless, but it really is not. The "sets" heapy
gives you (like "h" and "h[0]") are a bunch of objects, grouped
together by an "equivalence relation". The default one (with the crazy
name "Clodo", for "Class or dict owner") groups together all objects
of the same class, and dicts with the same owner. We can also
partition the sets by a different equivalence relation. Let's do a
silly example first::

    >>> h.bytype
    Partition of a set of 1449133 objects. Total size = 102766644 bytes.
    Index Count % Size % Cumulative % Type
    0 985931 68 46300932 45 46300932 45 str
    1 85556 6 45226592 44 91527524 89 dict
    2 115974 8 3776948 4 95304472 93 tuple
    3 152181 11 3043616 3 98348088 96 long
    4 36009 2 1584396 2 99932484 97 weakref.KeyedRef
    5 24702 2 889272 1 100821756 98 types.MethodType
    6 11424 1 851840 1 101673596 99 list
    7 24681 2 691068 1 102364664 100 pkgcore.ebuild.ebuild_src.package
    8 11328 1 317184 0 102681848 100 pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
    9 408 0 26112 0 102707960 100 types.CodeType
    <32 more rows. Type e.g. '_.more' to view.>

As you can see, this is the same thing as the default view, but with
all the dicts lumped together. A more useful one is::

    >>> h.byrcs
    Partition of a set of 1449133 objects. Total size = 102766644 bytes.
    Index Count % Size % Cumulative % Referrers by Kind (class / dict of class)
    0 870779 60 43608088 42 43608088 42 dict (no owner)
    1 24681 2 22311624 22 65919712 64 pkgcore.ebuild.ebuild_src.package
    2 221936 15 20575932 20 86495644 84 dict of pkgcore.ebuild.ebuild_src.package
    3 242236 17 8588560 8 95084204 93 tuple
    4 6 0 1966736 2 97050940 94 dict of weakref.WeakValueDictionary
    5 36009 2 1773024 2 98823964 96 dict (no owner), dict of
                                     pkgcore.ebuild.ebuild_src.package, weakref.KeyedRef
    6 11328 1 1540608 1 100364572 98 pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
    7 26483 2 800432 1 101165004 98 list
    8 11328 1 724992 1 101889996 99 dict of pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
    9 3 0 393444 0 102283440 100 dict of pkgcore.repository.prototype.IterValLazyDict
    <132 more rows. Type e.g. '_.more' to view.>

What this does is:

- For every object, find all its referrers.
- Classify those referrers using the "Clodo" relation you saw earlier.
- Create a set of those classifiers of referrers. That means a set of
  things like "tuple, dict of someclass", *not* of actual referring
  objects.
- Group together all the objects with the same set of classifiers of
  referrers.

So now we know that we have a lot of objects referenced *only* by one
or more dicts (still not very useful), and also a lot of them
referenced by one "normal" dict, referenced by the dict of (meaning
"an attribute of") ebuild_src.package, and referenced by a weakref.
Hmm, I wonder what those are. But let's store this view of the data
first, since it took a while to generate ("_" is a feature of the
python interpreter; it's always the last result)::

    >>> byrcs = _
    >>> byrcs[5]
    Partition of a set of 36009 objects. Total size = 1773024 bytes.
    Index Count % Size % Cumulative % Referrers by Kind (class / dict of class)
    0 36009 100 1773024 100 1773024 100 dict (no owner), dict of
                                        pkgcore.ebuild.ebuild_src.package, weakref.KeyedRef

Erm, yes, we knew that already. If you look at the top right of the
table, you can see it is still grouping the items by the kind of their
referrer, which is not very useful here. To get more information, we
can change what they are grouped by::

    >>> byrcs[5].byclodo
    Partition of a set of 36009 objects. Total size = 1773024 bytes.
    Index Count % Size % Cumulative % Kind (class / dict of class)
    0 36009 100 1773024 100 1773024 100 str

    >>> byrcs[5].bysize
    Partition of a set of 36009 objects. Total size = 1773024 bytes.
    Index Count % Size % Cumulative % Individual Size
    0 10190 28 489120 28 489120 28 48
    1 7584 21 394368 22 883488 50 52
    2 7335 20 322740 18 1206228 68 44
    3 3947 11 221032 12 1427260 80 56
    4 3364 9 134560 8 1561820 88 40
    5 1903 5 114180 6 1676000 95 60
    6 877 2 56128 3 1732128 98 64
    7 285 1 19380 1 1751508 99 68
    8 451 1 16236 1 1767744 100 36
    9 57 0 4104 0 1771848 100 72

This took the set of objects with that odd set of referrers and
redisplayed them grouped by "clodo".
So now we know they're all -strings. Most of them are pretty small too. To get some idea of what -we're dealing with we can pull some random examples out:: - - >>> byrcs[5].byid - Set of 36009 <str> objects. Total size = 1773024 bytes. - Index Size % Cumulative % Representation (limited) - 0 80 0.0 80 0.0 'media-plugin...re20051219-r1' - 1 76 0.0 156 0.0 'app-emulatio...4.20041102-r1' - 2 76 0.0 232 0.0 'dev-php5/ezc...hemaTiein-1.0' - 3 76 0.0 308 0.0 'games-misc/f...wski-20030120' - 4 76 0.0 384 0.0 'mail-client/...pt-viewer-0.8' - 5 76 0.0 460 0.0 'media-fonts/...-100dpi-1.0.0' - 6 76 0.0 536 0.0 'media-plugin...gdemux-0.10.4' - 7 76 0.0 612 0.0 'media-plugin...3_pre20051219' - 8 76 0.0 688 0.0 'media-plugin...3_pre20051219' - 9 76 0.0 764 0.0 'media-plugin...3_pre20060502' - >>> byrcs[5].byid[0].theone - 'media-plugins/vdr-streamdev-server-0.3.3_pre20051219-r1' - -A pattern emerges! (sets with one item have a "theone" attribute with -the actual item, all sets have a "nodes" attribute that returns an -iterator yielding the items). - -We could have used another heapy trick to get a better idea of what -kind of string this was:: - - >>> byrcs[5].byvia - Partition of a set of 36009 objects. Total size = 1773024 bytes. - Index Count % Size % Cumulative % Referred Via: - 0 1 0 80 0 80 0 "['cpvstr']", '.key', '.keys()[23147]' - 1 1 0 76 0 156 0 "['cpvstr']", '.key', '.keys()[12285]' - 2 1 0 76 0 232 0 "['cpvstr']", '.key', '.keys()[12286]' - 3 1 0 76 0 308 0 "['cpvstr']", '.key', '.keys()[16327]' - 4 1 0 76 0 384 0 "['cpvstr']", '.key', '.keys()[17754]' - 5 1 0 76 0 460 0 "['cpvstr']", '.key', '.keys()[19079]' - 6 1 0 76 0 536 0 "['cpvstr']", '.key', '.keys()[21704]' - 7 1 0 76 0 612 0 "['cpvstr']", '.key', '.keys()[23473]' - 8 1 0 76 0 688 0 "['cpvstr']", '.key', '.keys()[24239]' - 9 1 0 76 0 764 0 "['cpvstr']", '.key', '.keys()[3070]' - <35999 more rows. Type e.g. '_.more' to view.> - -Ouch, 36009 total rows for 36009 objects. 
What this did is similar to -what "byrcs" did: for every object in the set it determined how they -can be reached through their referrers, then groups objects that can -be reached in the same ways together. Unfortunately it is grouping -everything reachable as a dictionary key differently, so this is not -very useful. - -XXX WTF XXX - -It is not likely this accomplishes anything, but let's assume we want -to know if there are any objects in this set *not* reachable as the -"key" attribute. Heapy can tell us (although this is *very* slow... -there might be a better way but I do not know it yet):: - - >>> nonkeys = byrcs[5] & hp.Via('.key').alt('<') - >>> nonkeys.byrcs - hp.Nothing - -(remember "hp" was our main entrance into heapy, the object that gave -us the set of all objects we're interested in earlier). - -What does this do? "hp.Via('.key')" creates a "symbolic set" of "all -objects reachable *only* as the 'key' attribute of something" (it's a -"symbolic set" because there are no actual objects in it). The "alt" -method gives us a new symbolic set of everything reachable via "less -than" this way. We then intersect this with our set and discover there -is nothing left. - -A similar construct that does not do what we want is:: - - >>> nonkeys = byrcs[5] & ~hp.Via('.key') - -The "~" operator inverts the symbolic set, giving a set matching -everything not reachable *exactly* as a "key" attribute. The key word -here is "exactly": since everything in our set was also reachable in -two other ways this intersection matches everything. 
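For a rough feel of what the referrer classification behind "byrcs" is doing, the stdlib's ``gc.get_referrers`` can approximate the classifier. This is a sketch only — heapy's graph walk is far more careful, and the exact referrer set depends on interpreter state (frames and the module dict show up too), so nothing here should be read as heapy's actual algorithm:

```python
import gc

def classify_referrers(obj):
    """Return the sorted set of type names of everything currently
    referring to obj -- roughly the classifier behind a byrcs row."""
    return sorted({type(r).__name__ for r in gc.get_referrers(obj)})

# A fresh list held by both a dict and a list, like the strings above
# that were reachable via several kinds of referrer at once.
target = ["sentinel"]
holder_dict = {"key": target}
holder_list = [target]
print(classify_referrers(target))
```

Objects with the same set of referrer kinds would land in the same byrcs row.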
- -Ok, let's get back to the stuff actually eating memory:: - - >>> h[0].byrcs - Index Count % Size % Cumulative % Referrers by Kind (class / dict of class) - 0 670791 68 31716096 68 31716096 68 dict (no owner) - 1 139232 14 6525856 14 38241952 83 tuple - 2 136558 14 6042408 13 44284360 96 dict of pkgcore.ebuild.ebuild_src.package - 3 36009 4 1773024 4 46057384 99 dict (no owner), dict of - pkgcore.ebuild.ebuild_src.package, weakref.KeyedRef - 4 1762 0 107772 0 46165156 100 list - 5 824 0 69476 0 46234632 100 types.CodeType - 6 140 0 31312 0 46265944 100 function, tuple - 7 194 0 11504 0 46277448 100 dict of module - 8 30 0 6284 0 46283732 100 dict of type - 9 55 0 1972 0 46285704 100 dict of module, tuple - -Remember h[0] gave us all str objects, so this is all string objects -grouped by the kind(s) of their referrers. Also notice index 3 here is -the same set of stuff we saw earlier:: - - >>> h[0].byrcs[3] ^ byrcs[5] - hp.Nothing - -Most operators do what you would expect, & intersects for example. - -"We have a lot of strings in dicts" is not that useful either, let's -see if we can narrow that down a little:: - - >>> h[0].byrcs[0].referrers.byrcs - Partition of a set of 44124 objects. Total size = 18636768 bytes. 
- Index Count % Size % Cumulative % Referrers by Kind (class / dict of class) - 0 24681 56 12834120 69 12834120 69 dict of pkgcore.ebuild.ebuild_src.package - 1 19426 44 5371024 29 18205144 98 dict (no owner) - 2 1 0 393352 2 18598496 100 dict of pkgcore.repository.prototype.IterValLazyDict - 3 1 0 6280 0 18604776 100 __builtin__.set - 4 1 0 6280 0 18611056 100 dict of module, guppy.heapy.heapyc.RootStateType - 5 1 0 6280 0 18617336 100 dict of pkgcore.ebuild.eclass_cache.cache - 6 1 0 6280 0 18623616 100 dict of - pkgcore.repository.prototype.PackageIterValLazyDict - 7 4 0 5536 0 18629152 100 type - 8 4 0 3616 0 18632768 100 dict of type - 9 1 0 1672 0 18634440 100 dict of module, dict of os._Environ - -(Broken down: h[0].byrcs[0] is the set of all str objects referenced -only by dicts, h[0].byrcs[0].referrers is the set of those dicts, and -the final .byrcs displays those dicts grouped by *their* referrers) - -Keep an eye on the size column. We have over 12M worth of just dicts -(not counting the stuff in them) referenced only as attribute of -ebuild_src.package. If we include the stuff kept alive by those dicts -we're talking about a big chunk of the 100MB total here:: - - >>> t = _ - >>> t[0].domisize - 61269552 - -60M out of our 100M would be deallocated if we killed those dicts. So -let's ask heapy what dicts that are:: - - >>> t[0].byvia - Partition of a set of 24681 objects. Total size = 12834120 bytes. - Index Count % Size % Cumulative % Referred Via: - 0 24681 100 12834120 100 12834120 100 "['data']" - -(it is easy to get confused by the "byrcs" view of our "t". t[0] is -*not* a bunch of "dict of ebuild_src.package". It is a bunch of dicts -with strings in them, namely those that are *referred to* by the dict -of ebuild_src.package, and not by anything else. So the byvia output -means those dicts with strings in them are all "data" attributes of -ebuild_src.package instances). - -(sidenote: earlier we saw byvia say ".key", now it says "['data']". 
-It's different because the previous type used __slots__ (so there was -no "dict of" involved) and this type does not, so there is a "dict of" -and our dicts are the "data" key in it.) - -So what is in the dicts:: - - >>> t[0].referents - Partition of a set of 605577 objects. Total size = 34289392 bytes. - Index Count % Size % Cumulative % Kind (class / dict of class) - 0 556215 92 27710068 81 27710068 81 str - 1 24681 4 6085704 18 33795772 99 dict (no owner) - 2 24681 4 493620 1 34289392 100 long - >>> _.byvia - Partition of a set of 605577 objects. Total size = 34289392 bytes. - Index Count % Size % Cumulative % Referred Via: - 0 24681 4 6085704 18 6085704 18 "['_eclasses_']" - 1 21954 4 3742976 11 9828680 29 "['DEPEND']" - 2 22511 4 3300052 10 13128732 38 "['RDEPEND']" - 3 24202 4 2631304 8 15760036 46 "['SRC_URI']" - 4 24681 4 1831668 5 17591704 51 "['DESCRIPTION']" - 5 24674 4 1476680 4 19068384 56 "['HOMEPAGE']" - 6 24681 4 1297680 4 20366064 59 "['KEYWORDS']" - 7 24681 4 888516 3 21254580 62 '.keys()[3]' - 8 24681 4 888516 3 22143096 65 '.keys()[9]' - 9 24681 4 810108 2 22953204 67 "['LICENSE']" - <32 more rows. Type e.g. '_.more' to view.> - -Strings, nested dicts and longs, with most of the size eaten up by the -"_eclasses_" values. There is also a significant amount eaten up by the -'.keys()' entries, which is a bit odd, so let's investigate:: - - >>> refs = t[0].referents - >>> i=iter(refs.byvia[7].nodes) - >>> i.next() - 'DESCRIPTION' - >>> i.next() - 'DESCRIPTION' - >>> i.next() - 'DESCRIPTION' - >>> i.next() - 'DESCRIPTION' - >>> i.next() - 'DESCRIPTION' - -Eep! - -:: - - >>> refs.byvia[7].bysize - Partition of a set of 24681 objects. Total size = 888516 bytes. - Index Count % Size % Cumulative % Individual Size - 0 24681 100 888516 100 888516 100 36 - -It looks like we have 24681 identical strings here, using up about 1M -of memory. The other odd '.keys()' entry is apparently the -'_eclasses_' string.
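The notes stop short of the fix, but the classic remedy for thousands of identical strings like this is interning, so equal strings share one object. A sketch of the idea (using modern ``sys.intern``; code of this document's py2 era would have used the ``intern`` builtin):

```python
import sys

# Simulate a cache loader building the same key string anew for every
# package; each join yields a distinct str object with identical text.
keys = ["".join(["DESCRIPT", "ION"]) for _ in range(1000)]
distinct_objects = len({id(k) for k in keys})

# Interning maps equal strings onto a single shared object.
interned = [sys.intern(k) for k in keys]
shared_objects = len({id(k) for k in interned})

print(distinct_objects, shared_objects)  # → 1000 1
```

Interning the dict keys when loading the cache would collapse those 24681 copies of 'DESCRIPTION' into one.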
- -Extra stuff for c extension developers -====================================== - -To provide accurate statistics if your code uses extension types you -must provide heapy with a way to get the following data for your -custom types: - -- How large is a certain instance? -- What objects does an instance contain? -- How does the instance refer to a contained object? - -You provide these through a NyHeapDef struct, defined in heapdef.h in -the guppy source. This header is not installed, so you should just -copy it into your source tree. It is a good idea to read this header -file side by side with the following descriptions, since it contains -details omitted here. The stdtypes.c file contains implementations for -the basic python types which you can read for inspiration. - -The NyHeapDef struct provides heapy with three function pointers: - -SizeGetter ----------- - -To answer "how large is an instance" you provide a -NyHeapDef_SizeGetter function that is called with a PyObject* and -returns an int: the number of bytes the object occupies. If you do not -provide this function heapy uses a default that looks at the -tp_basicsize and tp_itemsize fields of the type. This means that if -you do not allocate any extra memory for non-python objects (e.g. for -c strings) you do not need to provide this function. - -Traverser ---------- - -To answer "What objects does an instance contain" you provide a -traversal function (NyHeapDef_Traverser). This is called with a -pointer to a "visit procedure", an instance of your extension type and -some other stuff. You should then call the visit procedure for every -python object contained in your object. - -This might sound familiar: to support the python garbage collector you -provide a very similar function (tp_traverse). Actually heapy will use -tp_traverse if you do not provide a heapy-specific traverse function. 
-Doing this makes sense if you do not support the garbage collector for -some reason, or if you contain objects that are irrelevant to the -garbage collector. - -An example would be a type that contains a single python string -object (that no other code can get a reference to). If this object -does not have references to other python objects it cannot be involved -in cycles so supporting gc would be useless. However you do still want -heapy to know about the memory occupied by the contained string. You -could do that by adding that size in your NyHeapDef_SizeGetter -function but it is probably easier to tell heapy about the string -through the traversal function (so you do not have to calculate the -memory occupied by the string). - -If the above type also had a reference to some arbitrary -(non-private) python object it should support gc, but it does not need -to tell gc about the contained string. So you would have two traversal -functions, one for heapy that visits the string and one for gc that -does not. - -RelationGetter --------------- - -The last function heapy wants is one that tells it in what way your -instance refers to some contained object. It is used to provide the -"byvia" view. This calls a visit function once for each way your -instance refers to a target object, telling it what kind of reference -it is. - -Providing the heapdef struct to heapy ------------------------------------- - -Once you have the needed function pointers in a struct you need to -pass this to heapy somehow. This is done through a standard cpython -mechanism called "cobjects". From python these look like rather stupid -objects you cannot do anything with, but from c you can pull out a -void* that was put in when the object was constructed. You can wrap an -arbitrary pointer in a CObject, make it available as an attribute of -your module, then import it from some other module, pull the void* back -out and cast it to the original type.
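For pure-Python types there is a rough analogue of the SizeGetter idea: ``sys.getsizeof`` consults an object's ``__sizeof__``, so a class owning private, otherwise-invisible storage can report it much the way a NyHeapDef_SizeGetter would. A sketch for intuition only — this is not part of the NyHeapDef API, and ``Buffer`` is an invented example type:

```python
import sys

class Buffer:
    """Pretends to own a private chunk of memory, like an extension type
    holding a malloc'd buffer no other Python object references."""
    def __init__(self, extra):
        self._extra = extra  # stand-in for bytes of private storage

    def __sizeof__(self):
        # Base object size plus the private allocation, analogous to a
        # SizeGetter adding memory the default accounting cannot see.
        return object.__sizeof__(self) + self._extra

small = Buffer(16)
big = Buffer(4096)
print(sys.getsizeof(big) - sys.getsizeof(small))  # → 4080
```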
- -heapy looks for a _NyHeapDefs_ attribute on all loaded modules. If -this attribute exists and is a CObject the pointer in it is used as a -pointer to an array of NyHeapDef struct (terminated with a struct with -only nulls). Example code doing this is in sets.c in the guppy source. diff --git a/doc/dev-notes/plugins.rst b/doc/dev-notes/plugins.rst deleted file mode 100644 index 8a30d7e8..00000000 --- a/doc/dev-notes/plugins.rst +++ /dev/null @@ -1,118 +0,0 @@ -================ - Plugins system -================ - -Goals -===== - -The plugin system (``pkgcore.plugin``) is used to pick up extra code -(potentially distributed separately from pkgcore itself) at a place -where using the config system is not a good idea for some reason. This -means that for a lot of things that most people would call "plugins" -you should not actually use ``pkgcore.plugin``, you should use the -config system. Things like extra repository types should simply be -used as "class" value in the configuration. The plugin system is -currently mainly used in places where handing in a ``ConfigManager`` -is too inconvenient. - -Using plugins -============= - -Plugins are looked up based on a string "key". You can always look up -all available plugins matching this key with -``pkgcore.plugin.get_plugins(key)``. For some kinds of plugin (the -ones defining a "priority" attribute) you can also get the "best" -plugin with ``pkgcore.plugin.get_plugin(key)``. This does not make -sense for all kinds of plugin, so not all of them define this. - -The plugin system does not care about what kind of object plugins are, -this depends entirely on the key. - -Adding plugins -============== - -Basics, caching ---------------- - -Plugins for pkgcore are loaded from modules inside the -``pkgcore.plugins`` package. This package has some magic to make -plugins in any subdirectory ``pkgcore/plugins`` under a directory on -``sys.path`` work. 
So if pkgcore itself is installed in site-packages -you can still add plugins to ``/home/you/pythonlib/pkgcore/plugins`` -if ``/home/you/pythonlib`` is in ``PYTHONPATH``. You should not put an -``__init__.py`` in this extra plugin directory. - -Plugin modules should contain a ``pkgcore_plugins`` dictionary that -maps the "key" strings to a sequence of plugins. This dictionary has -to be constant, since pkgcore keeps track of what plugin module -provides plugins for what keys in a cache file to avoid unnecessary -imports. So this is invalid:: - - try: - import spork_package - except ImportError: - pkgcore_plugins = {} - else: - pkgcore_plugins = {'myplug': [spork_package.ThePlugin]} - -since if the plugin cache is generated while the package is not -available pkgcore will cache the module as not providing any -``myplug`` plugins, and the cache will not be updated if the package -becomes available (only changes to the mtime of actual plugin modules -invalidate the cache). Instead you should do something like this:: - - try: - from spork_package import ThePlugin - except ImportError: - class ThePlugin: - disabled = True - - pkgcore_plugins = {'myplug': [ThePlugin]} - -If a plugin has a true "disabled" attribute the plugin system will never -return it from ``get_plugin`` or ``get_plugins``. - -Priority --------- - -If you want your plugin to support ``get_plugin`` it should have a -``priority`` attribute: an integer indicating how "preferred" this -plugin is. The plugin with the highest priority (that is not disabled) -is returned from ``get_plugin``. - -Some types of plugins need more information to determine a priority -value. Those should not have a priority attribute. They should use -``get_plugins`` instead and have a method that gets passed the extra -data and returns the priority.
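The selection rule described above — skip disabled plugins, take the highest priority — can be sketched as follows. The plugin class names are invented for illustration, and this is a simplification, not pkgcore's actual ``get_plugin`` implementation:

```python
class VorbisSupport:
    disabled = False
    priority = 10

class FlacSupport:
    disabled = False
    priority = 20

class BrokenSupport:
    disabled = True   # never returned, regardless of priority
    priority = 99

def get_best_plugin(plugins):
    """Pick the highest-priority plugin that is not disabled,
    mirroring the get_plugin behaviour described above."""
    candidates = [p for p in plugins if not getattr(p, "disabled", False)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.priority)

best = get_best_plugin([VorbisSupport, BrokenSupport, FlacSupport])
print(best.__name__)  # → FlacSupport
```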
- -Import behaviour ----------------- - -Assuming the cache is working correctly (it was generated after -installing a plugin as root) pkgcore will import all plugin modules -containing plugins for a requested key in priority order until it hits -one that is not disabled. The "disabled" value is not cached (a plugin -that is unconditionally disabled makes no sense), but the priority -value is. You can fake a dynamic priority by having two instances of -your plugin registered and only one of them enabled at the same -time. - -This means it makes sense to have only one kind of plugin per plugin -module (unless the required imports overlap): this avoids pulling in -imports for other kinds of plugin when one kind of plugin is -requested. - -The disabled value is not cached by the plugin system after the plugin -module is imported. This means it should be a simple attribute (either -completely constant or set at import time) or property that does its -own caching. - -Adding a plugin package -======================= - -Both ``get_plugin`` and ``get_plugins`` take a plugin package as -second argument. This means you can use the plugin system for external -pkgcore-related tools without cluttering up the main pkgcore plugin -directory. If you do this you will probably want to copy the -``__path__`` trick from ``pkgcore/plugin/__init__.py`` to support -plugins elsewhere on ``sys.path``. diff --git a/doc/dev-notes/portage-differences.rst b/doc/dev-notes/portage-differences.rst deleted file mode 100644 index 14a1dad2..00000000 --- a/doc/dev-notes/portage-differences.rst +++ /dev/null @@ -1,88 +0,0 @@ -=========================== -Pkgcore/Portage differences -=========================== - -Disclaimer ----------- - -Pkgcore moves fairly fast in terms of development- we will strive to keep this doc -up to date, but it may lag behind the actual code. 
- -------------------------- -Ebuild environment changes -------------------------- - -All changes are either glep33 related, or a tightening of the restrictions on -the env to block common snafus that localize the ebuild's environment to that -machine. - -- portageq based functions are disabled in the global scope. Reasoning for this - is that of QA- has_version/best_version **must not** affect the generated - metadata. As such, portageq calls in the global scope are disabled. - -- inherit is disabled in all phases but depend and setup. Folks no longer do - it, but inherit from within one of the build/install phases is now actively - blocked. - -- The ebuild env is now *effectively* akin to suspending the process, and restarting - it. Essentially, transitioning between ebuild phases, the ebuild environment - is snapshotted, cleaned of irrelevant data (bash forced vars for example, or - vars that pkgcore sets for the local system on each shift into a phase), and - saved. Portage does this partially (re-execs ebuilds/eclasses, thus stomping - the env on each phase change), pkgcore does it fully. As such, pkgcore is - capable of glep33, while portage is not (env fixes are the basis of glep33). - -- ebuild.sh has been daemonized (ebd). The upshot of this is that regen is - roughly 2x faster (careful reuse of ebd instances rather than forcing bash to - spawn all over). An additional upshot is that there are bidirectional - communication pipes between ebd and the python parent- env inspection, - logging, passing requests up to the python side (has_version/best_version for - example) are now handled within the existing processes. The design of it from - the python side is that of an extensible event handler, so it's - extremely easy to add new commands in, or special case certain things. - -- The ebd now protects itself from basic fiddling.
Ebuild generated state - **must** work as long as the EAPI is the same, regardless of the generating - portage version, and the portage version that later uses the saved state - (simple example: generated with portage-2.51, if portage 3 is EAPI compliant - with that env, it must not allow its internal bash changes to break the env). - As such, certain funcs are not modifiable by the ebuild- namely, internal - portage/pkgcore functionality, hasq/useq for example. Those functions that - are read-only also are not saved in the ebuild env (they should be supplied - by the portage/pkgcore instance reloading the env). - ------------------------ -Repository Enhancements ------------------------ - -Pkgcore internally uses a sane/uniform repository abstraction- the benefits -of this are: - -- repository class (which implements the accessing of the on disk/remote tree) - is pluggable. Remote source or installed repos are doable, as is having your - repository tree run strictly from downloaded metadata (for example), or - running from a tree stored in a tarball/zip file (mildly crazy, but it's - doable). - -- separated repository instances. We've not thrown out overlays (as paludis - did), but pkgcore doesn't force every new repository to be an overlay of the - default 'master' repo as portage does. - -- optimized repository classes- for the usual vdb and ebuild repository - (those being on disk backwards compatible with portage 2.x), the number of - syscalls required was drastically reduced, with ondisk info (what packages - are available per category, for example) cached. It is a space vs time - trade-off, but the space cost is negligible (a couple of dicts with, worst - case, 66k mappings)- as is, portage's listdir caching consumed a bit more - memory and was slower, so all in all a gain (mainly it's faster while using - slightly less memory than portage's caching). - -- unique package instances yielded from repository.
Pkgcore uses a package - abstraction internally for accessing metadata/version/category, etc- all - instances returned from repositories are unique immutable instances. - Gain of it is that if you've got dev-util/diffball-0.7.1 sitting in memory - already, it will return that instance instead of generating a new one- and - since metadata is accessed via the instance, you get at most **one** load - from the cache backend per instance in memory- cache pull only occurs when - required also. As such, far faster for when doing random package accessing - and storing of said packages (think repoman, dependency resolution, etc). diff --git a/doc/dev-notes/tackling-domain.rst b/doc/dev-notes/tackling-domain.rst deleted file mode 100644 index e42a85d8..00000000 --- a/doc/dev-notes/tackling-domain.rst +++ /dev/null @@ -1,45 +0,0 @@ -================= - Tackling domain -================= - -tag a 'x' in front of stuff that's been implemented - -unhandled (eg, figure these out) vars/features - -- (user)?sandbox -- digest -- cvs (this option is a hack) -- fixpackages , which probably should be a sync thing (would need to - bind the vdb and binpkg repo to it though) -- keep(temp|work), easy to implement, but where to define it? -- PORT_LOGDIR -- env overrides of use... - -vdb wrapper/vdb repo instantiation (either domain created wrapper, or -required in the vdb repo section def) - -- CONFIG_PROTECT* -- collision-protect -- no(doc|man|info|clean) (wrapper/mangler) -- suidctl -- nostrip. in effect, strip defaults to on; wrappers if after - occasionally on, occasionally off. -- sfperms - -build section (vars) - -- C(HOST|TARGET), (LD*|C*)FLAGS? -- (RESUME|FETCH)COMMAND are fetcher things, define it there. -- MAKEOPTS -- PORTAGE_NICENESS (imo) -- TMPDIR ? or domain it? - -gpg is bound to repo, class type specifically. strict/severe are -likely settings of it. the same applies for profiles. - -distlocks is a fetcher thing, specifically (probably) class type. 
- -buildpkgs is binpkg + filters. - -package.provided is used to generate a separate vdb, a null vdb that -returns those packages as installed. diff --git a/doc/dev-notes/tests.rst b/doc/dev-notes/tests.rst deleted file mode 100644 index 669b4ac2..00000000 --- a/doc/dev-notes/tests.rst +++ /dev/null @@ -1,34 +0,0 @@ -======== -Testing -======== - -We use twisted.trial for our tests; to run the test framework, run: - - trial pkgcore - -Your own tests must be stored in pkgcore.test - furthermore, tests must -pass when run repeatedly (-u option). You will want at least twisted-2.2 -for that, <2.2 has a few false positives. - -Testing for negative assertions -=============================== - -When coding it's easy to write test cases asserting that you get result xyz -from foo, usually asserting the correct flow. This is ok if nothing goes -wrong, but that doesn't normally happen. :) - -Negative assertions (there probably is a better term for it) mean asserting -failure conditions and ensuring that the code handles them properly when they -get thrown at it. Most test cases seem to miss this, letting bugs -hide away until things go wrong. - -Using --coverage -================ - -When writing tests for your code (or for existing code without any tests), it -is very useful to use --coverage. Run `trial --coverage <path/to/test>`, and -then check <cwd>/_trial_temp/coverage/<test/module/name>. Any lines prefixed -with '>>>>>' have not been covered by your tests. This should be rectified -before your code is merged to mainline (though this is not always possible). -Those lines prefixed with a number show the number of times that line of code -is evaluated.
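A minimal illustration of the positive/negative split, using the stdlib ``unittest`` (trial test cases are written the same way); ``parse_version`` is a made-up stand-in for real code under test:

```python
import io
import unittest

def parse_version(s):
    """Tiny stand-in for real parsing code; raises on bad input."""
    if not s or not s[0].isdigit():
        raise ValueError("invalid version: %r" % (s,))
    return tuple(int(p) for p in s.split("."))

class TestParseVersion(unittest.TestCase):
    def test_positive(self):
        # The easy part: assert the correct flow.
        self.assertEqual(parse_version("1.2.3"), (1, 2, 3))

    def test_negative(self):
        # The part most tests miss: assert the failure conditions,
        # so bad input raises instead of silently returning garbage.
        self.assertRaises(ValueError, parse_version, "")
        self.assertRaises(ValueError, parse_version, "abc")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestParseVersion)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(result.wasSuccessful())  # → True
```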
