Good morning,

I want to talk about improved binary package support for Gentoo. About 1-2 months ago there was already a discussion about this on gentoo-soc@ and on bugzilla [1]. If I remember correctly, no devs were involved in that discussion, so I thought I'd post my thoughts here.
I know that Gentoo is a source-based distribution (or meta-distribution), and I don't want to turn Gentoo into another Fedora or Ubuntu, but I think there are some things we can learn from them.

The current situation: Binary packages are (usually) stored in /usr/portage/packages/$category/$package-$version.tbz2. The package consists of the "real binary package" and the metadata (combined using xpak or whatever).

Problems I see with this:

1) If a binary package is rebuilt because it needs to be linked against a new library, because the USE-flags changed or because the ebuild changed without a revision bump, the "old" binary package is overwritten. This also means that there is no support for storing multiple packages with different USE-flags without, well, using different directories.

2) To find out which USE-flags a package was built with, one needs to download the package and look at the metadata. Today I discovered a file called "Packages" which looks like a metadata cache, but I did not find more information about it (I only tried "man portage").

So, how can we address this? First we should do something about 2), I think. I want to propose the following scheme: Binary packages are stored in $arch/$description/$category/$package/$package-$version-$ev-$use-$bv.tbz2.

$arch: This is x86, ppc or whatever you put into ACCEPT_KEYWORDS, minus the '~'. It does not make sense to make a distinction between stable and testing here.
$description: Something like pentium3, core2quad, G4, or whatever. Pentium3-uclibc or Pentium3-solaris-prefix are also possible.
$category, $package and $version should be clear.
$ev: The "ebuild version". See below.
$bv: The "binary version". See below.
$use: The USE-flags. See below.

About ebuild version, USE-flags and binary version: I would like to encode the USE-flags into the filename. This enables us to have binary packages of the same version, built with different USE-flags, in the same repository.
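To make the proposed layout concrete, here is a minimal sketch of how such a path could be assembled. The function name binpkg_path and the example values are made up for illustration; $ev, $use and $bv are treated as opaque strings here, since how they are derived is discussed further down.

```python
from pathlib import PurePosixPath


def binpkg_path(arch, description, category, package, version, ev, use, bv):
    """Build $arch/$description/$category/$package/$package-$version-$ev-$use-$bv.tbz2.

    Hypothetical helper; ev, use and bv are taken as already-computed strings.
    """
    # $arch is the ACCEPT_KEYWORDS value minus the '~'.
    arch = arch.lstrip("~")
    filename = f"{package}-{version}-{ev}-{use}-{bv}.tbz2"
    return PurePosixPath(arch, description, category, package, filename)


# Example with invented values:
print(binpkg_path("~x86", "pentium3", "www-servers", "apache", "2.2.11", "3", "Ab", "1"))
```

Two builds of the same version with different USE-flags differ only in the $use part of the filename, so they can sit next to each other in the same directory.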
Some wanted to have the USE-flags in the directory name, some say it is ok to have them in the xpak only, and some prefer the "Packages"-like file. I think USE-flags can be set per package and therefore should be stored per package, not per $description or whatever. Having them only in the xpak allows no distinction between multiple binary packages of the same version with different USE-flags, and the same is true for the Packages file. The Packages file would also have to be created and downloaded all the time, and so on. Therefore I think the cleanest solution is having the USE-flags in the filename.

There are different methods to store them there:

a) A checksum (of the USE-flags, the USE-flag string, the ebuild and the USE-flag string, whatever).
b) List the enabled USE-flags in the filename, and use a) if the string gets too long.
c) Use a packed binary vector.

I don't like a), because it is not easily reversible. You could always download the Packages file or the binary package and look into the xpak metadata, but that's too much effort. b) also has the problems I mentioned for a), since it falls back to a) for long strings. Also, you'd need some system to distinguish ebuilds with the same version but different USE-flags. You also need that for c), so b) has no advantages over c) in my eyes.

For c) I think of the following: Sort the USE-flags in some defined way (ASCII code, whatever) and make a vector with a 1 for every enabled USE-flag and a 0 for every disabled USE-flag. Then compress that vector: If you use hex code, you need 1 character for every 4 bits, but it should be possible to find 64 different characters, and then you need 1 character for every 6 bits. PHP has 106 USE-flags, which would make a USE-string of about 18 characters (6 bits per character) to 27 characters (hex). Packages with lots of USE-expand stuff like languages would need more, but not too much, I think.

Problems: The string might get long, and you get big problems with USE-flag renames, additions or removals. That's where the ebuild version is needed. Or not.
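A rough sketch of method c), assuming the flags are sorted by plain string order and packed 6 bits per character. The 64-character alphabet and the names encode_use and decode_use are my own choices for illustration, nothing that exists in portage.

```python
import string

# 64 characters, chosen here to avoid '-' and '/', which already have a
# meaning in the proposed path. This alphabet is an assumption.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+_"


def encode_use(all_flags, enabled):
    """Pack the enabled/disabled state of all_flags into a short string."""
    flags = sorted(all_flags)
    bits = [1 if flag in enabled else 0 for flag in flags]
    bits += [0] * (-len(bits) % 6)  # pad with zeros to a multiple of 6 bits
    chars = []
    for i in range(0, len(bits), 6):
        value = 0
        for bit in bits[i:i + 6]:
            value = (value << 1) | bit
        chars.append(ALPHABET[value])
    return "".join(chars)


def decode_use(all_flags, encoded):
    """Recover the set of enabled flags; needs the same flag list as encode_use."""
    flags = sorted(all_flags)
    bits = []
    for ch in encoded:
        value = ALPHABET.index(ch)
        bits.extend((value >> shift) & 1 for shift in range(5, -1, -1))
    # zip stops at the end of flags, so the padding bits are ignored.
    return {flag for flag, bit in zip(flags, bits) if bit}


iuse = ["ssl", "ldap", "doc", "threads"]  # invented flag list
packed = encode_use(iuse, {"ssl", "doc"})
print(packed, decode_use(iuse, packed))
```

Decoding only works if both sides use exactly the same flag list, which is why renames, additions and removals of USE-flags are a problem for this method.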
We have 3 possibilities for the ebuild version:

a) Change policy: USE-flag changes in an ebuild need a version bump.
b) Use a checksum of the ebuild.
c) Use the version given by the version control system.

The problem with a) is that it is a change in policy and probably hard to do. Increasing the revision for a (trivial) change leads to a lot of unnecessary rebuilds for users. It also means that USE-flag changes in eclasses are difficult: the eclass should probably be copied over to a new name with a version, and only ebuilds with a new version (revision) would be allowed to use it.

The problem with b) is that it is not ordered. You don't know which is the newest version. If you have an ebuild with a version for which there is no binary package, it gets difficult/ugly.

c) also has problems: When using cvs, there are versions easily available. The same is true for svn, but lots of distributed version control systems like git use checksums as versions. Welcome back to b). Another question is how we get to the versions. Will they be in the header forever, given that they make signing ebuilds or the manifest much more complicated (multiple commits necessary)? But, well, since metadata is generated and provided by "the tree", it should not be too hard to add a unique ebuild version there (in the case of checksums, use an integer and increase it whenever the checksum changes, or something). It just might make using overlays a bit more difficult.

The last thing to be described is the binary version. Lots of people talk about dependencies on other binary packages when they talk about binary packages for Gentoo, but that gets quite difficult (and, in my opinion, ugly). We mostly need to provide a "consistent set" of packages, which means: if A depends on B, and B changes and therefore breaks A, we need to provide an updated version of A. We can do that by simply increasing the binary version, since the package manager then knows that this package needs updating, too.

How to create binary packages?
Create some build server (or build server infrastructure). The most important thing is a script or something that provides the functionality. One enters a make.conf, an /etc/portage dir, the path to the profile, the description and whatever else is needed, and the system starts building. Then you can create a second set of data and start building again; the system puts the binary packages in the same directory and discovers what needs to be built and what does not (because apache needs to be built only once if its USE-flags are the same for the different configuration sets).

But there are thousands of packages and millions of USE-flag combinations!

Seriously, who cares? The goal of this project (as it exists in my head) is not to provide everything. It is to provide the most used packages. If you need parrot, compile it yourself. If you need netbeans, compile it yourself. We have @system, gnome, kde and another handful of packages, which will change over time. I'm really looking forward to the data collected by the statistics project (GSoC). The same is true for USE-flags: We might provide gnome, kde, both, a server profile and whatever we decide to provide, but not everything. Again, statistics will help. Same with CFLAGS. Probably no -O3, no -ffast-math, no -break-my-code or whatever. Probably x86 with 32 and 64 bit for the beginning, later maybe more.

So, the really really cool thing is that if you are some company, university, institution or freak with lots of (similar) Gentoo boxes, you can set up a build server and even share the binary packages, if you want. That gives the same level of security as non-official overlays, but if the University of FooBar in Jamaica uses it, there should not be too many security problems.

Thanks for reading, please discuss. I probably forgot lots of stuff, but I can bring it up later in the discussion.

Philipp

[1] https://bugs.gentoo.org/show_bug.cgi?id=150031