Hi Daniel, thanks for your detailed report!
You also commented a lot on your actual practice (thanks!), so I changed the subject to reflect the slight change of topic in my reply.

On Wed, Nov 14, 2012 at 05:54:06PM -0800, Daniel Schepler wrote:
> I read your recent post to debian-devel with great interest, as I've
> done some bootstrapping efforts in the past, and I'm currently in the
> middle of a "port" for the x32 ABI. In the past, what I've done
> (mostly privately) was to develop a script I called "pbuildd" which
> essentially just runs through the list of currently unbuilt packages
> and tries running pbuilder on them all, then installs anything that
> succeeds into a local repository and starts up the loop again.

A small function in my tools does the same thing during the very first steps, though purely in theory. Naturally, it quickly fails due to dependency cycles ;)

> Then, when things got stuck, I just did a manual inspection of the
> unsatisfied dependencies to find the cycles, and chose one to break.
> In fact, I've just started uploading my current iteration of this to
> http://87.98.215.228/debian/ -- you might want to especially look at
> scripts/pbuildd which is the central script to run this loop. (And
> over time, it's gathered various optimizations to speed up the
> "installation into local repository" step, try to avoid invoking
> pbuilder if it can easily determine that certain Build-Depends aren't
> present at all, etc.)

What my tools try to do is figure out a build order for bootstrapping Debian from nothing. This order can then be handed to a tool that does the actual compilation in that order. The "figuring out the order" part is purely theoretical: I only look at the Packages and Sources files and the dependency relationships stored in them to generate a dependency graph, which I then evaluate. My tools don't know or care whether a package can actually be compiled on the new architecture. They do no compilation themselves and therefore cannot find this out on their own.
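To make the loop concrete, here is a minimal Python sketch of a pbuildd-style loop run purely on dependency data, with nothing compiled. The function name `build_loop` and all package names are hypothetical, and real Debian dependency resolution (versions, alternatives, Provides, installability of whole dependency chains) is deliberately ignored: a source package counts as buildable as soon as every binary package it build-depends on is available.

```python
from typing import Dict, List, Set, Tuple


def build_loop(
    build_deps: Dict[str, Set[str]],
    produces: Dict[str, Set[str]],
    available: Set[str],
) -> Tuple[List[List[str]], Set[str]]:
    """Simulate a pbuildd-style loop without compiling anything.

    build_deps: source package -> binary packages it build-depends on
    produces:   source package -> binary packages it builds
    available:  binary packages present from the start (e.g. a minimal chroot)

    Each round "builds" every source whose build dependencies are all
    available, adds its binaries to the repository, and starts over.
    Returns the list of rounds and the sources that never became buildable.
    """
    available = set(available)
    unbuilt = set(build_deps)
    rounds: List[List[str]] = []
    while True:
        buildable = sorted(s for s in unbuilt if build_deps[s] <= available)
        if not buildable:
            break  # no progress: whatever is left is stuck, typically on cycles
        for src in buildable:
            available |= produces.get(src, set())
        unbuilt.difference_update(buildable)
        rounds.append(buildable)
    return rounds, unbuilt


if __name__ == "__main__":
    # Hypothetical packages: cyc-a and cyc-b build-depend on each other.
    deps = {"libfoo": set(), "bar": {"libfoo-dev"},
            "cyc-a": {"cyc-b-bin"}, "cyc-b": {"cyc-a-bin"}}
    prods = {"libfoo": {"libfoo-dev"}, "bar": {"bar-bin"},
             "cyc-a": {"cyc-a-bin"}, "cyc-b": {"cyc-b-bin"}}
    rounds, stuck = build_loop(deps, prods, set())
    print(rounds)         # [['libfoo'], ['bar']]
    print(sorted(stuck))  # ['cyc-a', 'cyc-b']
```

Packages left in `stuck` are waiting on a dependency cycle (or on a binary package that nothing produces); a cycle has to be broken before the loop can make progress again.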
Dealing with compilation problems is (as of now) still up to the user, from the point of view of my tools. I call them "my tools" because the project has no name yet; the git repository [6] and mailing list [7] just run under the name "debian-bootstrap". At this point I should also mention that everything depends heavily on dose3. Pietro Abate is a great help with this project, and I certainly wouldn't be where I am without dose3 and his continuous help and additions to the project.

> Initially, when I needed to break a cycle, I would just build
> something by hand and stick it into the "partial" directory, but over
> time I started developing automated cycle-breaker scripts, which are
> currently under scripts/cb.inactive (the pbuildd script looks for them
> under scripts/cb).

I had a look at the files in scripts/cb.inactive, and they seem to store a lot of information about which build dependencies can be dropped for a huge number of source packages. If I read lines like

    inst_pkgs "`get_control_re $PBUILDD_ROOT/build/a/antlr/*.dsc 'build-(depends|depends-indep)' | sed -e '/\<gcj-native-helper\>/d' -e '/\<nant\>/d' \
        -e '/\<cli-common-dev\>/d' -e '/\<mono-devel\>/d' \
        -e '/\<libmono-winforms2\.0-cil\>/d'`

correctly, they mean that gcj-native-helper, nant, cli-common-dev etc. can be dropped, right? Sadly, that information is stored in a Turing-complete format (bash scripts), which makes it hard to process by machine. But if the format '/\<package\>/d' is used most of the time, then I guess a regex can extract most of the information with tolerable uncertainty. I will try to hack up a script that harvests the droppable build dependencies from the files you have in scripts/cb.inactive. This information might be immensely useful, thanks!

As a porter has to come up with those droppable build dependencies for each new port, Guillem Jover has proposed a new syntax called "build profiles" in [3].
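A regex-based harvester could look roughly like the following Python sketch. It assumes, based only on the antlr line quoted above, that each script names its source package through a `$PBUILDD_ROOT/build/<letter>/<source>/*.dsc` path followed by `sed -e '/\<pkg\>/d'` filters; anything written differently is silently missed. The names `harvest`, `unescape`, `DSC_RE` and `DROP_RE` are hypothetical.

```python
import re
from typing import Dict, Set

# Heuristic patterns (assumed, based on the antlr example only):
#   .../build/<letter>/<source>/*.dsc   names the source package
#   '/\<pkg\>/d'                        is a sed filter dropping build dependency pkg
DSC_RE = re.compile(r"/build/[^/\s]+/([^/\s]+)/\*\.dsc")
DROP_RE = re.compile(r"/\\<(.+?)\\>/d")


def unescape(name: str) -> str:
    """Undo sed regex escaping, e.g. 'libmono-winforms2\\.0-cil' -> 'libmono-winforms2.0-cil'."""
    return re.sub(r"\\(.)", r"\1", name)


def harvest(script_text: str) -> Dict[str, Set[str]]:
    """Map each source package to the build dependencies its sed filters drop."""
    result: Dict[str, Set[str]] = {}
    matches = list(DSC_RE.finditer(script_text))
    for i, m in enumerate(matches):
        # A source's filters are assumed to end where the next .dsc path begins.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script_text)
        drops = {unescape(d) for d in DROP_RE.findall(script_text[m.end():end])}
        result.setdefault(m.group(1), set()).update(drops)
    return result


if __name__ == "__main__":
    example = r"""inst_pkgs "`get_control_re $PBUILDD_ROOT/build/a/antlr/*.dsc 'build-(depends|depends-indep)' | sed -e '/\<gcj-native-helper\>/d' -e '/\<nant\>/d' \
    -e '/\<cli-common-dev\>/d' -e '/\<mono-devel\>/d' \
    -e '/\<libmono-winforms2\.0-cil\>/d'`"""
    print(sorted(harvest(example)["antlr"]))
    # ['cli-common-dev', 'gcj-native-helper', 'libmono-winforms2.0-cil', 'mono-devel', 'nant']
```

Scripts that use other sed expressions, loops or conditionals would need manual review; the sketch only covers the common case.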
If this information were included in the build dependencies of some core packages, bootstrapping would already become much easier for a porter. An example of how the proposed format works:

    Build-Depends: huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny

The < and > "brackets" are used the same way [ and ] are used for architectures: they denote the profiles for which that dependency can be dropped (or for which it is exclusively required). Besides bootstrapping, such profiles could also be used for embedded builds or for bootstrapping compilers that need themselves in order to be built. The latter topic also came up on debian-devel recently [8]. Trivial patches exist for dpkg [4] and dose3 [5] to implement this functionality.

> The scripts tend to become outdated over time, though, with a moving
> target, and I'm sure the current state is no exception. My personal
> heuristics for what I preferred were: first, prefer cycle-breaking
> which just removes Build-Depends which are there to build
> documentation. Then, prefer cycle-breaking which ignores
> Build-Depends on one or a few libraries which provide purely optional
> features. If I couldn't find anything of this sort, I'd just try to
> find the cycle-breaking point which would be (fuzzily) "least
> invasive" and "least likely to break the resulting packages, at least
> as far as packages that Build-Depend on them".

In current Debian Sid, the dependency graph contains a central strongly connected component (SCC) of dependency cycles with about 930 nodes. Breaking it up into a directed acyclic graph (DAG) is not trivial because the edges (dependencies) lack weights. A weight would express how hard or how undesirable it is to drop a given build dependency. In a post at [1] I argue that breaking this SCC into a DAG by reducing build dependencies even becomes harder over the years, as the SCC grows in size.
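To illustrate the profile semantics described above, here is a minimal Python sketch that reduces a Build-Depends field for a given architecture and set of active profiles. It assumes that a profile list behaves exactly like an architecture list (all terms negated: drop the dependency if any listed profile is active; all terms positive: keep it only if one of them is active) and it only does exact architecture-name matching, so wildcards such as linux-any are not supported. The function names are hypothetical; this is not the code from the dpkg [4] or dose3 [5] patches.

```python
import re
from typing import List, Optional, Set

# Parses one alternative such as "huge (>= 1.0) [i386 arm] <!embedded !bootstrap>".
ATOM_RE = re.compile(
    r"^\s*(?P<name>[^\s(\[<]+)"
    r"\s*(?:\((?P<version>[^)]*)\))?"
    r"\s*(?:\[(?P<arches>[^\]]*)\])?"
    r"\s*(?:<(?P<profiles>[^>]*)>)?\s*$"
)


def restriction_allows(terms: Optional[str], active: Set[str]) -> bool:
    """[..]-style semantics: an all-negated list drops the dependency if any
    listed term is active; an all-positive list keeps it only if one is active."""
    if terms is None:
        return True  # no restriction given
    words = terms.split()
    negated = [w.startswith("!") for w in words]
    if all(negated):
        return not any(w[1:] in active for w in words)
    if any(negated):
        raise ValueError(f"mixed positive and negated terms: {terms!r}")
    return any(w in active for w in words)


def reduce_build_depends(field: str, arch: str, profiles: Set[str]) -> List[str]:
    """Return the dependencies that remain for the given host architecture and
    active build profiles; surviving alternatives are re-joined with ' | '."""
    result: List[str] = []
    for entry in field.split(","):
        if not entry.strip():
            continue  # tolerate trailing commas
        kept: List[str] = []
        for alt in entry.split("|"):
            m = ATOM_RE.match(alt)
            if m is None:
                raise ValueError(f"cannot parse {alt!r}")
            if restriction_allows(m["arches"], {arch}) and restriction_allows(
                m["profiles"], profiles
            ):
                kept.append(m["name"] + (f" ({m['version']})" if m["version"] else ""))
        if kept:
            result.append(" | ".join(kept))
    return result


if __name__ == "__main__":
    field = "huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny"
    print(reduce_build_depends(field, "i386", set()))          # ['huge (>= 1.0)', 'tiny']
    print(reduce_build_depends(field, "i386", {"bootstrap"}))  # ['tiny']
    print(reduce_build_depends(field, "amd64", set()))         # ['tiny']
```

Treating profiles exactly like architectures keeps the rules easy to explain: a porter only has to know which profiles are active to predict the reduced dependency set.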
Part of my algorithm implements heuristics that present to the user those edges which would make "sense" to break from a theoretical point of view. An example would be a build dependency that only a single source package needs, but which itself draws in dozens of further dependencies, which in turn draw in more. Dropping this build dependency would immediately shrink the SCC considerably. Naturally, the importance of a build dependency can only be judged by a human, as only a human can figure out how essential package X is for compiling source package Y, or how hard it would be to change source package Y to build without X.

An exception to this is what I call "weak build dependencies". Those are the "documentation building" packages you mention above. Since they can mostly be dropped from any package that build-depends on them without doing harm, they are also the first thing my algorithms remove. A current list can be found at [2]. Can it be extended?

At this point let me also mention that the Build-Depends-Indep field is immensely helpful when looking at the bootstrapping problem from a theoretical point of view (as I do), because arch:all packages of course do not have to be rebuilt. In fact, many of the "weak" documentation building dependencies could simply be moved to Build-Depends-Indep, which would remove the need for me to keep this list. P. J. McDermott ("Bootstrappable Debian" GSoC project) managed to find many source packages with Build-Depends entries that could be moved to Build-Depends-Indep, making the bootstrapping process easier. In [9] I supply a list of core packages that build arch:all packages and have no Build-Depends-Indep field, but do have a binary-indep or build-indep target in their debian/rules, as well as combinations thereof.
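The edge heuristic can be illustrated with a deliberately brute-force Python sketch: compute the SCCs (here with Kosaraju's algorithm), then, for every edge inside the largest SCC, drop only that edge and see how large the largest SCC remains. This is an illustrative stand-in, not the algorithm implemented in my tools, and the graph is a hypothetical toy in which an edge X -> Y means "X build-depends on Y". At roughly O(E * (V + E)) it is only meant for small graphs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # edge u -> v means "u build-depends on v"


def sccs(graph: Graph) -> List[Set[str]]:
    """Kosaraju's algorithm with iterative DFS (no recursion limit issues)."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    order: List[str] = []  # nodes in order of DFS completion
    seen: Set[str] = set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:  # all successors finished
                order.append(node)
                stack.pop()
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for u, vs in graph.items():
        for v in vs:
            reverse[v].add(u)
    components: List[Set[str]] = []
    assigned: Set[str] = set()
    for start in reversed(order):  # second pass on the reversed graph
        if start in assigned:
            continue
        component, stack2 = {start}, [start]
        assigned.add(start)
        while stack2:
            for v in reverse[stack2.pop()]:
                if v not in assigned:
                    assigned.add(v)
                    component.add(v)
                    stack2.append(v)
        components.append(component)
    return components


def rank_edges(graph: Graph) -> List[Tuple[int, str, str]]:
    """For each edge inside the largest SCC, the size of the largest SCC left
    after dropping only that edge; most helpful drops come first."""
    core = max(sccs(graph), key=len)
    ranking: List[Tuple[int, str, str]] = []
    for u in sorted(core):
        for v in sorted(graph.get(u, set()) & core):
            reduced = {k: (vs - {v} if k == u else vs) for k, vs in graph.items()}
            ranking.append((max(len(c) for c in sccs(reduced)), u, v))
    ranking.sort()
    return ranking


if __name__ == "__main__":
    toy = {  # hypothetical packages
        "gcc": {"libc"},
        "libc": {"gcc", "doc-tool"},
        "doc-tool": {"python", "gcc"},
        "python": {"doc-tool"},
    }
    for size, src, dep in rank_edges(toy):
        print(f"drop {src} -> {dep}: largest SCC has {size} nodes")
```

In the toy graph, dropping libc's build dependency on doc-tool (a "weak" documentation dependency) splits the 4-node SCC into two 2-node SCCs; a human still has to decide whether such a drop is acceptable.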
> In the past, pbuildd was mainly geared towards trying to build all of
> Debian (including the binary-indep packages) starting from a minimal
> chroot and with minimal extra package downloads, but on an established
> architecture. It was only recently that I started applying it to
> bootstrapping x32. The way I started that was actually: I started off
> mainly following the instructions from Linux From Scratch, though of
> course adjusting it to "cross-building" to x32 as necessary. I also
> inserted dpkg into the process as soon as possible after the first LFS
> stage creating the chroot with /tools, and from then on ran installs
> into temporary directories, and built dummy dpkg packages with no
> dependencies. Then, after the LFS builds were over, I started
> building real Debian packages from the actual .dsc source packages,
> and eventually had enough packages built in this way that I was able
> to do a debootstrap, and start the pbuildd process.

So you used LFS to build the initial chroot. That's new to me; from what I've heard so far, people have used Gentoo or OpenEmbedded in the past to avoid having to cross-compile a core of Debian. With multiarch, cross-compiling becomes much nicer in Debian. Sadly, too many packages are still not multiarch-enabled for a Debian base system to be cross-compiled with multiarch. Wookey is currently attempting this with Ubuntu (as more of its packages are multiarch-enabled). From a dependency-theoretic point of view, only very few of the core packages (a few dozen) have their cross build dependencies satisfied.
cheers,
josch

[1] http://blog.mister-muffin.de/2012/10/13/does-it-become-harder-to-bootstrap-debian-
[2] http://wiki.debian.org/DebianBootstrap/TODO#Find_more_weak_build_dependencies
[3] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661538
[4] http://bootstrap.pehjota.net/dpkg/dpkg-build-profiles.patch
[5] http://lists.mister-muffin.de/pipermail/debian-bootstrap/2012-July/000306.html
[6] https://gitorious.org/debian-bootstrap/bootstrap
[7] http://lists.mister-muffin.de/cgi-bin/mailman/listinfo/debian-bootstrap
[8] https://lists.debian.org/debian-devel/2012/10/msg00361.html
[9] http://wiki.debian.org/DebianBootstrap/TODO#Use_Build-Depends-Indep

Archive: http://lists.debian.org/20121115042603.GA31238@hoothoot