Hi,

the following is a report on a successful implementation of what I discussed with Niels Thykier during DebConf13. The question was how important it is for a source package to be compilable, or to exist in the first place, given an incomplete port which is in the process of being bootstrapped. This work serves a different purpose than the identification of "key packages" by Lucas Nussbaum [1]. Instead of attaching a binary value to each source package, this method associates integer values with them. Once bootstrapping of the whole archive becomes more important, or even possible in real life through an implementation of build profiles, this heuristic could be used to further extend the meaning of "key packages" as well.

This heuristic attaches to each source package A the number of source packages which need A to be compilable so that they become compilable themselves. The dependency graph needed to extract this information is conveniently created by the service I run at http://bootstrap.debian.net - I'm using a simple Python script to walk this graph and extract the information.

In fact, that Python script uses two different graphs. Since dependencies contain disjunctions, there exist different choices of packages which have to be available for something to be compilable or installable. To not make this choice arbitrary, I calculate the minimum number of dependencies that have to be available (strong dependencies) and the maximum number that have to be available (dependency closure). Therefore each source package A is associated with two numbers: the minimum number of source packages which depend on A being compilable and the maximum number of source packages which depend on A being compilable.

To create more than syntactic meaning, I also added popcon information to the output. I associate with each source package A the sum of all popcon values of the source packages which depend on A being compilable. Again, this is done for the minimum as well as the maximum.

So here is the (tab-delimited) data in no particular order:

http://mister-muffin.de/p/pVxb.txt

1st column: the name of the source package
2nd column: minimum number of source packages which need this source package to be compilable
3rd column: maximum number of source packages which need this source package to be compilable
4th column: minimum sum of popcon values
5th column: maximum sum of popcon values

Do you see any obvious error?

When sorting the data by the second column, you will see that there are 1194 source packages with the same value: 19554. This value corresponds to the total number of source packages. It means: everything else depends on these 1194 source packages being compilable.
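
The counting described above can be sketched roughly as follows. This is not the actual script, only a minimal illustration under my own assumptions: the graph is given as a plain dict, the function name `dependents_count` and all package names and popcon values are made up, and a package counts itself. The real computation runs twice, once on the strong dependency graph and once on the dependency closure; the sketch handles a single graph.

```python
from collections import defaultdict


def dependents_count(needs, popcon):
    """For each package A, count the packages that transitively need A.

    needs:  dict mapping a source package to the set of source packages
            it needs to be compilable (toy stand-in for the real graph).
    popcon: dict mapping a source package to its popcon value.
    Returns {package: (count, popcon_sum)}. A itself is included in both
    numbers (an assumption of this sketch).
    """
    # Reverse the edges so that we can ask "who needs me?".
    needed_by = defaultdict(set)
    packages = set(needs)
    for pkg, deps in needs.items():
        for dep in deps:
            needed_by[dep].add(pkg)
            packages.add(dep)

    result = {}
    for pkg in packages:
        # Depth-first walk over the reversed edges, starting at pkg.
        seen = {pkg}
        stack = [pkg]
        while stack:
            current = stack.pop()
            for parent in needed_by.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        result[pkg] = (len(seen), sum(popcon.get(p, 0) for p in seen))
    return result


# Hypothetical toy data: core-a and core-b need each other (a cycle),
# lib needs both of them, and app needs lib.
needs = {
    "core-a": {"core-b"},
    "core-b": {"core-a"},
    "lib": {"core-a", "core-b"},
    "app": {"lib"},
}
popcon = {"core-a": 100, "core-b": 200, "lib": 50, "app": 5}

counts = dependents_count(needs, popcon)
# Print tab-delimited rows, highest count first.
for name, (count, score) in sorted(counts.items(),
                                   key=lambda kv: (-kv[1][0], kv[0])):
    print(f"{name}\t{count}\t{score}")
```

In this toy graph the two packages in the cycle both end up with the total number of packages as their count, which is the same effect as the group of source packages sharing the highest value in the real data.
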
If those 1194 source packages are not compilable, then neither is the rest. Remember that this is only true during a bootstrapping scenario. These 1194 source packages are also all part of the same strongly connected component of the strong srcgraph and roughly correspond to the smallest set of packages which are needed for a self-hosting Debian system. We call a set of binary and source packages self-hosting if all binary packages can be created from the source packages and all source packages can be compiled with just the available binary packages.

In my opinion it would make sense to add all packages which are at minimum required to make Debian self-hosting to the set of "key packages" by extending the definition by Lucas Nussbaum [1].

The number of source packages which are needed to bootstrap themselves and all the rest of Debian is that high because it includes source packages which are only included because of the arch:all binary packages they build, because of the essential:yes packages they build or because of the build-essential packages they build. While it is important to include these for rebuilds of the whole archive, they are not important in a real bootstrap situation. Arch:all binary packages already exist and do not need to be bootstrapped, and to start compiling packages natively, a minimal build system (essential:yes + build-essential) is required in the first place. Therefore I created a different graph which takes into account that arch:all packages as well as the packages of the minimal build system do not need to be rebuilt:

http://mister-muffin.de/p/Gid8.txt

One can see that now the number of source packages needed to build the rest of the archive is only 383. It is important that these source packages remain compilable (in addition to essential:yes + build-essential being cross-buildable) because otherwise a bootstrap of any new architecture cannot be done.
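
The definition of self-hosting given above can be written down as a small check. This is only a sketch under my own assumptions: the function name `is_self_hosting`, the dict layout and the toy package names are hypothetical, and `build_needs` is assumed to already list every binary package that must be installed to compile a source package (installation dependencies included).

```python
def is_self_hosting(sources, binaries, builds, build_needs):
    """Check the self-hosting property for a set of packages.

    sources:     set of source package names
    binaries:    set of binary package names
    builds:      dict source -> set of binary packages it produces
    build_needs: dict source -> set of binary packages that must be
                 installed to compile it (assumed to be complete)
    """
    # Every binary package must be produced by one of the sources.
    producible = set()
    for src in sources:
        producible |= builds.get(src, set())
    all_binaries_buildable = binaries <= producible

    # Every source must be compilable with the available binaries only.
    all_sources_compilable = all(
        build_needs.get(src, set()) <= binaries for src in sources
    )
    return all_binaries_buildable and all_sources_compilable


# Hypothetical toy data.
builds = {"src-a": {"bin-a"}, "src-b": {"bin-b"}}
build_needs = {"src-a": {"bin-b"}, "src-b": {"bin-a", "bin-b"}}

full_set = is_self_hosting({"src-a", "src-b"}, {"bin-a", "bin-b"},
                           builds, build_needs)
# Without src-b nothing produces bin-b, so the set is not self-hosting.
without_b = is_self_hosting({"src-a"}, {"bin-a", "bin-b"},
                            builds, build_needs)
print(full_set, without_b)
```
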
If these source packages are not compilable, the service at http://bootstrap.debian.net will indicate that the architecture is not bootstrappable at all.

Does anybody see enough value in these numbers for source package importance in the light of bootstrapping Debian (either for a new port or for rebuilding the archive from scratch)? If so, then I can generate these numbers for all source packages on a daily basis and publish them with the rest of the data on http://bootstrap.debian.net

cheers, josch

[1] https://lists.debian.org/debian-devel/2013/05/msg00496.html

Archive: http://lists.debian.org/20131127175834.2752.85430@hoothoot