Hi Erik,

At a workshop currently taking place in Leiden [1], we came up with another
possible use case for your proposal. Some people develop PARI/GP in parallel
with Sage. One simple way to set up a testing environment would be:

* a git repo for PARI/GP
* a git repo for Sage
* Sage configured to use the development version of PARI/GP (wherever it
  is installed)
This raises one question, though: how would the chain of recompilation be
relaunched after a PARI/GP update? Would the Makefile handle it
automatically? (The same question holds for system packages, of course.)

Best,
Vincent

[1] https://www.universiteitleiden.nl/en/events/2017/07/workshop-on-algorithms-in-number-theory-and-arithmetic-geometry/

On Friday, 26 May 2017 at 15:01:36 UTC+2, Erik Bray wrote:
>
> Hi folks interested in Sage packaging,
>
> Almost every time the topic comes up, I complain that it isn't easier
> to use more system packages as both build- and run-time dependencies
> of Sage.  I'd like to make some progress on actually doing something
> about that, and I have some ideas, but I'd like to bounce them off
> anyone who's interested first before just going off and doing it.
>
> There is enough work involved in this that I believe it can and should
> be broken up into a number of smaller tasks.  I would also like to
> approach this in a way that works well and integrates with the
> existing "sage-the-distribution" infrastructure.  I believe there are
> advantages to being able to develop on Sage in the "normal" way we're
> already used to, while also being able to take advantage of existing
> system packages wherever possible.
>
> So I'm just going to try to organize my existing thoughts on this and
> see what anyone thinks.  Sorry if it's TL;DR, but I'm hoping that
> having a detailed discussion about this will make it more likely that
> something will actually be accomplished on it soon (because I think
> the actual implementation, once decided on, is not terribly
> difficult).
>
> Note: In this message I'm using "package" loosely to refer to any
> program, library, database, or other collection of files that is
> distributed and installed as a self-contained unit.  It doesn't
> necessarily relate to any particular "packaging system".
>
>
> 1. Why?
> =======
>
> The extent and scope to which Sage "vendors" its dependencies, in the
> form of what some call "sage-the-distribution", is *not* particularly
> normal in the open source world.  Vendoring *some* dependencies is not
> unusual, but Sage vendors nearly all of them (even down to gcc, in
> certain cases).  I've learned a lot of the history behind this over the
> past year, and agree that most of the time this has been done for good
> reasons.
>
> For example, I can't think of any other software that forces me to
> build its own copy of ncurses just to build/install it.  This was
> added for good reasons [1], but not reasons that can't also be resolved
> in part by installing the appropriate system packages, or that might
> not be resolved by now in system packages that depend on ncurses (i.e.
> that should be built with ncurses support).  Point being, this issue
> does not necessarily impact everyone, and building Sage's own ncurses
> is overkill in that case.  It would be one thing if we were just
> talking about one or two packages (I didn't pick on ncurses for any
> deep reason), but now multiply that by around 250 (give or take,
> depending on how many dependencies are even available as system
> packages) and it becomes real overhead to getting started *and* making
> progress with Sage development.
>
> I wouldn't propose *removing* any existing spkgs that are still
> relevant.  I think it's really useful that Sage has a list of
> known-good pinned versions of its dependencies.  Further,
> "sage-the-distribution" makes it very easy to install those
> dependencies in such a way that they can be used as build/runtime
> dependencies by Sage without having to hunt the 'net for the right
> source packages of the right versions of those dependencies, and
> figure out how to configure and build them in a piecemeal fashion.
> In other words, even if we do expand the ability to use system packages
> for Sage's dependencies, it's still very nice that it's easy with a
> few commands to use the spkg if something goes wrong with the system
> package.  It's also, of course, important for power users who wish to
> compile some dependencies on their own--especially highly tuned
> numerical libraries (but even those users usually only care about
> being able to hand-configure a few dependencies, not most).
>
> To summarize: being able to more aggressively rely on system packages
> can save a lot of time and frustration during normal development of
> Sage, and is also less jarring especially to new developers, of whom
> we would like to attract more.  It should also decrease the time
> required to regularly build binary distributions of Sage (e.g. for
> Docker, Windows, and Linux distros).
>
>
> 2. Overview of how Sage manages dependencies now (and what won't change)
> ========================================================================
>
> For many of you this will be unnecessary review, but I want to discuss
> a little about how dependencies are currently checked and installed in
> Sage-the-distribution.  Doing so is helpful for me too, to make sure I
> understand it clearly (and correct me if I have any
> misunderstandings).
>
> Sage-the-distribution uses *Make* itself (cleverly, IMO) to manage
> dependencies insofar as making sure all dependencies are installed,
> and that when a package changes all packages that depend (directly or
> indirectly) on that package are rebuilt.  Make works on files and
> timestamps, which does not translate directly to entire software
> packages, so to track whether or not an spkg is up to date, Sage uses
> the common "stamp pattern" for Make [2]--that is, when an spkg is
> installed it writes a file that effectively "represents" completion of
> the installation of that spkg for Make's purposes.
> These stamp files are the files typically stored under
> $SAGE_LOCAL/var/lib/sage/installed/<spkg>-<version>.  This directory
> is also known in some places as SAGE_SPKG_INST.  By including the
> version number in the name we can also force rebuilds when an spkg's
> version changes.
>
> When one runs `make <spkg>` with just the spkg name, this is actually
> a phony target with the path to the stamp file for that package (at
> its current version) as its sole prerequisite.  So `make <spkg>`
> translates to `make $SAGE_SPKG_INST/<spkg>-<version>` for the current
> version of that spkg.  The associated rule is to run the sage-spkg
> command for that package, which also takes care of writing the stamp
> file.  sage-spkg also writes some information into each stamp file in
> a somewhat loose format that I don't believe is parsed anywhere.
> However, the *existence* of these files is used by the (somewhat
> controversial, for downstream packagers) `is_package_installed()`
> function.*  I'm actually going to propose later that we write and use
> these stamp files (with some slight changes) even when installing
> dependencies from a system package, so these files might be present
> even in binary packages for Sage (though that might be up to
> downstream packagers).
>
> When Sage's `./configure` script generates the main Makefile for all
> of Sage's dependencies, it loops over all the spkgs in build/pkgs/ and
> creates two make targets for each spkg: the aforementioned phony
> target consisting of just the package name, and the *real* target for
> the stamp file.  It also creates a make variable named like
> `$(inst_<spkg>)` (where <spkg> is just the package name, without the
> version) referring to the full path of the stamp file for that
> package.  Each spkg may list its build dependencies in its
> build/pkgs/<spkg>/dependencies file, in the format that it will appear
> in the Makefile as dependencies for the make target of that package.
> For convenience's sake, the `dependencies` file just contains the
> package names, but the `./configure` script converts this to the
> appropriate `$(inst_<spkg>)` variables, so that the stamp files become
> the real dependencies (part of how the "stamp pattern" normally
> works).
>
> When a package is upgraded (i.e. its version number changes) then the
> Makefile is regenerated, but with the `$(inst_<spkg>)` for that
> package pointing to a new stamp file, containing the new version
> number.  Thus any dependents of that package will see this as an
> outdated dependency, and get rebuilt after the upgraded package is
> built.  When packages are rebuilt (even if their version didn't
> change) their stamp files are touched, forcing further rebuilds of any
> of their dependents and so on, in normal Make behavior.
>
> As far as I can tell this has worked quite well for Sage--especially
> as it also allows leveraging Make's parallel build features.  So I'm
> proposing to keep this all pretty much as-is, with possibly only minor
> tweaks in the details.  Instead, many more of the changes will be at
> configure time.
>
>
> * There is proposed work already mostly done to replace use of
> is_package_installed() within the Sage library with a way to do
> runtime feature checks: https://trac.sagemath.org/ticket/20382  Some
> of this work *might* be redundant with what I want to propose, but can
> also coexist with it, as it is currently designed for runtime use by
> the Python code itself, and not during builds.
>
>
> 3. Case study--examples already in Sage
> =======================================
>
> Sage-the-distribution already has a few examples of "spkgs" in the
> system that *may* use a system package, rather than building from
> source.  As it is this is done in an ad-hoc manner that can be
> surprising and/or misleading.  But I think it's useful to look at them
> to see how this is done currently and if there's anything we can learn
> from it.
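
As an aside on section 2 above, the stamp-file bookkeeping can be modelled in a few lines of Python. This is only a rough sketch under stated assumptions, not what `./configure` actually does: the function names are hypothetical, the `sage-spkg` recipe line is a guess at the invocation, and only bare package names in a `dependencies` line are translated (anything else, such as an existing `$(VAR)` reference or a `|` separator, is passed through untouched).

```python
def stamp_path(spkg_inst_dir, name, version):
    """Path of the stamp file marking `name` at `version` as installed."""
    return f"{spkg_inst_dir}/{name}-{version}"


def expand_dependencies(dep_line):
    """Turn bare package names into $(inst_<spkg>) Make variable references.

    Simplification (assumption): tokens that are not plain names, such as
    an existing $(VAR) reference or '|', are copied through unchanged.
    """
    out = []
    for token in dep_line.split():
        if token.replace("_", "").isalnum():
            out.append(f"$(inst_{token})")
        else:
            out.append(token)
    return " ".join(out)


def makefile_rule(spkg_inst_dir, name, version, dep_line):
    """Render the variable, the phony target and the stamp rule for one spkg."""
    stamp = stamp_path(spkg_inst_dir, name, version)
    return (
        f"inst_{name} = {stamp}\n"
        # Phony convenience target: `make <spkg>` just asks for the stamp.
        f"{name}: $(inst_{name})\n"
        # Real target: the stamp file, which depends on other stamp files.
        f"$(inst_{name}): {expand_dependencies(dep_line)}\n"
        # Hypothetical recipe; the real sage-spkg command line may differ.
        f"\tsage-spkg {name}\n"
    )


if __name__ == "__main__":
    print(makefile_rule("$(SAGE_SPKG_INST)", "pari", "2.9", "gmp readline"))
```

Because the version is part of the stamp filename, bumping a version changes the target path, which is what makes Make consider every dependent out of date.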
>
> a) BLAS
> -------
>
> There are two different BLAS implementation packages to choose from
> currently in Sage: OpenBLAS and ATLAS.*  The selection can be made
> currently at configure time with a --with-blas= flag which can take
> either 'openblas' or 'atlas'.  The selection is used to write a
> variable called `$(BLAS)` in the makefile that points to the stamp
> file path for the actual BLAS implementation spkg selected.  Other
> spkgs that have BLAS as a dependency list the `$(BLAS)` variable in
> their dependencies, rather than writing "openblas" or "atlas"
> explicitly.
>
> When openblas is selected (now the default) the openblas spkg is
> installed unconditionally.
>
> However, when *atlas* is selected, there happens to be a mechanism for
> using a system BLAS (why just with ATLAS I don't know--historical
> reasons I guess).  In this case it still runs the spkg-install for
> ATLAS like for any other spkg, but its spkg-install checks for a
> special environment variable, `SAGE_ATLAS_LIB` (the only way to
> control this behavior).  This invokes a search in standard locations
> first for a "libatlas.so" (or equivalent) explicitly.  If that's not
> found, it will happily take whatever it does find as long as there's
> *some* "libblas.so" and "liblapack.so" found on the system.  It
> doesn't do any feature checks or anything--it just takes what it
> finds.
>
> If it does find something resembling either ATLAS specifically, or a
> generic BLAS/LAPACK, then it skips installing the actual spkg, but
> still writes a stamp file indicating that "ATLAS" was installed, with
> whatever version is in the package-version.txt for the spkg, which can
> of course be misleading.  (It also writes pkgconfig .pc files in
> $SAGE_LOCAL/lib for blas/cblas/lapack indicating which libs it found,
> along with a "fake" version of "1.0".)
>
> Thus, Sage will use these system libraries for all build and runtime
> requirements of BLAS, and in my experience this has generally worked.
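
The lookup order described for the ATLAS case can be illustrated roughly as follows. This is a sketch under assumptions, not the spkg-install script itself: the function names, the `.so` suffix default and the list of directories are all hypothetical, and the `exists` parameter is there only so the logic can be exercised without touching the filesystem.

```python
import os


def first_match(search_dirs, filename, exists=os.path.isfile):
    """Return the first existing path for `filename`, or None."""
    for directory in search_dirs:
        candidate = f"{directory}/{filename}"
        if exists(candidate):
            return candidate
    return None


def find_system_blas(search_dirs, suffix=".so", exists=os.path.isfile):
    """Prefer libatlas; otherwise accept any libblas plus liblapack.

    No feature checks are made, mirroring the "takes what it finds"
    behavior described above.
    """
    atlas = first_match(search_dirs, "libatlas" + suffix, exists)
    if atlas is not None:
        return {"kind": "atlas", "libs": [atlas]}
    blas = first_match(search_dirs, "libblas" + suffix, exists)
    lapack = first_match(search_dirs, "liblapack" + suffix, exists)
    if blas is not None and lapack is not None:
        return {"kind": "generic", "libs": [blas, lapack]}
    return None  # nothing usable found; the spkg would be built instead


if __name__ == "__main__":
    print(find_system_blas(["/usr/lib", "/usr/local/lib"]))
```

The sketch makes the weakness visible: a generic pair of libraries is accepted on file existence alone, yet the stamp file written afterwards still claims that "ATLAS" at the spkg's version was installed.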
>
> * There is another issue I would like to address--slightly orthogonal
> to supporting system packages--of having a regular way to support
> "abstract" packages that can have multiple alternative implementations
> (another example being GMP/MPIR).  This has been talked about before,
> such as in this recent thread [3].  I have some ideas about this that
> integrate well with my ideas for system packages, but I will try to
> save that for a separate message.
>
>
> b) GCC
> ------
>
> The GCC spkg is a bit of a different beast, since it is normally not
> installed by default, and was only added to support cases where the
> platform's GCC is broken or too old and has bugs that affect building
> Sage or its dependencies.
>
> Although Sage's `configure` script is responsible for determining
> whether or not GCC should be installed (in contrast to hacks in
> spkg-install like for ATLAS), there is no *flag* for `configure` (e.g.
> --with-gcc or something like that) for controlling this.  Instead the
> behavior is controlled solely by an environment variable
> "SAGE_INSTALL_GCC" (this should probably be fixed, but we'll come to
> that).  If the environment variable is set to "yes"/"no" then that
> forces the gcc installation behavior one way or the other.  However,
> if the environment variable is not set, then the configure script goes
> through the necessary checks to see if the installed gcc is new
> enough, and also if gfortran is installed, among others.  If GCC
> installation is deemed necessary then it sets a flag indicating as
> much, called `need_to_install_gcc=yes`.
>
> This is used later (see next section) to set the `$(inst_gcc)` variable.
>
> c) git
> ------
>
> Sage actually includes an spkg for git, and installs it
> automatically (there is currently no way to control this) if a
> working 'git' is not found on the system.  This is one of the few
> packages that just has a straightforward check for the system version
> at configure time.
> If a working git is not found (where 'working' here just means
> `git --version` works) the script sets a variable (similar to the gcc
> case) called `need_to_install_git=yes`.
>
> (It also sets a similar variable for `need_to_install_yasm` on
> x86-based systems.)
>
> Later, while writing the main Makefile, the configure script loops
> over all spkgs that *might* be installed and checks for a
> `need_to_install_<spkg>` variable.  If not found, or not set to "no",
> the script sets the `$(inst_<spkg>)` variable to point to the standard
> stamp file for that package.  Otherwise it sets `$(inst_<spkg>)` to a
> dummy file that always exists (this way any dependencies for that
> package are still satisfied, but the spkg is never actually
> built/installed).
>
>
> 4. Package sources
> ==================
>
> One of the main changes I'm proposing is that stamp files for packages
> will always be written to SAGE_SPKG_INST even for cases where the
> system package is used, and the Sage spkg is not actually installed.
>
> That is, I want to change the meaning of "spkg" to more broadly
> represent "a dependency of Sage that *may* be included in
> Sage-the-distribution".
>
> To this end I want to define a concept of spkg "sources" (not to be
> confused with source code).  Instead, these are sources from which the
> spkg dependency can be satisfied.  Three possible sources I have in
> mind (and I'm not sure that there would be any other):
>
> a) sage-dist: This is the current notion of an "spkg", where the
> source tarball is downloaded from one of the Sage mirrors, unpacked
> and installed to $SAGE_LOCAL using sage-spkg + the spkg's spkg-install
> script.  The resulting stamp file, with the version taken from
> package-version.txt is written to $SAGE_SPKG_INST.
>
> b) system: In this case a check is made to see if the dependency is
> already satisfied by the system.  How exactly this check is performed
> depends heavily on the package.
> *If possible* the version of the system package is also determined
> (will discuss the nuts-and-bolts of this later).  In this case a stamp
> file is still written to $SAGE_SPKG_INST, but indicating somehow that
> the system package was used, not the sage-dist package.
>
> c) source: This case is not necessary for supporting system packages,
> but I think would be useful for testing new versions of a package.  In
> this case it would be possible to install an spkg from an existing
> source tree for that package, which would be installed using the
> spkg-install script.  If possible the version number would be
> determined from the package source code, and not assumed.  I think
> this would be useful, but won't discuss this case any further for now.
> I just point it out as another possibility within this framework of
> allowing different spkg "sources".
>
> To summarize, no matter how an spkg dependency is satisfied, a stamp
> file for that spkg is written to $SAGE_SPKG_INST, possibly
> indicating the *actual* version of the package being used by Sage, and
> indicating how the dependency was satisfied.
>
>
> 5. Nuts and bolts
> =================
>
> a) New stamp file format
> ------------------------
>
> As suggested in the previous section, no matter how an spkg dependency
> was satisfied, a stamp file is written to the $SAGE_SPKG_INST
> directory.  In order to support multiple possible package "sources",
> the source that was used should be included in the stamp file.  This
> way, it will also be possible to re-run `./configure` and specify a
> different source for a package, thus forcing a rebuild.  So I think
> the stamp filename format should be something like:
>
>     $SAGE_SPKG_INST/<name>-<source>-<version>
>
> where <name> would be the base package name, <source> would be
> something like "sagedist" or "system", and <version> the *actual*
> version of the package being used.  I'll discuss in the next section
> how this might be determined for system packages.
> There's plenty of room for bikeshedding in this, but I think this
> makes sense.  We could also support the old filename format, if such
> files are found, for backwards compatibility.
>
>
> b) Checking packages
> --------------------
>
> For any dependency that may be satisfied by system packages, there
> needs to be a way to specify what the minimum dependency is for Sage
> (be it a version number, or the presence of certain features), and
> there needs to be a way for each package to check that the dependency
> is satisfied.
>
> I've gone back and forth on exactly how this should be done, but I
> think that the best way to do this is to allow per-package m4 files,
> containing an m4 macro that checks that the dependency on that package
> is satisfied (again, be it version number or some other check).  Each
> macro could be named something like
>
>     SAGE_SPKG_CHECK_<name>
>
> Optionally the macro should set a variable indicating the package
> *version* if the package dependency is satisfied.  This is the version
> string that can be used in the stamp file, for example.  If there is
> no clear way to determine the version (though in most cases there will
> be), a string like "unknown" could still be allowed for the version.
> The macro would be defined in a file like sage_spkg_check.m4 under
> each build/pkgs/<spkg> directory, and loaded on an as-needed basis
> using the m4_include command in configure.ac.
>
> Writing an m4 macro for autoconf is not a common skill, which is why
> I've hesitated on this.  But I think it has a few justifications: It
> allows one to take advantage of the many existing macros that come
> with autoconf to perform common checks, such as whether a program is
> installed, or a function is available in a library.  For many packages
> the SAGE_SPKG_CHECK_ macro would probably just wrap one or two
> existing autoconf macros.
> Another justification is that for some packages there may be existing
> macros to check for them that we can borrow from other projects.
>
> We can also provide, in the documentation, a simple template macro
> demonstrating how to wrap a few shell commands.
>
> *NOTE*: To be clear, I'm not proposing that, to implement this
> proposal, we go through and write 250+ m4 macros for every Sage spkg.
> This check will be optional, and we can write them one at a time on an
> as-needed basis, starting with some of the most important ones.  I'll
> discuss more about how missing checks are handled in the next section.
>
> Obviously the packages that already have checks in configure.ac (gcc,
> git, yasm) would have those checks moved out to their package-specific
> macros.
>
>
> c) Driving the system
> ---------------------
>
> As previously noted, selecting the source for a package would be done
> at ./configure time.  My proposal would be to change very little about
> the current default behavior.
>
> By default, all packages would be installed from the sage-dist source
> as is the case now.  We could still make exceptions for build
> dependencies like gcc and git.  I don't care whether these exceptions
> are hard-coded in configure.ac, or specified in some generic way.
>
> However, the configure script would support, for all spkgs, a
> `--with-system-<spkg>` argument (e.g. `--with-system-zlib`).
>
> For each spkg to be installed (all standard packages, optional
> packages if selected), if the `--with-system-<spkg>` argument is
> given, it will attempt to load and run the SAGE_SPKG_CHECK_<spkg>
> macro for that package.  If the macro is not defined, there would be a
> *warning* that the system package was selected for that package, but
> there is no way to check if it was installed.  The warning would make
> clear that if the build fails it may be due to this dependency being
> missing.
> Otherwise it runs the check, and if the check succeeds the
> configure script would continue, while if the check fails the
> configure would stop with an error.
>
> Optionally, we could add arguments to control all of this behavior.
> For example, it might be useful to have an option to install the
> sage-dist spkg if a check is not defined.  This might even be better
> as the default--a possible bikeshed issue.
>
> Another possible option is one that enables system packages, but
> disables any checks.  This might be useful for system packagers who
> already have external guarantees that the dependencies have been met.
>
> Finally, there should be an option like `--with-system-all` to
> automatically use system packages for all dependencies, so that
> downstream packagers don't have to supply hundreds of `--with-system-`
> flags.
>
> Otherwise, generation of the build/make/Makefile by the configure
> script would proceed more or less as it does currently.  It would just
> take into account information gained through any `--with-system-`
> flags to generate the new format stamp filenames.  The .dummy stamp
> file would not be used anymore.  Also, the rule for building system
> packages would be to simply write the stamp file.
>
>
> 6. Q&A
> ======
>
> Q: What if I install with --with-system-<spkg> but later want to
> install the sage-dist version of that package?
>
> A: We should also support some way to deselect system packages.
> Perhaps --without-system-<spkg> / --with-system-<spkg>=no (these are
> two ways of saying the same thing in standard configure scripts).
>
> Q: The reverse: What if I install the sage-dist package, but want to
> switch to the system package?
>
> A: Same thing, but this is a little trickier because we would need to
> *uninstall* the package from $SAGE_LOCAL.
> I have a proposal for improving spkg uninstallation written up at
> https://trac.sagemath.org/ticket/22510
>
> Q: What if I use a system package when building Sage, but that package
> is later upgraded, or worse, removed?
>
> A: There's no great solution to this.  Certainly, I think the
> ./configure time checks should be cached (since updates are not
> usually *that* frequent).  So there needs to be good documentation on
> invalidating the cache when re-running ./configure.  Still, that only
> helps with configure-time detection.  Sage can still break at runtime
> if a system package it depends on changes.  This is a generic problem
> for *any* software development, however, and something developers
> should be aware of if they're updating their system.  Granted, most
> people don't always closely examine what's changing when they install,
> for example, OS updates.  I certainly don't always check this with a
> fine-toothed comb.  But it's a general issue.  Keeping the ability to
> install the "standard", known-working sage-dist spkgs if needed is
> also a big advantage of this proposal.
>
> Any other questions?
>
>
> 7. Future concepts
> ==================
>
> a) Platform hooks
> -----------------
>
> It might be nice, when using system packages, for the underlying
> OS/distribution system to hook into the SAGE_SPKG_CHECK_ system, both
> to check if a package is installed, and to provide its version number.
> For example, when building Sage on Debian, it might just hook into the
> dpkg system to provide this information in a manner consistent with
> the system.
>
> b) Abstract packages
> --------------------
>
> Returning to the question of dependencies that can be satisfied by
> more than one package (e.g. BLAS, GMP), I think it would be nice to
> have a generic way of handling such cases that's a little cleaner than
> the current ad-hoc system.  I would like a way of specifying an
> "abstract" package (which might be named "blas", for example).
> Installing an abstract package would mean installing the concrete
> package selected to satisfy it, but it would also include a system for
> switching between concrete implementations.  So for example it would
> be possible to have multiple BLAS implementations installed
> simultaneously, and installing "blas" with the current selection might
> just be a matter of updating some symlinks.
>
> I think this concept fits in well with the proposal for handling
> system packages, but doesn't necessarily need to be handled
> simultaneously with it.  For now we can just maintain the special
> cases I think...
>
>
> 8. Conclusion (for now)
> =======================
>
> I've heard many valid concerns with going beyond sage-the-distribution
> for building/running Sage.  Sage's huge collection of dependencies can
> lead to many fragilities: Version X of package Y might work with
> dependency A, but completely break dependency B.  And supporting
> versions V, W, and X of package Y simultaneously is a lot of overhead
> compared to always just using version X of that package for Sage.
>
> I do personally have a preference, when it comes to writing software,
> for supporting as wide a range of versions for my dependencies as is
> feasible.  For some dependencies the versions supported may,
> necessarily, be very narrow.  But for other cases there can be a lot
> more room for flexibility.
>
> Regardless, I think this proposal maintains the current stability of
> Sage by keeping the current preference for sage-the-distribution in
> all cases by default.  It also maintains the ability to use
> custom-built versions of some of Sage's dependencies.  But I think this
> will also provide more flexibility in experimenting with using
> existing system packages in cases where that's sufficient, and avoid
> Sage duplicating system packages unnecessarily.
>
> Best,
> Erik
>
>
> [1] https://trac.sagemath.org/ticket/14405
> [2] https://www.technovelty.org/tips/the-stamp-idiom-with-make.html
> [3] https://groups.google.com/d/msg/sage-devel/8MJBe_qxWJ0/fTzOPVzDAAAJ
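
For reference, the per-package decision described in section 5c of the quoted proposal, together with the stamp filename format from section 5a, might be modelled along these lines. This is only an illustrative sketch: the function and class names are hypothetical, a Python callable stands in for each `SAGE_SPKG_CHECK_<name>` m4 macro, and recording the version as "unknown" when no check is defined is an assumption rather than something the proposal states.

```python
class ConfigureError(Exception):
    """Raised where the configure script would stop with an error."""


def choose_source(name, with_system, checks, sagedist_version):
    """Return (source, version, warnings) for one spkg.

    `with_system` is the set of names given via --with-system-<spkg>.
    `checks` maps a name to a callable that returns the detected version
    string on success, or None if the system package is unsuitable.
    """
    warnings = []
    if name not in with_system:
        # Default behavior is unchanged: build the pinned sage-dist spkg.
        return ("sagedist", sagedist_version, warnings)
    check = checks.get(name)
    if check is None:
        warnings.append(
            f"system package selected for {name}, but no check is defined; "
            "a later build failure may be due to this dependency being missing"
        )
        return ("system", "unknown", warnings)  # version is an assumption
    version = check()
    if version is None:
        raise ConfigureError(f"system check for {name} failed")
    return ("system", version, warnings)


def stamp_name(name, source, version):
    """Proposed stamp filename: <name>-<source>-<version>."""
    return f"{name}-{source}-{version}"


if __name__ == "__main__":
    checks = {"zlib": lambda: "1.2.11"}
    source, version, _ = choose_source("zlib", {"zlib"}, checks, "1.2.8")
    print(stamp_name("zlib", source, version))
```

Since the source is part of the stamp name, switching a package between "sagedist" and "system" changes its stamp file and therefore forces dependents to rebuild, in the same way a version bump does today.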