Hi Erik,

We are currently at a workshop in Leiden [1], where we came up with another
possible use case for your proposal. Some people develop PARI/GP in parallel
with Sage. One simple way to have a testing environment would be to have:
 * a git repo for PARI/GP
 * a git repo for SAGE
 * a way of telling SAGE to use the development version of PARI/GP
   (wherever it is installed)

This raises one question, though: how would one relaunch the compilation
chain after a PARI/GP update? Would it be handled automatically by
the Makefile? (The same question holds for system packages, of course.)

Best
Vincent

 [1] https://www.universiteitleiden.nl/en/events/2017/07/workshop-on-algorithms-in-number-theory-and-arithmetic-geometry/

On Friday, 26 May 2017 at 15:01:36 UTC+2, Erik Bray wrote:
>
> Hi folks interested in Sage packaging, 
>
> Almost every time the topic comes up, I complain that it isn't easier 
> to use more system packages as both build- and run-time dependencies 
> of Sage.  I'd like to make some progress on actually doing something 
> about that, and I have some ideas, but I'd like to bounce them off 
> anyone who's interested first before just going off and doing it. 
>
> There is enough work involved in this that I believe it can and should 
> be broken up into a number of smaller tasks.  I would also like to 
> approach this in a way that works well and integrates with the 
> existing "sage-the-distribution" infrastructure.  I believe there are 
> advantages to being able to develop on Sage in the "normal" way we're 
> already used to, while also being able to take advantage of existing 
> system packages wherever possible. 
>
> So I'm just going to try to organize my existing thoughts on this and 
> see what anyone thinks.  Sorry if it's TL;DR, but I'm hoping that 
> having a detailed discussion about this will make it more likely that 
> something will actually be accomplished on it soon (because I think 
> the actual implementation, once decided on, is not terribly 
> difficult). 
>
> Note: In this message I'm using "package" loosely to refer to any 
> program, library, database, or other collection of files that is 
> distributed and installed as a self-contained unit.  It doesn't 
> necessarily relate to any particular "packaging system". 
>
>
> 1. Why? 
> ======= 
>
> The extent and scope to which Sage "vendors" its dependencies, in the 
> form of what some call "sage-the-distribution", is *not* particularly 
> normal in the open source world.  Vendoring *some* dependencies is not 
> unusual, but Sage does nearly all (even down to gcc, in certain 
> cases).  I've learned a lot of the history to this over the past year, 
> and agree that most of the time this has been done with good reasons. 
>
> For example, I can't think of any other software that forces me to 
> build its own copy of ncurses just to build/install it.  This was 
> added for good reasons [1], but not reasons that can't also be resolved 
> in part by installing the appropriate system packages, or that might 
> not be resolved by now in system packages that depend on ncurses (i.e. 
> that should be built with ncurses support).  Point being, this issue 
> does not necessarily impact everyone, and building Sage's own ncurses 
> is overkill in that case.  It would be one thing if we were just 
> talking one or two packages (I didn't pick on ncurses for any deep 
> reason), but now multiply that by around 250 (give or take, depending 
> on how many dependencies are even available as system packages) and it 
> becomes real overhead to getting started *and* making progress with 
> Sage development. 
>
> I wouldn't propose *removing* any existing spkgs that are still 
> relevant.  I think it's really useful that Sage has a list of 
> known-good pinned versions of its dependencies. Further, 
> "sage-the-distribution" makes it very easy to install those 
> dependencies in such a way that they can be used as build/runtime 
> dependencies by Sage without having to hunt the 'net for the right 
> source packages of the right versions of those dependencies, and 
> figure out how to configure and build them in a piecemeal fashion.  In 
> other words, even if we do expand the ability to use system packages 
> for Sage's dependencies, it's still very nice that it's easy with a 
> few commands to use the spkg if something goes wrong with the system 
> package.  It's also, of course, important for power users who wish to 
> compile some dependencies on their own--especially highly tuned 
> numerical libraries (but even those users usually only care about 
> being able to hand-configure a few dependencies, not most). 
>
> To summarize: being able to more aggressively rely on system packages 
> can save a lot of time and frustration during normal development of 
> Sage, and is also less jarring especially to new developers, of whom 
> we would like to attract more.  It should also decrease the time 
> required to regularly build binary distributions of Sage (e.g. for 
> Docker, Windows, and Linux distros). 
>
>
> 2. Overview of how Sage manages dependencies now (and what won't change) 
> ======================================================================== 
>
> For many of you this will be unnecessary review, but I want to discuss 
> a little about how dependencies are currently checked and installed in 
> Sage-the-distribution.  Doing so is helpful for me too, to make sure I 
> understand it clearly (and correct me if I have any 
> misunderstandings). 
>
> Sage-the-distribution uses *Make* itself (cleverly, IMO) to manage 
> dependencies insofar as making sure all dependencies are installed, 
> and that when a package changes all packages that depend (directly or 
> indirectly) on that package are rebuilt.  Make works on files and 
> timestamps, which does not translate directly to entire software 
> packages, so to track whether or not an spkg is up to date, Sage uses 
> the common "stamp pattern" for Make [2]--that is, when an spkg is 
> installed it writes a file that effectively "represents" completion of 
> the installation of that spkg for Make's purposes.  These stamp files 
> are the files typically stored under 
> $SAGE_LOCAL/var/lib/sage/installed/<spkg>-<version>.  This directory 
> is also known in some places as SAGE_SPKG_INST.  By including the 
> version number in the name we can also force rebuilds when an spkg's 
> version changes. 
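To make the stamp mechanism above concrete, here is a minimal sketch in Python rather than Make. It is not Sage's actual build logic: the helper names (`stamp_path`, `needs_rebuild`, `write_stamp`) are made up for illustration, and it only assumes what the paragraph says, namely that a stamp is a file named `<spkg>-<version>` and that Make compares timestamps.

```python
import os


def stamp_path(inst_dir, name, version):
    """Path of the stamp file for one spkg, e.g. <inst_dir>/zlib-1.2.11."""
    return os.path.join(inst_dir, f"{name}-{version}")


def write_stamp(path, mtime=None):
    """Create (or touch) a stamp file, optionally with an explicit mtime."""
    with open(path, "a"):
        pass
    if mtime is not None:
        os.utime(path, (mtime, mtime))


def needs_rebuild(target_stamp, dependency_stamps):
    """Mimic Make's rule: rebuild when the target stamp is missing, or when
    any dependency stamp is missing or newer than the target stamp."""
    if not os.path.exists(target_stamp):
        return True
    target_mtime = os.path.getmtime(target_stamp)
    return any(
        not os.path.exists(dep) or os.path.getmtime(dep) > target_mtime
        for dep in dependency_stamps
    )
```

Because the version is part of the file name, bumping a dependency's version makes its stamp "missing" from the dependent's point of view, which is what forces the rebuild.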
>
> When one runs `make <spkg>` with just the spkg name, this is actually 
> a phony target with the path to the stamp file for that package (at 
> its current version) as the sole prerequisite.  So `make <spkg>` translates 
> to `make $SAGE_SPKG_INST/<spkg>-<version>` for the current version of 
> that spkg.  The associated rule is to run the sage-spkg command for 
> that package, which also takes care of writing the stamp file. 
> sage-spkg also writes some information into each stamp file in a 
> somewhat loose format that I don't believe is parsed anywhere. 
> However the *existence* of these files is used by the (somewhat 
> controversial, for downstream packagers) `is_package_installed()` 
> function.*  I'm actually going to propose later that we write and use 
> these stamp files (with some slight changes) even when installing 
> dependencies from a system package, so these files might be present 
> even in binary packages for Sage (though that might be up to 
> downstream packagers). 
>
> When Sage's `./configure` script generates the main Makefile for all 
> of Sage's dependencies, it loops over all the spkgs in build/pkgs/ and 
> creates two make targets for each spkg: the aforementioned phony 
> target consisting of just the package name, and the *real* target for 
> the stamp file.  It also creates a make variable named like 
> `$(inst_<spkg>)` (where <spkg> is just the package name, without the 
> version) referring to the full path of the stamp file for that 
> package.  Each spkg may list its build dependencies in its 
> build/pkgs/<spkg>/dependencies file, in the format that it will appear 
> in the Makefile as dependencies for the make target of that package. 
> For convenience's sake, the `dependencies` file just contains the 
> package names, but the `./configure` script converts this to the 
> appropriate `$(inst_<spkg>)` variables, so that the stamp files become 
> the real dependencies (part of how the "stamp pattern" normally 
> works). 
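The Makefile generation described above can be sketched roughly as follows. This is a Python illustration, not the real `./configure` code (which is shell/m4): the function names are hypothetical, `$(SAGE_SPKG_INST)` is assumed to be usable as a make variable, and the recipe is left as a placeholder comment rather than a real sage-spkg invocation.

```python
import re

# A plain package name as it would appear in a `dependencies` file.
_PACKAGE_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def to_make_prerequisites(dependencies_line):
    """Turn 'zlib readline' into '$(inst_zlib) $(inst_readline)'.

    Tokens that are not plain package names (for example a make
    variable such as $(BLAS)) are passed through unchanged.
    """
    return " ".join(
        f"$(inst_{tok})" if _PACKAGE_NAME.match(tok) else tok
        for tok in dependencies_line.split()
    )


def make_rules(name, version, dependencies_line):
    """Emit the variable, the real stamp target and the phony target."""
    prereqs = to_make_prerequisites(dependencies_line)
    return "\n".join([
        f"inst_{name} = $(SAGE_SPKG_INST)/{name}-{version}",
        f"$(inst_{name}): {prereqs}".rstrip(),
        "\t# recipe placeholder: run sage-spkg, which writes the stamp",
        f"{name}: $(inst_{name})",
        f".PHONY: {name}",
    ])
```

Calling `make_rules("python", "2.7.13", "zlib readline")` yields a rule whose prerequisites are the stamp files of zlib and readline, so a version bump of either one changes the prerequisite path and triggers a rebuild.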
>
> When a package is upgraded (i.e. its version number changes) then the 
> Makefile is regenerated, but with the `$(inst_<spkg>)` for that 
> package pointing to a new stamp file, containing the new version 
> number.  Thus any dependents of that package will see this as an 
> outdated dependency, and get rebuilt after the upgraded package is 
> built.  When packages are rebuilt (even if their version didn't 
> change) their stamp files are touched, forcing further rebuilds of any 
> of their dependents and so on, in normal Make behavior. 
>
> As far as I can tell this has worked quite well for Sage--especially 
> as it also allows leveraging Make's parallel build features.  So I'm 
> proposing to keep this all pretty much as-is, with possibly only minor 
> tweaks in the details.  Instead, many more of the changes will be at 
> configure time. 
>
>
> * There is proposed work already mostly done to replace use of 
> is_package_installed() within the Sage library with a way to do 
> runtime feature checks: https://trac.sagemath.org/ticket/20382  Some 
> of this work *might* be redundant with what I want to propose, but can 
> also coexist with it, as it is currently designed for runtime use by 
> the Python code itself, and not during builds. 
>
>
> 3. Case study--examples already in Sage 
> ======================================= 
>
> Sage-the-distribution already has a few examples of "spkgs" in the 
> system that *may* use a system package, rather than building from 
> source.  As it is this is done in an ad-hoc manner that can be 
> surprising and/or misleading.  But I think it's useful to look at them 
> to see how this is done currently and if there's anything we can learn 
> from it. 
>
> a) BLAS 
> ------- 
>
> There are two different BLAS implementation packages to choose from 
> currently in Sage: OpenBLAS and ATLAS.*  The selection can be made 
> currently at configure time with a --with-blas= flag which can take 
> either 'openblas' or 'atlas'.  The selection is used to write a 
> variable called `$(BLAS)` in the makefile that points to the stamp 
> file path for the actual BLAS implementation spkg selected.  Other 
> spkgs that have BLAS as a dependency list the `$(BLAS)` variable in 
> their dependencies, rather than writing "openblas" or "atlas" 
> explicitly. 
>
> When openblas is selected (now the default) the openblas spkg is 
> installed unconditionally. 
>
> However, when *atlas* is selected, there happens to be a mechanism for 
> using a system BLAS (why just with ATLAS I don't know--historical 
> reasons I guess).  In this case it still runs the spkg-install for 
> ATLAS like for any other spkg, but its spkg-install checks for a 
> special environment variable, `SAGE_ATLAS_LIB` (the only way to 
> control this behavior).  This invokes a search in standard locations 
> first for a "libatlas.so" (or equivalent) explicitly.  If that's not 
> found, it will happily take whatever it does find as long as there's 
> *some* "libblas.so" and "liblapack.so" found on the system.  It 
> doesn't do any feature checks or anything--it just takes what it 
> finds. 
>
> If it does find something resembling either ATLAS specifically, or a 
> generic BLAS/LAPACK, then it skips installing the actual spkg, but 
> still writes a stamp file indicating that "ATLAS" was installed, with 
> whatever version is in the package-version.txt for the spkg, which can 
> of course be misleading.  (It also writes pkgconfig .pc files in 
> $SAGE_LOCAL/lib for blas/cblas/lapack indicating which libs it found, 
> along with a "fake" version of "1.0".) 
>
> Thus, Sage will use these system libraries for all build and runtime 
> requirements of BLAS, and in my experience this has generally worked. 
>
> * There is another issue I would like to address--slightly orthogonal 
> to supporting system packages--of having a regular way to support 
> "abstract" packages that can have multiple alternative implementations 
> (another example being GMP/MPIR).  This has been talked about before, 
> such as in this recent thread [3].  I have some ideas about this that 
> integrate well with my ideas for system packages, but I will try to 
> save that for a separate message. 
>
>
> b) GCC 
> ------ 
>
> The GCC spkg is a bit of a different beast, since it is normally not 
> installed by default, and was only added to support cases where the 
> platform's GCC is broken or too old and has bugs that affect building 
> Sage or its dependencies. 
>
> Although Sage's `configure` script is responsible for determining 
> whether or not GCC should be installed (in contrast to hacks in 
> spkg-install like for ATLAS), there is no *flag* for `configure` (e.g. 
> --with-gcc or something like that) for controlling this.  Instead the 
> behavior is controlled solely by an environment variable 
> "SAGE_INSTALL_GCC" (this should probably be fixed, but we'll come to 
> that).  If the environment variable is set to "yes"/"no" then that 
> forces the gcc installation behavior one way or the other.  However, 
> if the environment variable is not set, then the configure script goes 
> through the necessary checks to see if the installed gcc is new 
> enough, and also if gfortran is installed, among others.  If GCC 
> installation is deemed necessary then it sets a flag indicating as 
> much, called `need_to_install_gcc=yes`. 
>
> This is used later (see next section) to set the `$(inst_gcc)` variable. 
>
> c) git 
> ------ 
>
> Sage actually includes an spkg for git, and installs it 
> unconditionally (there is currently no way to control this) if a 
> working 'git' is not found on the system.  This is one of the few 
> packages that just has a straightforward check for the system version 
> at configure time.  If a working git is not found (where 'working' 
> here just means `git --version` works) the script sets a variable 
> (similar to the gcc case) called `need_to_install_git=yes`. 
>
> (It also sets a similar variable for `need_to_install_yasm` on 
> x86-based systems.) 
>
> Later, while writing the main Makefile, the configure script loops 
> over all spkgs that *might* be installed and checks for a 
> `need_to_install_<spkg>` variable.  If not found, or not set to "no", 
> the script sets the `$(inst_<spkg>)` variable to point to the standard 
> stamp file for that package.  Otherwise it sets `$(inst_<spkg>)` to a 
> dummy file that always exists (this way any dependencies for that 
> package are still satisfied, but the spkg is never actually 
> built/installed). 
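The gcc/git logic in this section boils down to two small decisions, sketched here in Python under stated assumptions. The function names are hypothetical, the real code lives in the shell `configure` script, and the location of the dummy file (here `<inst_dir>/.dummy`) is an assumption.

```python
def need_to_install(override, system_is_usable):
    """Decide the value of need_to_install_<spkg>.

    `override` models an environment variable such as SAGE_INSTALL_GCC:
    "yes" or "no" forces the outcome, anything else (e.g. None) means
    "run the configure-time check", whose result is `system_is_usable`.
    """
    if override in ("yes", "no"):
        return override
    return "no" if system_is_usable else "yes"


def inst_target(name, version, configure_vars, inst_dir):
    """Pick what $(inst_<spkg>) should point to in the generated Makefile.

    Only an explicit "no" selects the always-present dummy file; a
    missing variable or any other value keeps the normal stamp file.
    """
    if configure_vars.get(f"need_to_install_{name}") == "no":
        return f"{inst_dir}/.dummy"
    return f"{inst_dir}/{name}-{version}"
```

Note the asymmetry: packages without any `need_to_install_<spkg>` variable fall through to the normal stamp file, which is why everything is built from the spkg unless a check explicitly says otherwise.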
>
>
> 4. Package sources 
> ================== 
>
> One of the main changes I'm proposing is that stamp files for packages 
> will always be written to SAGE_SPKG_INST even for cases where the 
> system package is used, and the Sage spkg is not actually installed. 
>
> That is, I want to change the meaning of "spkg" to more broadly 
> represent "a dependency of Sage that *may* be included in 
> Sage-the-distribution". 
>
> To this end I want to define a concept of spkg "sources" (not to be 
> confused with source code).  Instead, these are sources from which the 
> spkg dependency can be satisfied.  Three possible sources I have in 
> mind (and I'm not sure that there would be any other): 
>
> a) sage-dist:  This is the current notion of an "spkg", where the 
> source tarball is downloaded from one of the Sage mirrors, unpacked 
> and installed to $SAGE_LOCAL using sage-spkg + the spkg's spkg-install 
> script.  The resulting stamp file, with the version taken from 
> package-version.txt is written to $SAGE_SPKG_INST. 
>
> b) system: In this case a check is made to see if the dependency is 
> already satisfied by the system. How exactly this check is performed 
> depends heavily on the package.  *If possible* the version of the 
> system package is also determined (will discuss the nuts-and-bolts of 
> this later).  In this case a stamp file is still written to 
> $SAGE_SPKG_INST, but indicating somehow that the system package was 
> used, not the sage-dist package. 
>
> c) source: This case is not necessary for supporting system packages, 
> but I think would be useful for testing new versions of a package.  In 
> this case it would be possible to install an spkg from an existing 
> source tree for that package, which would be installed using the 
> spkg-install script.  If possible the version number would be 
> determined from the package source code, and not assumed.  I think 
> this would be useful, but won't discuss this case any further for now. 
> I just point it out as another possibility within this framework of 
> allowing different spkg "sources". 
>
> To summarize, no matter how an spkg dependency is satisfied, a stamp 
> file for that spkg is written to $SAGE_SPKG_INST, possibly 
> indicating the *actual* version of the package being used by Sage, and 
> indicating how the dependency was satisfied. 
>
>
> 5. Nuts and bolts 
> ================= 
>
> a) New stamp file format 
> ------------------------ 
>
> As suggested in the previous section, no matter how an spkg dependency 
> was satisfied, a stamp file is written to the $SAGE_SPKG_INST 
> directory.  In order to support multiple possible package "sources", 
> the source that was used should be included in the stamp file.  This 
> way, it will also be possible to re-run `./configure` and specify a 
> different source for a package, thus forcing a rebuild.  So I think 
> the stamp filename format should be something like: 
>
>     $SAGE_SPKG_INST/<name>-<source>-<version> 
>
> where <name> would be the base package name, <source> would be 
> something like "sagedist" or "system", and <version> the *actual* 
> version of the package being used.  I'll discuss in the next section 
> how this might be determined for system packages.  There's plenty of 
> room for bikeshedding in this, but I think this makes sense.  We could 
> also support the old filename format, if such files are found, for 
> backwards compatibility. 
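As a rough illustration of the proposed naming scheme, the sketch below formats and parses `<name>-<source>-<version>` and falls back to the old `<name>-<version>` form. It is only a sketch: the function names are hypothetical, the set of source labels is limited to the two mentioned above, old-style stamps are assumed to mean "sagedist", and package names are assumed not to contain a hyphen.

```python
KNOWN_SOURCES = ("sagedist", "system")


def format_stamp(name, source, version):
    """Build a new-style stamp file name."""
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unknown package source: {source!r}")
    return f"{name}-{source}-{version}"


def parse_stamp(filename):
    """Return (name, source, version) for a new- or old-style stamp name."""
    for source in KNOWN_SOURCES:
        name, sep, version = filename.partition(f"-{source}-")
        if sep and name and version:
            return name, source, version
    # Backwards compatibility: old stamps are <name>-<version> and can
    # only have come from the sage-dist spkg (assumption).
    name, sep, version = filename.partition("-")
    if not (sep and name and version):
        raise ValueError(f"not a stamp file name: {filename!r}")
    return name, "sagedist", version
```

With this layout, re-running `./configure` with a different source changes the expected stamp name, so Make sees the target as missing and rebuilds, exactly as it already does for a version change.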
>
>
> b) Checking packages 
> -------------------- 
>
> For any dependency that may be satisfied by system packages, there 
> needs to be a way to specify what the minimum dependency is for Sage 
> (be it a version number, or the presence of certain features), and 
> a way for each package to check that the dependency is 
> satisfied. 
>
> I've gone back and forth on exactly how this should be done, but I 
> think that the best way to do this is to allow per-package m4 files, 
> containing an m4 macro that checks that the dependency on that package is 
> satisfied (again, be it version number or some other check).  Each 
> macro could be named something like 
>
>     SAGE_SPKG_CHECK_<name> 
>
> Optionally the macro should set a variable indicating the package 
> *version* if the package dependency is satisfied.  This is the version 
> string that can be used in the stamp file, for example.  If there is 
> no clear way to determine the version (though in most cases there will 
> be), a string like "unknown" could still be allowed for the version. 
> The macro would be defined in a file like sage_spkg_check.m4 under 
> each build/pkgs/<spkg> directory, and loaded on an as-needed basis 
> using the m4_include command in configure.ac. 
>
> Writing an m4 macro for autoconf is not a common skill, which is why 
> I've hesitated on this.  But I think it has a few justifications: It 
> allows one to take advantage of the many existing macros that come 
> with autoconf to perform common checks, such as whether a program is 
> installed, or a function is available in a library.  For many packages 
> the SAGE_SPKG_CHECK_ macro would probably just wrap one or two 
> existing autoconf macros.  Another justification is that for some 
> packages there may be existing macros to check for them that we can 
> borrow from other projects. 
>
> We can also provide, in the documentation, a simple template macro 
> demonstrating how to wrap a few shell commands. 
>
> *NOTE*: To be clear, I'm not proposing that, to implement this 
> proposal, we go through and write 250+ m4 macros for every Sage spkg. 
> This check will be optional, and we can write them one at a time on an 
> as-needed basis, starting with some of the most important ones.  I'll 
> discuss more about how missing checks are handled in the next section. 
>
> Obviously the packages that already have checks in configure.ac (gcc, 
> git, yasm) would have those checks moved out to their package-specific 
> macros. 
>
>
> c) Driving the system 
> --------------------- 
>
> As previously noted, selecting the source for a package would be done 
> at ./configure time.  My proposal would be to change very little about 
> the current default behavior. 
>
> By default, all packages would be installed from the sage-dist source 
> as is the case now.  We could still make exceptions for build 
> dependencies like gcc and git.  I don't care whether these exceptions 
> are hard-coded in configure.ac, or specified in some generic way. 
>
> However, the configure script would support, for all spkgs, a 
> `--with-system-<spkg>` argument (e.g. `--with-system-zlib`). 
>
> For each spkg to be installed (all standard packages, optional 
> packages if selected), if the `--with-system-<spkg>` argument is 
> given, it will attempt to load and run the SAGE_SPKG_CHECK_<spkg> 
> macro for that package.  If the macro is not defined, there would be a 
> *warning* that a system package was selected for that package, but there 
> is no way to check if it was installed.  The warning would make clear 
> that if the build fails it may be due to this dependency being 
> missing.  Otherwise it runs the check, and if the check succeeds the 
> configure script would continue, while if the check fails the 
> configure would stop with an error. 
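The decision flow in the paragraph above can be written down as a small Python sketch. In the proposal the checks would be per-package m4 macros run by `configure`. Here they are modelled as plain callables that return a version string, or None when the dependency is not satisfied. All names (`select_source`, `Selection`, `ConfigureError`) are hypothetical, and recording the version as "unknown" when no check exists is an assumption.

```python
from collections import namedtuple

Selection = namedtuple("Selection", ["source", "version", "warning"])


class ConfigureError(Exception):
    """Raised where ./configure would stop with an error."""


def select_source(name, with_system, checks):
    """Decide how the dependency `name` is satisfied.

    `checks` maps package names to callables standing in for the
    SAGE_SPKG_CHECK_<name> macros.
    """
    if not with_system:
        # Default behaviour: build the sage-dist spkg as today.
        return Selection("sagedist", None, None)
    check = checks.get(name)
    if check is None:
        warning = (
            f"system package selected for {name}, but no check is defined; "
            "if the build fails, this dependency may be missing"
        )
        return Selection("system", "unknown", warning)
    version = check()
    if version is None:
        raise ConfigureError(f"system {name} does not satisfy Sage's requirements")
    return Selection("system", version, None)
```

The optional behaviours discussed next (falling back to sage-dist when no check exists, or skipping checks entirely) would be small variations on the two middle branches.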
>
> Optionally, we could add arguments to control all of this behavior. 
> For example, it might be useful to have an option to install the 
> sage-dist spkg if a check is not defined.  This might even be better 
> as the default--a possible bikeshed issue. 
>
> Another possible option is one that enables system packages, but 
> disables any checks.  This might be useful for system packagers who 
> already have external guarantees that the dependencies have been met. 
>
> Finally, there should be an option like `--with-system-all` to 
> automatically use system packages for all dependencies, so that 
> downstream packagers don't have to supply hundreds of `--with-system-` 
> flags. 
>
> Otherwise, generation of the build/make/Makefile by the configure 
> script would proceed more or less as it does currently.  It would just 
> take into account information gained through any `--with-system-` 
> flags to generate the new format stamp filenames.  The .dummy stamp 
> file would not be used anymore.  Also, the rule for building system 
> packages would be to simply write the stamp file. 
>
>
> 6. Q&A 
> ====== 
>
> Q: What if I install with --with-system-<spkg> but later want to 
> install the sage-dist version of that package? 
>
> A: We should also support some way to deselect system packages. 
> Perhaps --without-system-<spkg> / --with-system-<spkg>=no (these are 
> two ways of saying the same thing in standard configure scripts). 
>
> Q: The reverse: What if I install the sage-dist package, but want to 
> switch to the system package? 
>
> A: Same thing, but this is a little trickier because we would need to 
> *uninstall* the package from $SAGE_LOCAL.  I have a proposal for 
> improving spkg uninstallation written up at 
> https://trac.sagemath.org/ticket/22510 
>
> Q: What if I use a system package when building Sage, but that package 
> is later upgraded, or worse, removed? 
>
> A: There's no great solution to this.  Certainly, I think the 
> ./configure time checks should be cached (since updates are not 
> usually *that* frequent).  So there needs to be good documentation on 
> invalidating the cache when re-running ./configure.  Still, that only 
> helps with configure-time detection.  Sage can still break at runtime 
> if a system package it depends on changes.  This is a generic problem 
> for *any* software development, however, and something developers 
> should be aware of if they're updating their system.  Granted, most 
> people don't always closely examine what's changing when they install, 
> for example, OS updates.  I certainly don't always check this with a 
> fine-toothed comb.  But it's a general issue.  Keeping the ability to 
> install the "standard", known-working sage-dist spkgs if needed is 
> also a big advantage of this proposal. 
>
> Any other questions? 
>
>
> 7. Future concepts 
> ================== 
>
> a) Platform hooks 
> ----------------- 
>
> It might be nice, when using system packages, for the underlying 
> OS/distribution system to hook into the SAGE_SPKG_CHECK_ system, both 
> to check if a package is installed, and to provide its version number. 
> For example, when building Sage on Debian, it might just hook into the 
> dpkg system to provide this information in a manner consistent with 
> the system. 
>
> b) Abstract packages 
> -------------------- 
>
> Returning to the question of dependencies that can be satisfied by 
> more than one package (e.g. BLAS, GMP), I think it would be nice to 
> have a generic way of handling such cases that's a little cleaner than 
> the current ad-hoc system.  I would like a way of specifying an 
> "abstract" package (which might be named "blas", for example). 
> Installing an abstract package would mean installing the concrete 
> package selected to satisfy it, but it would also include a system for 
> switching between concrete implementations.  So for example it would 
> be possible to have multiple BLAS implementations installed 
> simultaneously, and installing "blas" with the current selection might 
> just be a matter of updating some symlinks. 
>
> I think this concept fits in well with the proposal for handling 
> system packages, but doesn't necessarily need to be handled 
> simultaneously with it.  For now we can just maintain the special 
> cases I think... 
>
>
> 8. Conclusion (for now) 
> ======================= 
>
> I've heard many valid concerns with going beyond sage-the-distribution 
> for building/running Sage.  Sage's huge collection of dependencies can 
> lead to many fragilities: Version X of package Y might work with 
> dependency A, but completely break dependency B.  And supporting 
> versions V, W, and X of package Y simultaneously is a lot of overhead 
> compared to always just using version X of that package for Sage. 
>
> I do personally have a preference, when it comes to writing software, 
> to supporting as wide a range of versions for my dependencies as is 
> feasible.  For some dependencies the versions supported may, 
> necessarily, be very narrow.  But for other cases there can be a lot 
> more room for flexibility. 
>
> Regardless, I think this proposal maintains the current stability of 
> Sage by keeping the current preference for sage-the-distribution in 
> all cases by default.  It also maintains the ability to use 
> custom-built versions of some of Sage's dependencies.  But I think this 
> will also provide more flexibility in experimenting with using 
> existing system packages in cases where that's sufficient, and avoid 
> Sage duplicating system packages unnecessarily. 
>
> Best, 
> Erik 
>
>
> [1] https://trac.sagemath.org/ticket/14405 
> [2] https://www.technovelty.org/tips/the-stamp-idiom-with-make.html 
> [3] https://groups.google.com/d/msg/sage-devel/8MJBe_qxWJ0/fTzOPVzDAAAJ 
>
