Hi,

On Mon, 03 Apr 2023 at 20:41, Spencer Skylar Chan <scha...@terpmail.umd.edu> wrote:
>> I would expect most software versions to not be in Guix. Simon had
>> mentioned that this is mostly what the guix-past repository is
>> for. However, some packages might be buried on some branch or some
>> commit in some Guix related git repository. It may be helpful to
>> facilitate their discovery and extraction for conda import.

Please note,

 1. The aim of the guix-past [1] channel is to keep previous versions
    of some packages working with recent Guix revisions.  The
    motivation for guix-past was the 10 Years Challenge [2], and it
    was then fed by a hackathon [3].

 2. There is no easy way to know which revision of Guix provides a
    specific version of a given package.  Discovering the mapping
    from package version to Guix revision is not straightforward with
    the current tools.  I am aware of two directions: rely on an
    external server such as the Guix Data Service [4], or implement
    “guix git log” [5] (the code lives in the branch ‘wip-guix-log’).

1: https://gitlab.inria.fr/guix-hpc/guix-past
2: http://rescience.github.io/ten-years/
3: https://hpc.guix.info/blog/2020/07/reproducible-research-hackathon-experience-report/
4: https://data.guix.gnu.org/repository/1/branch/master/package/gmsh/output-history
5: https://guix.gnu.org/en/blog/2021/outreachy-guix-git-log-internship-wrap-up/

>> Git has a newish binary file format for caching searches across
>> commits. Maybe it would be helpful to figure out how to parse this
>> format (its documented) and index the data further using Xapian or a
>> graph data structure (or tree sitter?) with the relevant metadata
>> needed to find and efficiently extract scheme code and its
>> dependencies?

Months ago, I started to do that: index the package list using
Xapian.  Well, “started” is a strong word here, since I have not done
much.  My idea was (and still is!) to address the two at the same
time: a faster “guix search” [6] and the discovery of past versions,
somehow by reworking Arun’s patches [6].
From my point of view, it would not be possible to add Xapian as a
dependency of Guix itself; therefore I think it should use
GUIX_EXTENSIONS_PATH.

6: https://issues.guix.gnu.org/39258#14

> If the format is documented then this is possible, although I'm not
> super familiar with these kinds of data structures.

As said, an entry point for how “guix search” works is the very long
discussion in #39258 [7]. :-)

7: https://issues.guix.gnu.org/39258

>> You make an interesting point about compilation errors. It may more
>> productive to help researchers test for working satisfiable
>> configurations as a more relaxed approach to having to specify the
>> exact software version. Maybe some "nearby" or newer version is
>> packaged and that is enough to successfully run a test suite? I'm
>> imagining something between git bisect and Guix's own package
>> solver.
>
> Yes, we could have a variant of the solver that's more relaxed. It could
> output multiple solutions so the user can inspect them and pick the best
> one.

I do not know what you have in mind with “working satisfiable
configurations” or with “a variant of the solver”.  To my knowledge,
this implies some SAT solver.  Well, before going in this direction, I
would suggest reading some of the output of the Mancoosi project [8],
especially this part [9].

From my point of view, the direction of “working satisfiable
configurations” or “a variant of the solver” would break the
reproducibility of a specific configuration in the general case.  Part
of the problem with the reproducibility of computational environments
is precisely that package managers implement solvers for installing
packages.

That said, all the package versions that Guix can provide form a DAG,
because they come from a Git history – well, from the combination of
several Git histories when considering several channels.  Thus, a
specific version of a package is given by an interval in the graph.
Considering a list of packages, each at one specific version, we end
up with a list of intervals.
The “working satisfiable configuration” is then the intersection of
all the intervals of this list; note that the result could also be
the empty interval.

It is a graph problem.  It is almost trivial when the graph is linear,
but it requires some work when merges happen.  And note that merges
bring in branches that do not always fully build; for instance, parts
of core-updates before its merge.  To my knowledge, it is impossible
to detect this beforehand.

We discussed this kind of topic when introducing “guix package
--export-channels”; it is a variant of this proposal, IMHO.

Last, considering all the version fields in Guix, I am not convinced
it is straightforward to guarantee some “nearby” or newer version.  It
can only be a heuristic that works with more or less accuracy; see
“guix refresh” and all the updaters.

All in all, I am not convinced Guix should try to implement a way to
“specify the exact software version”, because it leads to the false
belief that version labels are enough for reproducing computational
environments, when that is far from being the case.

Well, I agree that Guix should only provide tools to build
channels.scm and manifest.scm files, both hinted by some input such as
requirements.txt, while strongly claiming that only the computational
environment generated by channels.scm+manifest.scm is reproducible.
Any computational environment generated from inputs other than
channels.scm+manifest.scm is not reproducible – this includes any
converter from whatever inputs to generated channels.scm+manifest.scm.

8: https://www.mancoosi.org/
9: https://www.mancoosi.org/edos/algorithmic/

> Finally, would these projects be considered large or medium for the
> purposes of GSOC?

Well, there are many ideas floating around. :-)  That’s because much
work still remains. ;-)

Many of the ideas discussed here are larger than a GSoC project.  Now,
you should pick one that interests you and for which you have an idea
of how to implement it.
Then try to draw up a schedule to see whether you think it would fit.
Please consider that implementing always takes longer than initially
planned – there are always unexpected tiny details that block the
initial plan; devil, details and all that. ;-)

Cheers,
simon
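
PS: For the linear case, the interval intersection mentioned above can
be sketched in a few lines.  This is only a minimal illustration under
assumptions of mine: the history is a single chain of commits numbered
0, 1, 2, ...; the package names, versions and intervals are made up;
and nothing here is an existing Guix API.  Python is used only for
brevity.

```python
from typing import Optional

# Inclusive range (first, last) of positions in a *linear* history.
Interval = tuple[int, int]

# Hypothetical data: where each (package, version) is available.
# In practice this mapping is exactly what is hard to obtain today.
AVAILABILITY: dict[tuple[str, str], Interval] = {
    ("foo", "1.2"): (120, 340),
    ("bar", "0.9"): (200, 410),
    ("baz", "4.8"): (150, 260),
    ("baz", "4.9"): (380, 500),
}


def intersect(intervals: list[Interval]) -> Optional[Interval]:
    """Return the interval common to all inputs, or None if it is empty."""
    if not intervals:
        return None
    start = max(first for first, _ in intervals)
    end = min(last for _, last in intervals)
    return (start, end) if start <= end else None


def candidate_revisions(wanted: list[tuple[str, str]]) -> Optional[Interval]:
    """Revisions of the linear history providing every wanted version."""
    return intersect([AVAILABILITY[spec] for spec in wanted])


print(candidate_revisions([("foo", "1.2"), ("bar", "0.9"), ("baz", "4.8")]))
print(candidate_revisions([("foo", "1.2"), ("baz", "4.9")]))
```

The first request has a non-empty answer and the second one has none,
which is the “empty interval” case.  With merges, an interval is no
longer a pair of integers but a set of commits in the DAG, and that is
where the real work is.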