Hello, I’m trying to gather a “wish list” of things to be done to facilitate the use of Guix on clusters and for high-performance computing (HPC).
Ricardo and I wrote about the advantages, shortcomings, and perspectives before:

  http://elephly.net/posts/2015-04-17-gnu-guix.html
  https://hal.inria.fr/hal-01161771/en

I know that Pjotr, Roel, Ben, Eric and maybe others also have experience and ideas on what should be done (and maybe even code? :-)).

So I’ve come up with an initial list of work items going from the immediate needs to crazy ideas (batch scheduler integration!) that hopefully make sense to cluster/HPC people. I’d be happy to get feedback, suggestions, etc. from whoever is interested!

(The reason I’m asking is that I’m considering submitting a proposal at Inria to work on some of these things.)

TIA! :-)

Ludo’.

- non-root usage
  + file system virtualization needed
    * map ~/.local/gnu/store to /gnu/store
    * user name spaces?
    * [[https://github.com/proot-me/PRoot/][PRoot]]? but performance problems?
    * common interface, like “guix enter” spawns a shell where /gnu/store is available
  + daemon functionality as a library
    * client no longer connects to the daemon, does everything locally, including direct store accesses
    * can use substitutes
  + or plain ‘guix-daemon --disable-root’?
  + see [[http://lists.gnu.org/archive/html/help-guix/2016-06/msg00079.html][discussion with Ben Woodcroft and Roel]]
- central daemon usage (like at MDC, but improved)
  + describe/define appropriate setup, like:
    * daemon runs on front-end node
    * clients can connect to daemon from compute nodes, and perform any operation
    * use of distributed file systems: anything to pay attention to?
    * how should the front-end offload to compute nodes?
  + technical issues
    * daemon needs to be able to listen for connections elsewhere
    * client needs to be able to [[http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20381][connect remotely]] instead of using [[http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20381#5][‘socat’ hack]]
    * how do we share localstatedir? how do we share /gnu/store?
    * how do we share the profile directory?
  + admin/social issues
    * daemon runs as root
    * daemon needs Internet access
    * Ricardo mentions lack of nscd and problems caused by the use of NSS plugins like [[https://fedoraproject.org/wiki/Features/SSSD][SSSD]] in this context
  + batch scheduler integration?
    * allow users to offload right from their machine to the cluster?
- package variants, experimentation
  + for experiments, as in Section 4.2 of [[https://hal.inria.fr/hal-01161771/en][the RepPar paper]]
    * in the meantime we added [[https://www.gnu.org/software/guix/manual/html_node/Package-Transformation-Options.html][--with-input et al.]]; need more?
  + for [[https://lists.gnu.org/archive/html/guix-devel/2016-10/msg00005.html][CPU-specific optimizations]]
  + somehow support -mtune=native (and even profile-guided optimizations?)
  + simplify the API to switch compilers, libcs, etc.
- workflow, reproducible science
  + implement [[http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22629][channels]]
  + provide a way to see which Guix commit is used, like “guix channel describe”
  + simple ways to [[https://lists.gnu.org/archive/html/guix-devel/2016-10/msg00701.html][test the dependents of a package]] (see also discussion between E. Agullo & A. Enge)
    * new transformation options: --with-graft, --with-source recursive
  + support [[https://lists.gnu.org/archive/html/guix-devel/2016-05/msg00380.html][workflows and pipelines]]?
  + add [[https://github.com/galaxyproject/galaxy/issues/2778][Guix support in Galaxy]]?
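
To make the “package variants” and “test the dependents of a package” items a bit more concrete, here is a toy Python sketch of the two graph operations involved: replacing an input everywhere in a dependency graph (roughly the idea behind --with-input) and listing everything that depends on a given package. It is only an illustration under strong assumptions: the package names, the dict-based graph, and the with_input and dependents helpers are all made up for this sketch and are not Guix’s data structures or API.

```python
# Toy model only: the package names and this dict-based graph are
# hypothetical; Guix does not represent packages this way.

def with_input(graph, old, new):
    """Return a copy of GRAPH where every dependency on OLD becomes NEW."""
    return {
        pkg: [new if dep == old else dep for dep in deps]
        for pkg, deps in graph.items()
    }


def dependents(graph, target):
    """Return the set of packages depending on TARGET, directly or not."""
    found = set()
    changed = True
    while changed:  # iterate until no new dependent is discovered
        changed = False
        for pkg, deps in graph.items():
            if pkg in found:
                continue
            if any(dep == target or dep in found for dep in deps):
                found.add(pkg)
                changed = True
    return found


# Each package maps to the list of its direct inputs.
graph = {
    "solver": ["blas", "mpi"],
    "app": ["solver"],
    "blas": [],
    "fast-blas": [],
    "mpi": [],
}

# Swap one BLAS for another throughout the graph, without touching the original.
variant = with_input(graph, "blas", "fast-blas")

# Packages that would need rebuilding and testing after a change to "blas".
to_rebuild = dependents(graph, "blas")
```

In this model, variant["solver"] lists "fast-blas" instead of "blas", and to_rebuild contains both "solver" and "app". A real implementation would of course work on package objects rather than names, which is where the API questions above come in.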