Hello :)

I've been thinking about how to improve the early Hurd server bootstrap. To be clear, this is not about `bootstrap' as in sysvinit, but about how to start the very first Hurd servers.

Now we all know that system bootstrap is an emotional subject, so I'd like to address any concerns you might have about my proposal. Also, I'd love to get you excited about my approach, because it empowers developers to hack the bootstrap procedure and users to rescue their systems.

Currently, this process is fragile. The Hurd server bootstrap is always the first thing that breaks (which isn't really surprising, since it is the first thing that runs). But what's much, much worse is that the process is not hackable, and it is not interactive.

What does the current bootstrap look like?

The bootstrap process is implemented in the bootscript parser (either in gnumach, or in boot), libdiskfs, exec, startup, proc, and auth. All of these servers perform some tasks, start or resume other servers, plug ports into each other, and often just wait for another server.

Why is it so fragile?

If any of the servers breaks, the whole process most often just hangs, displaying nothing, leaving the user puzzled and incapable of inspecting and fixing her system.

And why is it not very hackable?

The whole mechanism is spread across these components. You cannot easily extend or modify it without being aware of the whole bootstrap procedure. You cannot omit any of the components. The process is so inflexible that we hard-coded the individual component names into RPCs (startup_procinit, startup_authinit).

How do we fix it?

I propose to create a self-contained shell that executes a boot script. We expose enough Mach/Hurd functionality to implement the whole bootstrap procedure, e.g. (task-set-bootstrap-port, make-send-right, task-resume, ...). If anything goes wrong, we drop the user into an interactive REPL.

Wait, that sounds like serverboot, doesn't it?

It does. I think it was a mistake to abandon it in the first place. Evidence for that:

  1. We now have two copies of the bootscript parser, one in the kernel, one in boot.
  2. We moved non-essential functionality (with lots of string parsing) into the kernel.
  3. It makes strong assumptions about the platform, e.g. it assumes a bootloader that can load modules like GRUB does with the multiboot protocol.

But we don't use some made-up shell-like language, we use a Scheme interpreter. \o/

Why is it awesome?

    (define (bootstrap)
      (log "Hurd server bootstrap:")
      (bind-root rootfs-task)
      [...]
      (log ".\n"))

    (catch (panic "Hurd bootstrap failed: " last-exception "\n")
      (bootstrap))

Imagine the possibilities. Say you're creating a live-cd. You load a statically-linked isofs translator, you locate the cd, point the isofs to the device and resume it, load and run an exec server from the cd, start mach-defpager, start a tmpfs as root filesystem, populate it from the cd, start other essential translators, and hand off to sysvinit.

Here is a screenshot from my three-night prototype:

~~~ snip ~~~
Loading kernel...
Loading bootshell...
Loading boot script...
Loading hello demo...
Loading root filesystem...
GNU Mach 1.4
ELF section header table at c00102ec
[...]
module 0: bootshell ${host-port} ${device-port} ${bootscript-task} ${hello-task} ${rootfs-task} $(task-create) $(task-resume)
module 1: runsystem.scm $(bootscript-task=task-create)
module 2: hello $(hello-task=task-create)
module 3: iso9660fs --host-priv-port=${host-port} --device-master-port=${device-port} --exec-server-task=${hello-task} -T typed device:hd2 $(rootfs-task=task-create)
4 multiboot modules
task loaded: bootshell 1 2 3 4 5
reading 854 bytes into map c82eaf40
task loaded: runsystem.scm
task loaded: hello
task loaded: iso9660fs --host-priv-port=1 --device-master-port=2 --exec-server-task=3 -T typed device:hd2

start bootshell: bootshell/TinySCHEME 1.41.
Hurd server bootstrap: rootfs hello.
Hello world.
catch_exception_raise (23, 4, 1, 2, 36): terminating task 4.

We managed to rendezvous with the root filesystem. Feel free to roam around with `(cd "boot")'. You can also try `(cat "/etc/hostname")', or `(load "more.scm")'.

Note that we did not rely on the Hurd bootstrap code build into the rootfs translator. The Hurd server bootstrap you saw above is implemented in scheme. You can inspect it right now by doing `(cat "/boot/runsystem.scm")'. It is so tiny that it fits this screen.

You can remaster this image with `grub-mkrescue'. See `remaster.txt'.

You can see how it handles failures by selecting the wrong menu entry in GRUB.

Welcome to bootshell, a scheme shell.
Type `(help)' for help.
Sorry for the lack of -lreadline.
runsystem@nonmonolithic / > (cd "etc")
==> ()
runsystem@nonmonolithic /etc > (cat "hostname")
nonmonolithic
==> #t
runsystem@nonmonolithic /etc >
~~~ snap ~~~

Note how we started the rootfs translator on its own, and how we can already use it, e.g. to load the system's hostname from "/etc/hostname" and display it in the prompt.

Also see how it handles failures:

~~~ snip ~~~
start bootshell: bootshell/TinySCHEME 1.41.
Hurd server bootstrap: rootfsiso9660fs: device:hd0: No such device or address
catch_exception_raise (20, 5, 1, 2, 36): terminating task 5.

panic: Hurd bootstrap failed: (timeout)
(emergency-shell) > ((lambda (x) (+ 1 x)) 3)
==> 4
(emergency-shell) >
~~~ snap ~~~

That's right. Your bootstrap failed, but you still got a shell to interact with your system.

Awesome, can I play with it?

Sure! Grab

  http://darnassus.sceen.net/~teythoon/bootshell24.iso

and do `kvm -k en-us -cdrom bootshell24.iso'. You can easily remaster it to mess with the boot script.

The code is here:

  http://darnassus.sceen.net/gitweb/teythoon/hurd.git/shortlog/refs/heads/bootshell23

It requires only tiny tweaks to libdiskfs, and a tiny patch to gnumach that enables us to load non-ELF files into tasks.

Love to get your input,
Justus