On December 31, 2023 9:53:26 PM GMT+02:00, Sergey Bugaev <buga...@gmail.com> wrote:

>Hello, and happy holidays!
>
>Every now and then, I hear someone mention potential ports of gnumach to new architectures. I think I have heard RISC-V and (64-bit?) ARM mentioned somewhere recently as potential new port targets. Being involved in the x86_64 port last spring was a really fun and interesting experience, and I learned a lot; so I, for one, have always thought doing more ports would be a great idea, and that I would be glad to be a part of such an effort again.
>
>Among the architectures, AArch64 and RISC-V indeed seem the most attractive (not that I know much about either). Of the two, RISC-V is certainly newer and more exciting, but AArch64 is certainly more widespread and established. (Wouldn't it be super cool if we could run GNU/Hurd everywhere from tiny ARM boards, to Raspberry Pis, to common smartphones, to, now, ARM-based laptops and desktops?) Also, I have had some experience with ARM in the past, so I knew a tiny bit of ARM assembly.
>
>So I thought: what would it take to port the Hurd to AArch64, a completely non-x86 architecture, one that I knew very little about? There is no AArch64 gnumach (that I know of) yet, but I could try to hack on glibc even without one; I'd only need some headers, right? There's also no compiler toolchain, but those patches to add the x86_64-gnu target looked pretty understandable, so — how hard could it be?
>
>Well, I did more than think about it :)
>
>I read up on AArch64 registers / assembly / architecture / calling convention, added the aarch64-gnu target to binutils and GCC, added basic versions of the mach/aarch64/ headers to gnumach (but no actual code), and made a mostly complete port of glibc. I haven't spent much effort on the Hurd proper, but I have tried running the build, and the core Hurd servers (ext2fs, proc, exec, auth) do get built.
>
>I will be posting the patches soon.
>For now, here's just a little teaser:
>
>glibc/build $ file libc.so elf/ld.so
>libc.so:   ELF 64-bit LSB shared object, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-aarch64.so.1, for GNU/Hurd 0.0.0, with debug_info, not stripped
>elf/ld.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, with debug_info, not stripped
>
>hurd/build $ file ext2fs/ext2fs.static proc/proc
>ext2fs/ext2fs.static: ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), statically linked, for GNU/Hurd 0.0.0, with debug_info, not stripped
>proc/proc:            ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-aarch64.so.1, for GNU/Hurd 0.0.0, with debug_info, not stripped
>
>glibc/build $ aarch64-gnu-objdump --disassemble=__mig_get_reply_port libc.so
>
>libc.so:     file format elf64-littleaarch64
>
>Disassembly of section .plt:
>
>Disassembly of section .text:
>
>000000000002b8e0 <__mig_get_reply_port>:
>   2b8e0: a9be7bfd  stp  x29, x30, [sp, #-32]!
>   2b8e4: 910003fd  mov  x29, sp
>   2b8e8: f9000bf3  str  x19, [sp, #16]
>   2b8ec: d53bd053  mrs  x19, tpidr_el0
>   2b8f0: b85f8260  ldur w0, [x19, #-8]
>   2b8f4: 34000080  cbz  w0, 2b904 <__mig_get_reply_port+0x24>
>   2b8f8: f9400bf3  ldr  x19, [sp, #16]
>   2b8fc: a8c27bfd  ldp  x29, x30, [sp], #32
>   2b900: d65f03c0  ret
>   2b904: 97fffbef  bl   2a8c0 <__mach_reply_port>
>   2b908: b81f8260  stur w0, [x19, #-8]
>   2b90c: f9400bf3  ldr  x19, [sp, #16]
>   2b910: a8c27bfd  ldp  x29, x30, [sp], #32
>   2b914: d65f03c0  ret
>
>So it compiles and links, but does it work? — well, we can't know that, not until someone ports gnumach, right?
>
>Well, actually, we can :) I've done the same thing as last time, when working on the x86_64 port: run a statically linked hello world executable on Linux, under GDB, carefully skipping over and emulating syscalls and RPCs.
>This did uncover a number of bugs, both in my port of glibc and in how the toolchain was set up (the first issue was that static-init.S was not even getting linked in, the second issue was that static-init.S was crashing even prior to the _hurd_stack_setup call, and so on). But I fixed all of those, and got the test executable working! — as in, successfully running all the glibc initialization (no small feat; this includes TLS setup, hwcaps / cpu-features, and ifuncs), reaching main (), successfully doing puts (), and shutting down. So it totally works, and is only missing an AArch64 gnumach to run on.
>
>The really unexpected part is how easy this actually was: it took me like 3 days from "ok, guess I'm doing this, let's add a new target to binutils and gcc" to glibc building successfully, and a couple more days to get hello world to work (single-stepping under GDB is just that time-consuming). Either I'm getting good at this... or (perhaps more realistically) maybe it was just easy all along, and it was my inexperience with glibc internals that slowed me down the last time. Also, we have worked out a lot of 64-bit issues with the x86_64 port, so that is something I didn't have to deal with this time.
>
>Now to some of the more technical things:
>
>* The TLS implementation is basically complete and working. We're using tpidr_el0 for the thread pointer (as can be seen in the listing above), like GNU/Linux and unlike Windows (which apparently uses x18) and macOS (which uses tpidrro_el0). We're using the "Variant I" layout, as described in "ELF Handling for Thread-Local Storage", again the same as GNU/Linux, and unlike what we do on both x86 targets. This actually ends up being simpler than what we had for x86! The other cool thing is that we can do "msr tpidr_el0, x0" from userspace without any gnumach involvement, so that part of the implementation is quite a bit simpler too.
>
>* Conversely, while on x86 it is possible to perform "cpuid" and identify CPU features entirely in user space, on AArch64 this requires access to some EL1-only registers. On Linux and the BSDs, the kernel exposes info about the CPU features via the AT_HWCAP (and, more recently, AT_HWCAP2) auxval entries. Moreover, Linux allows userland to read some otherwise EL1-only registers (notably for us, midr_el1) by catching the trap that results from EL0 code trying to do that, and emulating its effect. Also, Linux exposes the midr_el1 and revidr_el1 values through procfs.
>
>  The Hurd does not use auxval, nor is gnumach involved in execve anyway. So I thought the natural way to expose this info would be with an RPC, and so in mach_aarch64.defs I have an aarch64_get_hwcaps routine that returns the two hwcaps values (using the same bits as AT_HWCAP{,2}) and the values of midr_el1/revidr_el1. This is hooked into init_cpu_features in glibc, and used to initialize GLRO(dl_hwcap) / GLRO(dl_hwcap2), and eventually to pick the appropriate ifunc implementations.
>
>* The page size (or rather, the paging granularity) is notoriously not necessarily 4096 on ARM, and the best practice is for userland not to assume any specific page size and to always query it dynamically. GNU Mach will (probably) have to be built to support some specific page size, but I've cleaned up a few places in glibc where things were relying on a statically defined page size.
>
>* There are a number of hardware hardening features available on AArch64 (PAC, BTI, MTE — why do people keep adding more and more workarounds, including hardware ones, instead of rewriting software in a properly memory-safe language...). Those are not really supported right now; all of them would require some support from the gnumach side; we'll probably need new protection flags (VM_PROT_BTI, VM_PROT_MTE), for one thing.
>
>  We would need to come up with a design for how we want these to work Hurd-wide.
>  For example, I imagine it's userland that will be generating the PAC keys (and setting them for a newly exec'ed task), since gnumach does not contain the functionality to generate random values (nor should it); but this leaves open the question of what should happen to the early bootstrap tasks, and whether they can start using PAC after their initial startup.
>
>* Unlike on x86, I believe it is not possible to fully restore execution context (the values of all registers, including pc and cpsr) purely in userland; one of the reasons being that we can apparently no longer do a load from memory straight into pc, like it was possible in previous ARM revisions. So the way sigreturn () works on Linux is of course they have it as a syscall that takes a struct sigcontext and writes it over the saved thread state. Sounds familiar? — of course, that's almost exactly like thread_set_state () in Mach-speak. The difference being that thread_set_state () explicitly disallows setting the calling thread's state, which makes it impossible to use for implementing sigreturn (). So I'm thinking we should lift that restriction; there's no reason why thread_set_state () cannot be made to work on the calling thread; it only requires some careful coding to make sure the return register (%eax/%rax/x0) is *not* rewritten with mach_msg_trap's return code, as it normally would be.
>
>  But other than that, I do have AArch64 versions of trampoline.c and intr-msg.h (complete with SYSCALL_EXAMINE & MSG_EXAMINE). Whether they work, we'll only learn once we have enough of the Hurd running to have the proc server.
>
>Anyways, enjoy! As said, I will be posting the patches some time soon. I of course don't expect to get any reviews during the holidays. And — any volunteers for a gnumach port? :) Not me :)
>
>Sergey
>
>P.S.
>Believe it or not, this is not the announcement that I was going to make at Joshua's Christmas party; I only started hacking on this later, after that email exchange. That other thing is still to be announced :)

I'm impatient to hear that :)
Hello Sergey, and happy new year! I don't know how you manage to achieve something like this so quickly, but it's always a pleasure to hear news from you.