As a hobby project, I'm working on a mostly memory-safe architecture that is targeted at direct software emulation. The majority of its instructions have memory operands that are relative to the stack pointer. Calls and returns adjust the stack pointer, so I suppose one could say that the architecture has register windows. The reason for the stack-based approach is that an emulated register move would be a memory-to-memory move anyway. I believe this approach is similar to the Lua VM and other VMs that are generally considered register-based instead of stack-based. Writing it by hand, it feels more register-based than stack-based, too, although the direct support for multiple return values makes it possible to use some Forth-like idioms.
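To illustrate the idea, here is a rough C sketch of how an emulator might treat registers as stack-relative memory slots. Everything in it is a simplifying assumption for illustration, not the actual emulator: the names (`vm_t`, `reg`, `op_mv`, `op_call`, `op_ret`, `window_demo`), the fixed 64-slot stack, the 8-byte slot size, and the layout in which register %n sits n slots below the stack pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VM state: a flat stack of 64-bit slots and a stack pointer
   kept as a slot index rather than a byte address. */
enum { STACK_SLOTS = 64 };

typedef struct {
    int64_t stack[STACK_SLOTS];
    size_t sp;
} vm_t;

/* Assumed layout: register %n is the slot n positions below the stack pointer. */
static int64_t *reg(vm_t *vm, unsigned n) {
    assert(vm->sp < STACK_SLOTS && n <= vm->sp);
    return &vm->stack[vm->sp - n];
}

/* A "register move" is just a memory-to-memory move between two slots. */
static void op_mv(vm_t *vm, unsigned dst, unsigned src) {
    *reg(vm, dst) = *reg(vm, src);
}

/* Call and return only slide the window; no values are copied. */
static void op_call(vm_t *vm, unsigned frame_slots) {
    assert(vm->sp + frame_slots < STACK_SLOTS);
    vm->sp += frame_slots;
}

static void op_ret(vm_t *vm, unsigned frame_slots) {
    assert(frame_slots <= vm->sp);
    vm->sp -= frame_slots;
}

/* Returns 1 if the caller's %0 is visible as %3 in a three-slot callee frame
   and a value written to the callee's %3 comes back in the caller's %0. */
int window_demo(void) {
    vm_t vm = { .sp = 8 };
    *reg(&vm, 0) = 42;            /* caller places the argument in %0 */
    op_call(&vm, 3);              /* enter a frame of three slots */
    int64_t seen = *reg(&vm, 3);  /* callee reads the argument as %3 */
    op_mv(&vm, 1, 3);             /* copy it into a local register */
    *reg(&vm, 3) = *reg(&vm, 1) + 1;  /* callee writes its result to %3 */
    op_ret(&vm, 3);               /* leave the frame */
    return seen == 42 && *reg(&vm, 0) == 43;
}
```

With this layout, argument passing costs nothing beyond the stack pointer adjustment, which is the sense in which the scheme resembles register windows.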
Here's some example code (destination operand comes first):

    .proc fib (_long) (_long)
        # Argument/result register: %3
        # return address register: %2
        # local register: %1
        # outgoing argument/return register: %0
        .framesize 24 # in bytes, three registers excluding the incoming argument
        ldic %1, 2
        jlels %3, %1, :0 # If the argument is less than 2, just return it.
        addlc %0, %3, -1 # Prepare argument for first recursive call.
        callp fib
        mv %1, %0 # Save result of first call.
        addlc %0, %3, -2 # Prepare argument for second recursive call.
        callp fib
        addlso %3, %1, %0 # Sum of results, with an overflow check.
    :0  ret 24

The call instruction increments the stack pointer by 24 bytes to create the new frame. This is how the argument becomes available as %3 in the callee. (The real assembler has a minimal register allocator and computes the frame size automatically, so that the change of the argument registers is hidden from the programmer when new local registers are introduced.)

I tried to create a GCC backend for this, looking at the existing mmix backend (for the register windows) and the bpf backend (due to its verified nature) for inspiration. I did not get very far because it's my first GCC backend. I wonder if that's just my lack of experience, or if the target just isn't a good fit for GCC.

I found Hans-Peter Nilsson's old description of the CRIS backend, and it mentions that GCC is not a good starting point for accumulator machines with just one register (and my target doesn't even have that, not really).

If advisable, I can redefine the target to make it more GCC-friendly, perhaps by introducing a register file separate from the stack. (Although this would make the emulator and verifier more complex.)