John Nagle <na...@animats.com> writes:
> In the superscalar era, there's not much of an advantage to avoiding
>stack accesses.

Apart from 4stack, I am not aware of a superscalar stack machine (and
4stack is more of an LIW than a superscalar).

OTOH, if by stack accesses you mean memory accesses through the stack
pointer on a register machine, then evidence contradicts your claim.
E.g., if we can keep one or two more of Gforth's VM's registers in
real registers rather than on the stack of an IA32 CPU, we see
significant speedups (like a factor of 2).

>x86 superscalar machines have many registers not
>visible to the program, as the fastest level of cache.

They have a data cache for memory accesses (about 3 cycles load-to-use
latency on current CPUs for these architectures), and they have rename
registers (not visible to programmers) that don't cache memory.

They also have a store buffer with store-to-load forwarding, but that
still has no better load-to-use latency.

>In practice,
>the top of the stack is usually in CPU registers.

Only if the Forth system is written that way.

> The "huge number
>of programmer-visible register" machines like SPARCs turned out to be
>a dead end.

Really? Architectures with 32 programmer-visible registers like SPARC
(but, unlike SPARC, without register windows) are quite successful in
embedded systems (e.g., MIPS, SPARC).

>So did making all the instructions the same width; it
>makes the CPU simpler, but not faster, and it bulks up the program
>by 2x or so.

In the beginning it also made the CPU faster.

As for the bulk, here's some data from
<2007dec11.202...@mips.complang.tuwien.ac.at>; it's the text (code)
size of /usr/bin/dpkg in a specific version of the dpkg package:

  .text
 section
  98132 dpkg_1.14.12_hurd-i386.deb
 230024 dpkg_1.14.12_m68k.deb
 249572 dpkg_1.14.12_amd64.deb
 254984 dpkg_1.14.12_arm.deb
 263596 dpkg_1.14.12_i386.deb
 271832 dpkg_1.14.12_s390.deb
 277576 dpkg_1.14.12_sparc.deb
 295124 dpkg_1.14.12_hppa.deb
 320032 dpkg_1.14.12_powerpc.deb
 351968 dpkg_1.14.12_alpha.deb
 361872 dpkg_1.14.12_mipsel.deb
 371584 dpkg_1.14.12_mips.deb
 615200 dpkg_1.14.12_ia64.deb

Sticking with the Linux packages (i.e., not the Hurd one), the code
size relative to the i386 code ranges from a factor of 0.97 (ARM) to
1.41 (MIPS) for the classical architectures with fixed-size
instructions (RISCs). Only the IA64 is larger than the i386 code by a
factor of 2.33.

Note that code size is not everything that's in a program binary, and
the rest should be unaffected by whether the instructions are
fixed-size or variable-sized, so the overall effect on the binary will
be smaller.

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: http://www.forth200x.org/forth200x.html
   EuroForth 2010: http://www.euroforth.org/ef10/
-- 
http://mail.python.org/mailman/listinfo/python-list
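
For anyone who wants to recheck the factors quoted in the post, here is a minimal Python sketch that recomputes them from the .text sizes listed there. The dictionary name `TEXT_SIZE`, the helper `ratio_to_i386`, and the `RISC` tuple are illustrative names, not anything from the original measurement; in particular, which ports count as "classical architectures with fixed-size instructions" is my reading of the post, not a list given in it.

```python
# .text sizes in bytes of /usr/bin/dpkg 1.14.12, copied from the post
# (Linux packages only; the Hurd package is left out, as in the post).
TEXT_SIZE = {
    "m68k": 230024,
    "amd64": 249572,
    "arm": 254984,
    "i386": 263596,
    "s390": 271832,
    "sparc": 277576,
    "hppa": 295124,
    "powerpc": 320032,
    "alpha": 351968,
    "mipsel": 361872,
    "mips": 371584,
    "ia64": 615200,
}

# Assumption: these are the ports meant by "classical architectures
# with fixed-size instructions (RISCs)".
RISC = ("arm", "sparc", "hppa", "powerpc", "alpha", "mipsel", "mips")


def ratio_to_i386(arch):
    """Return the .text size of `arch` divided by the i386 .text size."""
    return TEXT_SIZE[arch] / TEXT_SIZE["i386"]


if __name__ == "__main__":
    # Print every port, smallest code first, with its factor over i386.
    for arch in sorted(TEXT_SIZE, key=TEXT_SIZE.get):
        print(f"{arch:8s} {TEXT_SIZE[arch]:7d} {ratio_to_i386(arch):.2f}")

    smallest = min(RISC, key=ratio_to_i386)
    largest = max(RISC, key=ratio_to_i386)
    print(f"RISC range: {ratio_to_i386(smallest):.2f} ({smallest}) "
          f"to {ratio_to_i386(largest):.2f} ({largest})")
```

The sketch only divides the numbers from the table; it says nothing about data, relocation, or other sections, which is the caveat the post already makes about whole-binary size.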