On Wed Jan 14 05:12:07 EST 2009, nadiasver...@gmail.com wrote:
> On Jan 12, 10:42 am, quans...@quanstro.net (erik quanstrom) wrote:
> > > [...] Many architectures get register windows wrong, but the
> > > Itanium has a variable-length register fill/spill engine that
> > > gets invoked automatically. Of course, you can program the
> > > engine too.
> >
> > what's the advantage of this over the stanford style?
>
> I'm not sure what exactly you mean by that.
http://en.wikipedia.org/wiki/Register_window

> ARM is fine, but itanium predicated instructions allow you to have a
> great number of predicate registers. This isn't like cmov and friends
> either.

does this buy anything in practice? references?

> > unless it's an 8- or 16-bit part, i don't see why anyone cares
> > if the assembly is simpler. but since this is an epic part,
> > the assembly is never simple.
>
> I don't know why bit size matters. Anyway, making the assembly simpler
> has a lot of benefits. A human has to write the stuff at some point.
> When there are bugs, a human has to read it. It also simplifies code
> generation by the compiler.

bit size matters because little 8- and 16-bit parts are so constrained
that one's best option is generally writing assembler. (hint: on an
8-bit computer, addressable memory is 256 bytes.) for a 64-bit cpu,
writing a substantial amount of code in assembler is a waste of time.
there are less than 2k lines of 386 assembler in the kernel and libc
(1919 on my system).

> It also simplifies code generation by the compiler.

building parallel instruction bundles is a hard enough problem that it
delayed the introduction of the itanium by several years, and it's the
reason amd had a window to sneak amd64 through.

> > how do you get around the fact that the parallelism
> > is limited by the instruction set and the fact that one
> > slow sub-instruction could stall the whole instruction?
>
> Parallelism isn't any more limited by the instruction set on Itanium
> than it is anywhere else. The processor has multiple issue units that
> can crunch multiple instructions in parallel. Some units can execute
> multiple instructions per cycle.

okay, then. please explain why it helps to have an explicitly parallel
instruction set with an architecturally defined number of parallel
slots. adding cores makes life much more flexible and easier to
understand. (and i don't need to recompile or write a new compiler.)

> There is a massive difference.
> As the other poster pointed out, closures are cool in and of
> themselves.

what do they get me? dlls don't count. plan 9 doesn't have dynamic
linking.

> On x86 processors, you get 4 stacks. One for each privilege level.
> You can change a stack anytime you want, but it requires either an
> instruction to do so, or instruction patching by the loader.
> Everything gets stuck there and there are very few restrictions about
> what you do with stuff on the stack.

i don't think anybody cares about the intricate details of who sets up
the stack or how it is managed. from the user's perspective, one can
have as many stacks as one wishes per user application with the
thread(2) library. whether or not that counts as hardware support for
any number of stacks is not an interesting question.

> Quite a bit. Having the processor scan the incoming instruction
> stream to locate potential parallelizations is ludicrous. It works
> fine when the processor guesses correctly, but it is horrendously
> expensive when the processor guesses wrong. Requiring that the
> processor scan incoming instructions to suss out potential
> parallelizations also means that much less die space for doing real
> work.

i don't think explicitly parallel vs. implicitly parallel is a question
that can be answered without a reference in the real world. do you have
any references telling me why i can never get epic-like performance out
of a non-epic cpu, transistor for transistor? one could consider epic a
layering violation. why do i have to care how many execution units the
architecture defines? by the way, epic still does speculative
execution, etc. so what does epic get me?

http://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing

i still fail to see how one could call instruction bundles "simple" at
the assembly level.

> IA64 got a bad rap because the first hardware implementations of IA64
> were less than stellar, and the compilers were harder to write than
> expected.
> The Itanium-2 and modern compilers are actually quite nice.

almost any problem can be worked out in 10 million lines of code and
2 billion transistors. i'd really be surprised if itanium could compete
with a regular x86 system for most tasks: memory bandwidth is so
important, and the fastest fsb available for an itanium is 667mhz.
that's many x86 generations ago.

- erik