On Wed Jan 14 05:12:07 EST 2009, nadiasver...@gmail.com wrote:
> On Jan 12, 10:42 am, quans...@quanstro.net (erik quanstrom) wrote:
> > > [...] Many architectures get register
> > > windows wrong, but the Itanium has a variable-length register fill/
> > > spill engine that gets invoked automatically.  Of course, you can
> > > program the engine too.
> >
> > what's the advantage of this over the stanford style?
> 
> I'm not sure what exactly you mean by that.

http://en.wikipedia.org/wiki/Register_window

> ARM is fine, but itanium predicated instructions allow you to have a
> great number of predicate registers.  This isn't like cmov and friends
> either.
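
for context, the difference being claimed is that a predicate register
can guard any instruction, including a store, while cmov can only pick
between two values already sitting in registers.  a rough c sketch
(not real compiler output) of the sort of branch full predication can
if-convert:

    /* the compare sets a predicate register and the store is issued
     * guarded by it, so the branch goes away entirely.  cmov alone
     * can't do this, because the store must not happen at all when
     * cond is false. */
    void
    setflag(int *p, int cond)
    {
            if(cond)
                    *p = 1;
    }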

does this buy anything in practice?  references?

> > unless it's an 8- or 16-bit part, i don't see why anyone cares
> > if the assembly is simpler.  but since this is an epic part,
> > the assembly is never simple.
> 
> 
> I don't know why bit size matters. Anyway, making the assembly simpler
> has a lot of benefits.  A human has to write the stuff at some point.
> When there are bugs, a human has to read it.  It also simplifies code
> generation by the compiler.

bit size matters because little 8- and 16-bit parts are so
constrained that one's best option is generally writing
assembler.  (hint: with 8-bit addressing you can only reach
256 bytes of memory.)  for a 64-bit cpu, writing a substantial
amount of code in assembler is a waste of time.  there are
fewer than 2k lines of 386 assembler in the kernel and libc
(1919 on my system).

>   It also simplifies code generation by the compiler.

having the compiler schedule explicitly parallel instructions is a
hard enough problem that it delayed the introduction of the itanium
by several years, and it's the reason amd had a window to sneak
amd64 through.

> > how do you get around the fact that the parallelism
> > is limited by the instruction set and the fact that one
> > slow sub-instruction could stall the whole instruction?
> 
> Parallelism isn't any more limited by the instruction set on Itanium
> than it is anywhere else.  The processor has multiple issue units that
> can crunch multiple instructions in parallel.  Some units can execute
> multiple instructions per cycle.

okay, then.  please explain why it helps to have an explicitly
parallel instruction set with an architecturally defined number of
parallel slots?  adding cores makes life much more flexible and easier
to understand.  (and i don't need to recompile or write a new compiler.)

> There is a massive difference.  As the other poster pointed out,
> closures are cool in and of themselves.

what do they get me?  dlls don't count.  plan 9 doesn't have dynamic
linking.

> On x86 processors, you get 4 stacks.  One for each privilege level.
> You can change a stack anytime you want, but it requires either an
> instruction to do so, or instruction patching by the loader.
> Everything gets stuck there and there are very few restrictions about
> what you do with stuff on the stack.

i don't think anybody cares about the intricate details
of who sets up the stack or how it is managed.  from the user's
perspective, one can have as many stacks as one wishes per user
application with the thread(2) library.

whether or not that counts as hardware support for any number of
stacks is not an interesting question.
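
as a concrete sketch (untested, but the interface is straight out of
thread(2)): every thread you create gets its own stack, and you pick
the size yourself.

    #include <u.h>
    #include <libc.h>
    #include <thread.h>

    static void
    hello(void *arg)
    {
            print("hello from thread %d\n", (int)(uintptr)arg);
    }

    void
    threadmain(int argc, char *argv[])
    {
            int i;

            USED(argc);
            USED(argv);
            /* each threadcreate allocates a private stack for the new
             * thread; the last argument is its size in bytes.  no
             * hardware involvement anywhere. */
            for(i = 0; i < 4; i++)
                    threadcreate(hello, (void*)(uintptr)i, 8192);
            threadexits(nil);
    }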

> Quite a bit.  Having the processor scan the incoming instruction
> stream to locate potential parallelizations is ludicrous.  It works fine
> when the processor guesses correctly, but it is horrendously expensive
> when the processor guesses wrong.  Requiring that the processor scan
> incoming instructions to suss out potential parallelizations also
> means that much less die space for doing real work.  

i don't think explicitly parallel vs. implicitly parallel is a
question that can be answered without real-world references.
do you have any references telling me why i can never get epic-like
performance out of a non-epic cpu, transistor for transistor?

one could consider epic a layering violation.  why do i have
to care how many execution units the architecture defines?

by the way, epic still does speculative execution, etc.
so what does epic get me?

http://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing

i still fail to see how one could call instruction bundles
"simple" at the assembly level.

> IA64 got a bad rap because the first hardware implementations of IA64
> were less than stellar, and the compilers were harder to write than
> expected.  The Itanium-2 and modern compilers are actually quite
> nice.

almost any problem can be worked out in 10 million lines of code
and 2 billion transistors.

i'd really be surprised if itanium could compete with a regular
x86 system for most tasks, since memory bandwidth is so important
and the fastest fsb available for an itanium is 667mhz.  that's
many x86 generations ago.

- erik
