Re: [perf-discuss] Project proposal: "Solaris Enhancements for AMD-based Platforms"

Roland Mainz Fri, 02 Nov 2007 15:24:42 -0800

"Ostrovsky, Boris" wrote:
> > On Nov 1, 2007 5:33 PM, Roland Mainz <[EMAIL PROTECTED]> wrote:
> > > I was referring to something I read recently in a German computer
> > > magazine that the upcoming AMD CPUs have some kind of special 128-bit FP
> > > instruction stuff (disclaimer: I have no clue what exactly the stuff
> > > was)
> >
> > It may be referring to the SSE5 instruction set...
> 
> This is a brief description of the SSE5 instruction set:
> 
> http://developer.amd.com/sse5.jsp


Thanks! :-)

> > BTW, Sun engineers continue to add new features in the compiler for AMD:
> > http://blogs.sun.com/tatkar/entry/sun_studio_patch_supports_barcelona
> >
> > > ...
> > > ... if this stuff includes some kind of special instructions it may be
> > > nice to reflect this via matching flags in the output of
> > > $ /usr/bin/isalist #
> 
> That's a good idea. isalist doesn't report any of SSE* now. getisax()
> does.

Umpf... the problem is that something like |getisax()| isn't used by
"isaexec". /usr/lib/isaexec is a tool which acts as a "switch": it
"redirects" requests like /usr/bin/ksh93 (which is a hardlink to
/usr/lib/isaexec) to either /usr/bin/amd64/ksh93 or /usr/bin/i86/ksh93,
depending on whether the matching ISA is supported or not. That way we
have a 64-bit Korn shell on 64-bit platforms, and in theory it could even
select accelerated variants, e.g. a normal 64-bit AMD64 binary and an
SSE5 binary.

BTW: far-fetched dreaming...
... do you accept ideas for more instruction extensions? If "yes", one
idea would be an "asynchronous block copy extension" which works like
this:
You issue an instruction which works like |memmove()| (i.e. it copies
memory, even overlapping regions), but execution continues while the copy
runs, until a second wait_for_block_copy instruction is reached. That way
the block copy could be fully asynchronous to the normal pipeline
operation and would not block until the "wait_for_block_copy" instruction
is reached. This needs to work with multiple outstanding block copies
(where "multiple" means an "unlimited" nesting depth), e.g. something
like...
-- snip --
instr_mem_move_start a b 25
  some_other_instructions
  instr_mem_move_start c d 15
    instr_mem_move_start e f 15
    some_other_instructions
    some_other_instructions
    call_a_subroutine
    some_other_instructions
    some_other_instructions
    wait_for_block_copy # <--- waits for "e f 15"-copy
  wait_for_block_copy # <--- waits for "c d 15"-copy
wait_for_block_copy # <--- waits for "a b 25"-copy
-- snip --
(i.e. you end up with some kind of memory copy pipeline which works in
parallel to the normal integer/FP pipelines)
Additionally there should be a second instruction which works like
|strncpy()| in the same async manner as described above, i.e. it copies
until a `\0` terminator is reached, where the element type may be |char|,
|int16_t|, |int32_t| or |int64_t| (you need all these datatypes to handle
copies of |wchar_t| strings and other stuff).
Another flavor may be a variant of the instructions above which flushes
the matching source memory from the L1/L2 caches and then runs the copy;
in that case you won't trash the L1/L2 caches with large block copy
operations.

AFAIK the extension above may be useful for applications which do lots
of string operations and copies...

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
