"Ostrovsky, Boris" wrote:
> > On Nov 1, 2007 5:33 PM, Roland Mainz <[EMAIL PROTECTED]> wrote:
> > > I was referring to something I read recently in a German computer
> > > magazine that the upcoming AMD CPUs have some kind of special 128 FP
> > > instruction stuff (disclaimer: I have no clue what exactly the stuff
> > > was)
> >
> > It may be referring to the SSE5 instruction set...
>
> This is a brief description of SSE5 instruction set.
>
> http://developer.amd.com/sse5.jsp

Thanks! :-)

> BTW, Sun engineers continue to add new features in the compiler for
> AMD:
>
> http://blogs.sun.com/tatkar/entry/sun_studio_patch_supports_barcelona
>
> > > ...
> > > ... if this stuff includes some kind of special instructions it may
> > > be nice to reflect this via matching flags in the
> > > $ /usr/bin/isalist # output.
>
> That's a good idea. isalist doesn't report any of SSE* now. getisax()
> does.

Umpf... the problem is that something like |getisax()| isn't used by
"isaexec" (/usr/lib/isaexec is a tool which acts as a "switch" which
"redirects" requests like /usr/bin/ksh93 (which is a hardlink to
/usr/lib/isaexec) to either /usr/bin/amd64/ksh93 or /usr/bin/i86/ksh93
depending on whether the matching ISA is supported or not (that way we
have a 64bit korn shell on 64bit platforms and in theory could even allow
accelerated versions, like a normal 64bit AMD64 binary and a SSE5
binary)).

BTW: far-fetched dreaming...
... do you accept ideas for more instruction extensions ? If "yes" - one
idea would be an "asynchronous block copy extension" which works like
this:
You issue an instruction which works like |memmove()| (i.e. it copies
memory, even overlapping regions) but execution continues until a second
instruction, "wait_for_block_copy", is reached. That way the block copy
could be fully asynchronous to the normal pipeline operation and would
not block until the "wait_for_block_copy"-instruction is reached.
This needs to work with multiple (where "multiple" means an "unlimited"
nesting depth) block copies issued, e.g. something like...
-- snip --
instr_mem_move_start a b 25
some_other_instructions
instr_mem_move_start c d 15
instr_mem_move_start e f 15
some_other_instructions
some_other_instructions
call_a_subroutine
some_other_instructions
some_other_instructions
wait_for_block_copy # <--- waits for "e f 15"-copy
wait_for_block_copy # <--- waits for "c d 15"-copy
wait_for_block_copy # <--- waits for "a b 25"-copy
-- snip --
(i.e.
you end up with some kind of memory copy pipeline which works in
parallel to the normal integer/FP pipelines)

Additionally there should be a 2nd instruction which works like
|strncpy()| in the same async manner as described above, i.e. it copies
until a `\0` terminator is reached, where the terminator may be either a
|char|, |int16_t|, |int32_t| or |int64_t| (you need all the datatypes to
handle copies for |wchar_t| and other stuff).

Another flavor may be another set of the instructions above which first
flushes the matching source memory from the L1/L2 caches and then runs
the copy - in that case you won't trash the L1/L2 caches with large
block copy operations.

AFAIK the extension above may be useful for applications which do lots
of string operations and copies...

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
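
P.S.: The wait ordering from the "-- snip --" example can be modelled
with a small Python toy. This is only a rough sketch under loud
assumptions: the class and method names (BlockCopyUnit, mem_move_start,
wait_for_block_copy) are made up for illustration, memory is a plain
bytearray, and the copy is simply performed at wait time, which is just
one of many schedules a real asynchronous copy unit could choose. It
assumes the program does not touch the affected regions between start
and wait.

```python
class BlockCopyUnit:
    """Toy model of the proposed copy pipeline (all names hypothetical)."""

    def __init__(self, memory):
        self.memory = memory   # bytearray standing in for RAM
        self.pending = []      # stack of outstanding (dst, src, length)

    def mem_move_start(self, dst, src, length):
        # Queue the copy and return immediately, like the start instruction.
        self.pending.append((dst, src, length))

    def wait_for_block_copy(self):
        # Each wait completes the most recently started copy (LIFO nesting).
        if not self.pending:
            raise RuntimeError("no block copy outstanding")
        dst, src, length = self.pending.pop()
        # The right-hand slice is a temporary copy, so overlapping
        # source/destination ranges behave like memmove().
        self.memory[dst:dst + length] = self.memory[src:src + length]
        return (dst, src, length)


def demo():
    mem = bytearray(b"abcdefgh" + bytes(8))
    unit = BlockCopyUnit(mem)
    unit.mem_move_start(8, 0, 4)   # non-overlapping copy
    unit.mem_move_start(5, 4, 3)   # overlapping copy, memmove-style
    # ... other work would run here while the copies are "in flight" ...
    order = [unit.wait_for_block_copy(), unit.wait_for_block_copy()]
    return mem, order
```

Calling demo() returns the final memory plus the order in which the
waits retired the copies: the last copy started is the first one waited
for, matching the nesting in the example.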