On Sunday, May 15, 2005, at 04:11 PM, Luke Kenneth Casson Leighton wrote:
 *click* - so you .... you... ooooooo :)

 holy cow.

you looked at valarray,

No, not really, I'm not a library guy. I know almost nothing of the space, the applications or the tricks people play, but...


and went "how could this be automatically speeded up by gcc, if gcc had access to a hardware vector processing unit"?

i'm... genuinely impressed.

I'm sorry, wasn't meant to be impressive. What would have been impressive is if I read up on ASP and coded up some complex algorithm using all the latest tips and tricks of templates, and had you try it and you discovered that indeed it was trivial enough to write, exactly matched what, as an author, you would have expected, best case... I think that is possible, but alas, I'd just leave it as an exercise for the reader.


can you _imagine_ the number of different tags you'd need to say
"i want this register to be 1-bit wide, spread across 16 processors each,
i want _this_ register array to be 4-bits wide, spread across 32 processors.."

bitregister<1,16> i;

bitregister<4, 32> j;

I can imagine...  Seems trivial to me...
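
To make the two declarations above concrete, here is a minimal sketch of what such a template might look like. Everything in it is an assumption for illustration: the name bitregister comes from the lines above, but the lane storage, the wrap-around arithmetic and the member names are hypothetical, and the lanes are emulated in plain C++ rather than mapped onto ASP or any other hardware.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: a register Bits wide, replicated across Lanes
// processing elements. Lanes are emulated in software here; a real port
// would map these operations onto the vector hardware.
template <unsigned Bits, std::size_t Lanes>
class bitregister {
    static_assert(Bits >= 1 && Bits <= 31, "this sketch supports 1..31 bits");

public:
    bitregister() : lanes_{} {}  // every lane starts at zero

    // Mask that keeps only the low Bits bits of a value.
    static constexpr std::uint32_t mask() { return (1u << Bits) - 1u; }

    // Number of processing elements the register is spread across.
    static constexpr std::size_t size() { return Lanes; }

    // Add v to every lane, wrapping modulo 2^Bits.
    bitregister& operator+=(std::uint32_t v) {
        for (auto& lane : lanes_) lane = (lane + v) & mask();
        return *this;
    }

    // Read the value held by lane i.
    std::uint32_t operator[](std::size_t i) const { return lanes_[i]; }

private:
    std::array<std::uint32_t, Lanes> lanes_;
};
```

With this sketch, `bitregister<1,16> i;` and `bitregister<4, 32> j;` compile as written, and the width and processor count are ordinary template parameters rather than special tags.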

... it just goes _nuts_.

I don't see the use of the above nuts. Coding up the library to support it would be, well, fun.... but between you (someone who knows ASP) and someone who knows how to make C++ do tricks (expression templates and template metaprograms at least), it should be trivial enough.


well, the approach taken by aspex _makes_ it portable, already
[because it's a macro pre-processing step, turning inline-asp
instructions into c-code].

Vendor lock-in by a vendor that can go out of business isn't what we call portable. Portable means that someone versed in it can use it, and that code can run on sse3, mmx, altivec, ASP, normal hardware or a Cell processor, BlueGene, virtually unmodified. For example, OpenMP would seem to be portable (not being an expert in that field, I'd let people correct me). BLAS, boost and Blitz++ are yet other ways... http://ggt.sourceforge.net/html/main.html is a new one I've not heard of... but Google has.


Do you know what Blitz++ is and does?  And how?

valarray STL is an ISO/ANSI _standard_.

Yes.

 you declare a valarray<int> x(20) or something.

 you then do x += 5 and all 20 integers in the array x get 5 added to
 them.

Yes.
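
For reference, the valarray behaviour described above can be sketched with standard C++ alone. The function name add_five_to_all is made up for this example; std::valarray and its operator+= are part of the standard library.

```cpp
#include <cassert>
#include <valarray>

// Build a 20-element valarray and add 5 to every element in one statement.
// add_five_to_all is a hypothetical helper name used only for illustration.
std::valarray<int> add_five_to_all() {
    std::valarray<int> x(20);  // 20 value-initialized ints, all zero
    assert(x.size() == 20);
    x += 5;                    // element-wise: no explicit loop in user code
    return x;
}
```

The loop lives inside the library's operator+=, which is the place a per-architecture implementation could substitute vector instructions without the calling code changing.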

 i believe it to be quite straightforward to modify valarray on a
 per-vector-based-architecture basis to provide support for whatever
 accelerated instructions are available.

Yes.

 in fact - right now - you could probably do it _now_ for MMX, altivec,
 Sony Playstation and MasPar hardware: all of these have hardware-based
 assembly instruction opcodes, yes?

Just a matter of code, yes.

[just not the ASP, because of their proprietary assembler-based toolchain]

No, even ASP; one just needs to understand the output of their compiler and then code it up, though, admittedly, one might not get the speed if the interface (valarray) is wrong. The deferred evaluation math libraries would be closer to what might be required. I don't know if that is enough, but it might be; even if it weren't, a few more concepts and certainly it would be.


 instead of doing
 for (i = 0; i < this->get_size(); i++)
        this->data[i] += op1->data[i]

 you'd do

 for (i = 0; i < this->get_size(); i+= vector_unit->get_size())
 {
        asm { .... }
 }

No, expression templates don't require rewriting of code like this. That's the entire point.
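
A minimal sketch of the expression template idea, for readers who have not met it. This is a generic illustration and not how Blitz++ or any particular library is implemented; the names Expr, Sum and Vec are hypothetical. The point it tries to show is that `a = b + c + d` builds a lightweight description of the expression, and the only loop is the one inside the assignment.

```cpp
#include <cstddef>
#include <vector>

// CRTP base marking types that can appear in an expression.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node describing "l + r"; nothing is computed until an element is read.
template <typename L, typename R>
class Sum : public Expr<Sum<L, R>> {
public:
    Sum(const L& l, const R& r) : l_(l), r_(r) {}
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }
    std::size_t size() const { return l_.size(); }

private:
    const L& l_;
    const R& r_;
};

class Vec : public Expr<Vec> {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // The single place where the whole expression is evaluated. A port to a
    // vector unit would change this loop, not the code that uses Vec.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// operator+ only builds a Sum node; no temporaries of type Vec are created.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

User code stays `a = b + c + d;` whatever the backend, which is why the hand-written chunked loop quoted above is not needed at the call site.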


 i imagine this to be a _whole_ lot less grief than putting support
 in gcc for vectors / autodetection / tagging.

I actually mean to include C++ library work, as a first solution, as doing up a library is usually preferable to compiler work.


 ... don't get me wrong - i'd be _delighted_ to see vector
 autodetection and tagging in gcc!

Presto, download a copy today. :-)


