> Something like that?

Short answer: yes.

However, from looking at it (I couldn't find documentation), that code 
seems to be specific to speeding up graphics overlays? Maybe? (accumulate)

But it's confusing me that it's using templates when there seems to be 
only one template.

I was thinking of one very simple template per function, per hardware 
feature.

So one each for:
Sqrt(X+k) [4]float32 on SSE4
Sqrt(X+k) [4]float32 on NEON
Sqrt(X+k) [4]float64 on AVX2
Sqrt(X+k) [8]float32 on AVX2
k1/Sqrt(X+k2) on SSE4
...

That leads to a big, but I think maintainable, collection.

It is maintainable because: used in linear combinations, which doesn't add 
that much overhead, the functions stop the number of templates rising at a 
high-order rate; only functions with an element that has parallel support 
in the CPU have any point in being added; and since this would be open 
source, functions someone has added themselves could be contributed back.

Which is why I wanted NEON support to begin with, so there would be a 
general outline onto which contributions could be made. Most of the time it 
would just be a simple modification/extension of a basic pool.



 

