> Something like that?
Short answer: yes. However, from looking at it (I couldn't find any documentation), that code seems to be specific to speeding up graphics overlays, maybe? (accumulate). What confuses me is that it uses templates when there seems to be only one template.

I was thinking of one very simple template per function, per hardware feature, so one each for:

- Sqrt(X+k), [4]float32, on SSE4
- Sqrt(X+k), [4]float32, on NEON
- Sqrt(X+k), [4]float64, on AVX2
- Sqrt(X+k), [8]float32, on AVX2
- k1/Sqrt(X+k2), on SSE4
- ...

That leads to a big but, I think, maintainable collection. It is maintainable because the functions can be used in linear combinations without adding much overhead, which stops the number of templates rising at a high-order rate; because only functions with an element that has parallel support in the CPU are worth adding at all; and because, since this would be open source, functions people have added for themselves could be contributed back.

That is why I wanted NEON support to begin with: so there would be a general outline onto which contributions could be made. Most of the time a contribution would just be a simple modification or extension of a basic pool.