On Fri Jan 21, 2022 at 02:28 CET, Greg Lowe wrote:
> Hi team,
>
> I'm a recent Gopher, and have had great success over the past year
> developing an insurance modelling application. The tooling is great,
> thanks to the team for creating it.
>
> 1) SIMD Workflow
>
> I've got hot functions in my application which do element-wise
> operations on float slices. Some are just element-wise addition and
> multiplication, some are slightly more complicated.
>
> I'm currently deploying on AWS Lambda x86, which has AVX2 support (Xeon
> Haswell+), but I'm also experimenting with Arm64 (Graviton 2), and would
> also like to do some benchmarking on Graviton 3 (only available on EC2
> ATM).
>
> I've been experimenting with implementing the hot functions in Go's ASM
> dialect, and using some simple code generation, to handle all the
> repetition. Nothing fancy, not much more than string templating. The
> results have been pretty good, but the workflow is pretty slow.
>
> As a side project I've been toying with the idea of writing a slightly
> more advanced tool, that could read a "SIMD kernel" written as a simple
> Go function with a specific form, and generate ASM implementations for
> it. No fancy optimisations, just loop unrolling and vector instructions.
>
> For example:
>
> import . "asmgen"
>
> // Implementation in a generated .s file
> func Foo(dst []float32, a float32, x, y []float32)
>
> // AST used as input to ASM codegen
> func kernelFoo(i int, dst []float32, a float32, x, y []float32) {
>     dst[i] = min(a * x[i], y[i])
> }
>
> In reality, I probably don't have the time to do that, but it does feel
> like something minimal that would actually cover most of my immediate
> use cases is not a huge amount of work.
>
> I guess this is basically just a limited form of c2goasm. See:
> https://github.com/minio/c2goasm
>
> So maybe I should just use that, however including big blobs of
> hex-encoded ASM doesn't seem great either. See:
> https://github.com/apache/arrow/blob/master/go/parquet/internal/utils/min_max_neon_arm64.s
>
> So apologies that this question is a bit vague and rambly. But the
> workflow for SIMD here is pretty slow, and it feels like there could be
> a better way to solve this. So I'm basically just reaching out to see if
> anyone else has been working on this, or thinking about it, or has ideas
> about better solutions.
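
For reference, the semantics of the quoted kernel can be pinned down with a
plain-Go scalar fallback, which a generated assembly version could then be
tested against. This is only a sketch: fooGeneric is a hypothetical name, min
is written out by hand, and clamping to the shortest slice is an assumption
about how mismatched lengths should be handled.

```go
package main

import "fmt"

// kernelFoo is the per-element kernel from the quoted message, with min
// written out explicitly so it does not depend on any helper package.
func kernelFoo(i int, dst []float32, a float32, x, y []float32) {
	v := a * x[i]
	if y[i] < v {
		v = y[i]
	}
	dst[i] = v
}

// fooGeneric is a hypothetical pure-Go fallback for Foo. It applies the
// kernel to every index that is valid in dst, x and y (an assumption about
// how mismatched lengths should be treated).
func fooGeneric(dst []float32, a float32, x, y []float32) {
	n := len(dst)
	if len(x) < n {
		n = len(x)
	}
	if len(y) < n {
		n = len(y)
	}
	for i := 0; i < n; i++ {
		kernelFoo(i, dst, a, x, y)
	}
}

func main() {
	x := []float32{1, 2, 3, 4, 5}
	y := []float32{10, 3, 10, 3, 10}
	dst := make([]float32, len(x))
	fooGeneric(dst, 2, x, y)
	fmt.Println(dst) // prints [2 3 6 3 10]
}
```

A fallback like this also gives non-SIMD targets something to build against
while the assembly for a new architecture is still missing.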

FYI, for X86, there's also:

- https://github.com/mmcloughlin/avo

which takes a slightly lower-level approach (it requires people to use
a vocabulary closer to x86 assembly).

avo could become the target of your SIMD tool once the kernel has been
parsed into an AST.

(and once avo has gained an ARM backend.)

-s
