Hi team,

I'm a recent Gopher, and have had great success over the past year developing an insurance modelling application. The tooling is great, so thanks to the team for creating it.
1) SIMD Workflow

I've got hot functions in my application which do element-wise operations on float slices. Some are just element-wise addition and multiplication, some are slightly more complicated. I'm currently deploying on AWS Lambda x86, which has AVX2 support (Xeon Haswell+), but I'm also experimenting with Arm64 (Graviton 2), and would also like to do some benchmarking on Graviton 3 (only available on EC2 at the moment).

I've been experimenting with implementing the hot functions in Go's ASM dialect, using some simple code generation to handle all the repetition. Nothing fancy, not much more than string templating. The results have been pretty good, but the workflow is pretty slow.

As a side project I've been toying with the idea of writing a slightly more advanced tool that could read a "SIMD kernel" written as a simple Go function with a specific form, and generate ASM implementations for it. No fancy optimisations, just loop unrolling and vector instructions. For example:

    import . "asmgen"

    // Implementation in a generated .s file
    func Foo(dst []float32, a float32, x, y []float32)

    // AST used as input to ASM codegen
    func kernelFoo(i int, dst []float32, a float32, x, y []float32) {
        dst[i] = min(a * x[i], y[i])
    }

In reality, I probably don't have the time to do that, but it does feel like something minimal that would cover most of my immediate use cases is not a huge amount of work.

I guess this is basically just a limited form of c2goasm. See: https://github.com/minio/c2goasm

So maybe I should just use that. However, including big blobs of hex-encoded ASM doesn't seem great either. See: https://github.com/apache/arrow/blob/master/go/parquet/internal/utils/min_max_neon_arm64.s

So apologies that this question is a bit vague and rambly. But the workflow for SIMD here is pretty slow, and it feels like there could be a better way to solve this.
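To make the kernel idea above a bit more concrete, here is a rough sketch of the behaviour I'd expect a generated Foo to match, written as plain Go. This is only an illustration under my own assumptions: fooGeneric is a hypothetical name for a pure Go fallback, min is a local stand-in for whatever the (hypothetical) asmgen package would provide, and the four-at-a-time main loop just mimics one 128-bit vector of float32 lanes with a scalar tail.

```go
package main

import "fmt"

// min is a local stand-in for the helper the hypothetical asmgen
// package would provide; it is an assumption, not a real API.
func min(a, b float32) float32 {
	if a < b {
		return a
	}
	return b
}

// kernelFoo is the per-element kernel from the post.
func kernelFoo(i int, dst []float32, a float32, x, y []float32) {
	dst[i] = min(a*x[i], y[i])
}

// fooGeneric is a hypothetical pure Go reference for the generated Foo.
// It processes the shortest common length of dst, x and y.
func fooGeneric(dst []float32, a float32, x, y []float32) {
	n := len(dst)
	if len(x) < n {
		n = len(x)
	}
	if len(y) < n {
		n = len(y)
	}
	i := 0
	// Main loop: four elements per iteration, mirroring what a single
	// 128-bit vector instruction would cover for float32.
	for ; i+4 <= n; i += 4 {
		kernelFoo(i, dst, a, x, y)
		kernelFoo(i+1, dst, a, x, y)
		kernelFoo(i+2, dst, a, x, y)
		kernelFoo(i+3, dst, a, x, y)
	}
	// Scalar tail for the remaining 0 to 3 elements.
	for ; i < n; i++ {
		kernelFoo(i, dst, a, x, y)
	}
}

func main() {
	x := []float32{1, 2, 3, 4, 5}
	y := []float32{10, 3, 10, 7, 20}
	dst := make([]float32, len(x))
	fooGeneric(dst, 2, x, y)
	fmt.Println(dst) // [2 3 6 7 10]
}
```

A reference like this would also be handy as the test oracle for whatever ASM the tool generates.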
So I'm basically just reaching out to see if anyone else has been working on this, thinking about it, or has ideas about better solutions.

2) Arm64 ASM Neon Instructions

One problem that's come up is that there are a bunch of ARM instructions which aren't defined in Go's assembler. So it looks like I'm going to have to write some code to generate the hex for these. I can probably copy the approach used here: https://github.com/minio/asm2plan9s/blob/master/asm2plan9s_arm64.go

For example, I'm currently writing:

    WORD $0x4E24D400 // fadd v0.4s, v0.4s, v4.4s

But would like to write:

    VFADD V0.S4, V0.S4, V4.S4

I see there's an existing issue to add a bunch of Neon floating point instructions: https://github.com/golang/go/issues/41092

I actually spent a while having a go at adding the instructions myself, but couldn't figure it out. I also see that there is a proposal and an MR to refactor the Arm64 assembler: https://github.com/golang/go/issues/44734

Is there any ongoing work there, or has that effort stalled?

Anyways, thanks for reading my big wall of text.

Cheers,
Greg.
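P.S. For reference, here is a minimal sketch of how the WORD value in the fadd example above can be derived. It assumes the A64 layout for FADD (vector) with the 4S arrangement (Q=1, sz=0), where Rm sits in bits 20-16, Rn in bits 9-5 and Rd in bits 4-0. encodeFADD4S is a hypothetical helper name of my own, not something from asm2plan9s or the Go toolchain.

```go
package main

import "fmt"

// encodeFADD4S returns the 32-bit A64 encoding of
// "fadd Vd.4s, Vn.4s, Vm.4s". Hypothetical helper for illustration.
func encodeFADD4S(rd, rn, rm uint32) uint32 {
	// FADD (vector) with Q=1, sz=0 and all register fields zeroed.
	const base = 0x4E20D400
	return base | (rm&31)<<16 | (rn&31)<<5 | (rd & 31)
}

func main() {
	// Reproduces the hand-written line from the post.
	fmt.Printf("WORD $0x%08X // fadd v0.4s, v0.4s, v4.4s\n", encodeFADD4S(0, 0, 4))
}
```

A small generator along these lines could emit the WORD line together with the mnemonic as a comment, so the .s files stay readable until the assembler supports the instructions directly.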