On Fri Jan 21, 2022 at 02:28 CET, Greg Lowe wrote:
> Hi team,
>
> I'm a recent Gopher, and have had great success over the past year
> developing an insurance modelling application. The tooling is great,
> thanks to the team for creating it.
>
> 1) SIMD Workflow
>
> I've got hot functions in my application which are doing element-wise
> operations on float slices. Some are just element-wise addition and
> multiplication, some are slightly more complicated.
>
> I'm currently deploying on AWS Lambda x86, which has AVX2 support (Xeon
> Haswell+), but I'm also experimenting with Arm64 (Graviton 2), and would
> also like to do some benchmarking on Graviton 3 (only available on EC2
> ATM).
>
> I've been experimenting with implementing the hot functions in Go's ASM
> dialect, and using some simple code generation to handle all the
> repetition. Nothing fancy, not much more than string templating. The
> results have been pretty good, but the workflow is pretty slow.
>
> As a side project I've been toying with the idea of writing a slightly
> more advanced tool that could read a "SIMD kernel" written as a simple Go
> function with a specific form, and generate ASM implementations for it.
> No fancy optimisations, just loop unrolling and vector instructions.
>
> For example:
>
>     import . "asmgen"
>
>     // Implementation in a generated .s file
>     func Foo(dst []float32, a float32, x, y []float32)
>
>     // AST used as input to ASM codegen
>     func kernelFoo(i int, dst []float32, a float32, x, y []float32) {
>         dst[i] = min(a * x[i], y[i])
>     }
>
> In reality, I probably don't have the time to do that, but it does feel
> like something minimal that would actually cover most of my immediate
> use cases is not a huge amount of work.
>
> I guess this is basically just a limited form of c2goasm. See:
> https://github.com/minio/c2goasm
>
> So maybe I should just use that, however including big blobs of
> hex-encoded ASM doesn't seem great either. See:
> https://github.com/apache/arrow/blob/master/go/parquet/internal/utils/min_max_neon_arm64.s
>
> So apologies that this question is a bit vague and rambly. But the
> workflow for SIMD here is pretty slow, and it feels like there could be
> a better way to solve this. So I'm basically just reaching out to see if
> anyone else has been working on this, or thinking about it, or has ideas
> about better solutions.
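For reference, here is a minimal pure-Go sketch of the driver loop that such a tool would presumably wrap around the kernel, and that could double as a portable fallback and as a test oracle for the generated .s files. The names fooGeneric and minF32 are hypothetical and not part of any existing package; minF32 stands in for the min used in the quoted kernel, and its NaN behaviour is an assumption (it simply returns b when the comparison is false). The 4x unrolling only illustrates the shape of the loop; it does not emit vector instructions.

```go
package main

import "fmt"

// minF32 is a hypothetical stand-in for the min() used in the quoted kernel.
// It returns b whenever a < b is false, which includes the NaN cases.
func minF32(a, b float32) float32 {
	if a < b {
		return a
	}
	return b
}

// kernelFoo is the per-element kernel from the quoted message.
func kernelFoo(i int, dst []float32, a float32, x, y []float32) {
	dst[i] = minF32(a*x[i], y[i])
}

// fooGeneric is a hypothetical portable implementation of Foo: it applies
// the kernel to every index of dst, unrolled four elements at a time, with
// a scalar loop for the remaining tail.
func fooGeneric(dst []float32, a float32, x, y []float32) {
	n := len(dst)
	// Reslicing panics up front if x or y is shorter than dst.
	x, y = x[:n], y[:n]

	i := 0
	for ; i+4 <= n; i += 4 {
		kernelFoo(i, dst, a, x, y)
		kernelFoo(i+1, dst, a, x, y)
		kernelFoo(i+2, dst, a, x, y)
		kernelFoo(i+3, dst, a, x, y)
	}
	for ; i < n; i++ {
		kernelFoo(i, dst, a, x, y)
	}
}

func main() {
	x := []float32{1, 2, 3, 4, 5, 6}
	y := []float32{10, 3, 10, 1, 10, 11}
	dst := make([]float32, len(x))
	fooGeneric(dst, 2, x, y)
	fmt.Println(dst)
}
```

A generated assembly version of Foo could then be checked against fooGeneric in an ordinary table-driven test, including lengths that are not a multiple of the unroll factor.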

FYI, for X86, there's also:

- https://github.com/mmcloughlin/avo

which takes a slightly lower-level approach (by requiring people to use a
closer-to-x86-asm vocabulary).

avo could become the target of your SIMD tool once the kernel has been
parsed into an AST (and once avo has gained an ARM backend).

-s
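
PS: regarding the "parsed into an AST" step, the standard library's go/parser and go/ast packages are probably enough to get started. Below is a rough sketch, under the assumption that kernels are plain Go functions like kernelFoo in the quoted message. The names kernelSrc and kernelOps are hypothetical. The sketch only lists the operations found in the kernel body in source order; mapping each one to instructions (through avo or anything else) is left out entirely.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// kernelSrc holds the kernel from the quoted message as Go source text.
// The parser does not type-check, so min does not need to be declared.
const kernelSrc = `package kernels

func kernelFoo(i int, dst []float32, a float32, x, y []float32) {
	dst[i] = min(a*x[i], y[i])
}
`

// kernelOps is a hypothetical helper: it parses src, finds the function
// called name, and returns a flat list of the calls, binary operators and
// slice index expressions in its body, in source order.
func kernelOps(src, name string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "kernel.go", src, 0)
	if err != nil {
		return nil, err
	}

	var ops []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Name.Name != name || fn.Body == nil {
			continue
		}
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			switch e := n.(type) {
			case *ast.CallExpr:
				if id, ok := e.Fun.(*ast.Ident); ok {
					ops = append(ops, "call "+id.Name)
				}
			case *ast.BinaryExpr:
				ops = append(ops, "binary "+e.Op.String())
			case *ast.IndexExpr:
				if id, ok := e.X.(*ast.Ident); ok {
					ops = append(ops, "index "+id.Name)
				}
			}
			return true
		})
	}
	return ops, nil
}

func main() {
	ops, err := kernelOps(kernelSrc, "kernelFoo")
	if err != nil {
		panic(err)
	}
	for _, op := range ops {
		fmt.Println(op)
	}
}
```

A real tool would also have to validate the kernel's "specific form" (for example, that every index expression uses the loop variable i) before generating any code.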