I've run into an annoying problem. I'm not sure if there's a solution I've overlooked and I'm hoping you folks have some ideas for me.
I have a library that provides a function Sum64(b []byte) uint64. I have both asm and pure-Go implementations of this function, and I want both to be as fast as possible.

It works like this today:

x.go:

    func Sum64(b []byte) uint64 { return sum64(b) }

    func sum64Go(b []byte) uint64 { /* go implementation here */ }

x_noasm.go:

    func sum64(b []byte) uint64 { return sum64Go(b) }

x_amd64.go:

    func sum64(b []byte) uint64 // asm implementation in x_amd64.s

This allows me to have both Go and x64 implementations of my function, and furthermore, in x_amd64_test.go, I can compare both implementations against each other by calling sum64 (asm) and sum64Go (Go).

Problem 1 is that every call to Sum64 incurs double function-call overhead because of the indirection to sum64. This overhead is significant for smallish inputs. I can work around it by getting rid of sum64 and declaring Sum64 twice (once in x_noasm.go and once in x_amd64.go). This is annoying because I need to maintain duplicate documentation comments.

Problem 2 is that the pure-Go version of Sum64 incurs triple function-call overhead: Sum64 -> sum64 -> sum64Go. The only workarounds I can come up with are to either forgo my tests which compare the Go and asm implementations, or else maintain two independent copies of sum64Go (one for noasm and one for amd64). Neither of these seems acceptable to me.

Aram gave me the idea of using //go:linkname as a hacky workaround; this doesn't work within a single package, but I suppose I could introduce an internal package for one of the implementations. That seems fairly awful.

One tool change that would help with problem 1 but not problem 2 is if a Go stub could have its body implemented in Go also. That is, x.go could declare func Sum64(b []byte) uint64 with no body, and then the bodies would be provided in x_noasm.go (in Go) and x_amd64.s (in asm).

Another idea that would solve both problems is if the compiler could inline the "forwarding" functions to avoid the extra call.
Isn't that much simpler than the general problem of inlining non-leaf functions? I suppose there's still the stack trace issue. Searching for that, I did find https://github.com/golang/go/issues/8421.

Any chance this could happen in the near-ish term? Or any other ideas?

-Caleb
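
P.S. For concreteness, here is a rough single-file sketch of the kind of cross-check that x_amd64_test.go does between sum64 and sum64Go. It is not my real code: FNV-1a is only a placeholder hash, the standard library's hash/fnv stands in for the asm implementation, and firstMismatch is a made-up helper name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// sum64Go stands in for the pure-Go implementation.
// FNV-1a is a placeholder here, not the library's actual hash.
func sum64Go(b []byte) uint64 {
	const (
		offset64 = 14695981039346656037
		prime64  = 1099511628211
	)
	h := uint64(offset64)
	for _, c := range b {
		h ^= uint64(c)
		h *= prime64
	}
	return h
}

// sum64 stands in for the asm implementation. In the real package this
// would be a body-less declaration backed by x_amd64.s; here it uses
// hash/fnv so the sketch runs on any platform.
func sum64(b []byte) uint64 {
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

// firstMismatch (hypothetical helper) feeds every prefix length from 0 to
// maxLen to both implementations and returns the first length at which
// they disagree, or -1 if they agree everywhere.
func firstMismatch(maxLen int) int {
	buf := make([]byte, maxLen)
	for i := range buf {
		buf[i] = byte(i*31 + 7) // arbitrary deterministic filler
	}
	for n := 0; n <= maxLen; n++ {
		if sum64(buf[:n]) != sum64Go(buf[:n]) {
			return n
		}
	}
	return -1
}

func main() {
	fmt.Println(firstMismatch(256))
}
```

The point is that a test like this needs both sum64 and sum64Go to be callable from the same package in the amd64 build, which is why I can't simply drop sum64Go from that build to get rid of the forwarding.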