Thanks Ian, no need to apologize, I know this stuff is hard. I'll adjust my approach and see if I can conjure up enough working C code to try buffering the incoming data, and then call into Go in batches. Even buffering as few as 10 records at a time should significantly speed up the execution if my issue truly is cgo overhead.
Tom On Wednesday, May 27, 2020 at 6:44:18 PM UTC-4, Ian Lance Taylor wrote: > > On Wed, May 27, 2020 at 10:08 AM Tom Larsen <larsen...@gmail.com > <javascript:>> wrote: > > > > I am attempting to build a Golang SDK for the Alteryx analytic > application. Alteryx provides a C API for interacting with the engine, so > I thought I would use cgo to build a bridge between Alteryx and Go. > > > > The basic flow-of-control looks something like this: > > > > The engine pushes a record of data (a C pointer to a blob of bytes) to > my SDK by calling a cgo function (iiPushRecord). So, C is calling Go here. > My cgo function looks like this: > > > > //export iiPushRecord > > func iiPushRecord(handle unsafe.Pointer, record unsafe.Pointer) C.long { > > incomingInterface := pointer.Restore(handle).(IncomingInterface) > > if incomingInterface.PushRecord(record) { > > return C.long(1) > > } > > return C.long(0) > > } > > > > My SDK calls a method on an interface that does something with the data. > For my basic example, I'm just copying the data to some outgoing buffers > (theoretically, a best case scenario). > > The interface object pushes the data back to the engine by calling my > SDK's PushRecord function, which in turn calls a similar C function on the > engine. The PushRecord function in my SDK looks like this: > > > > func PushRecord(connection *ConnectionInterfaceStruct, record > unsafe.Pointer) error { > > result := C.callPushRecord(connection.connection, record) > > if result == C.long(0) { > > return fmt.Errorf(`error calling pII_PushRecord`) > > } > > return nil > > } > > > > > > and the callPushRecord function in C looks like this: > > > > long callPushRecord(struct IncomingConnectionInterface * connection, > void * record) { > > return connection->pII_PushRecord(connection->handle, record); > > } > > > > When I execute my base code 10 million times (simulating 10 million > records) in a unit test, it will execute in 20-30 seconds. This test does > not include the cgo calls. However, when I package the tool and execute it > in Alteryx with 10 million records, it takes about 1 minute 20 seconds to > execute. I benchmarked against an equivalent tool I built using Alteryx's > own Python SDK, which takes 1 minute. My goal is to be faster than Python. > > > > I ran a CPU profile while Alteryx was running. Of the 1.38 minute > runtime, the profile samples covered 42.95 seconds. The profile starts out > like this: > > > > crosscall2 (0%) -> _cgoexp_89e40a732b6d_iiPushRecord (0%) -> runtime > cgoballback (0%) -> runtime cgocallback_gofunc (0.14%) > > > > At this point, the profile branches into 3: > > > > runtime cgocallback, which eventually calls all of my SDK code. This > branch accounts for 17.06 seconds in total > > runtime needm, which accounts for 8.21 seconds in total > > runtime dropm, which accounts for 17.43 seconds in total > > > > If you want a graphical display of the profile, it's here: > https://i.stack.imgur.com/CphbG.png > > > > It looks like the C to Go overhead is responsible for ~60% of the total > execution time? Is this the correct way to interpret the profile? If so, > is it because of something I did wrong, or is this overhead inherent to the > runtime? There isn't noticeable overhead when my Go code calls C, so the > upfront overhead from C to Go really surprised me. Is there anything I can > do here? > > > > I am running Go 1.14.3 on windows/amd64. It's actually a Windows 10 VM > on my Macbook, if that makes any difference. > > > > All of the code is on GitHub: https://github.com/tlarsen7572/goalteryx > > > > Note: I asked this on SO a few days ago, but got no answers, so I > thought I would try here. I hope that's ok. > > > I haven't looked at your code in detail. But a plausible rule of > thumb is that a call from Go to C takes as long as ten function calls, > and calling from C to Go is worse. There are several reasons for > this, and there is certainly interest in making it faster, but it's a > hard problem. > > This unfortunately means that you should not design your program to > casually call between Go and C. Where possible you should batch calls > and you should try to build data structures entirely in one language > before passing them to the other language. > > Sorry for the difficulties. > > Ian > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/945397fb-19c7-47d9-b13a-f7d1b1238093%40googlegroups.com.