On Wed, Feb 15, 2023 at 8:42 AM 'Marko Bencun' via golang-nuts <golang-nuts@googlegroups.com> wrote: > > I am running into a a weird error in C code that only materializes > when calling it from Go using cgo. I hope you could help me understand > this behaviour and why calling runtime.Gosched() seems to resolve the > issue. I worry that this may not be the right fix, so I want to > understand what is going on. > > I communicate to a USB HID device and get an error every once in a > while, only when using cgo, in around 5% of the messages sent to the > device, non-deterministically. The messages sent and received are all > the same and work most of the time. > > The C functions I am calling are hid_write and hid_read from here: > https://github.com/libusb/hidapi/blob/4ebce6b5059b086d05ca7e091ce04a5fd08ac3ac/mac/hid.c > > If I call hid_write and hid_read in a loop in a C program, everything > works as expected: > > ``` > #include "hidapi.h" > #include "hidapi_darwin.h" > > int run(hid_device *handle) { > int res; > > const uint8_t write[64] = "<some 64 byte msg>"; > const uint8_t expected[64] = "<expected 64 byte response>"; > uint8_t read[64] = {0}; > > while(1) { > res = hid_write(handle, write, sizeof(write)); > if(res < 0) return -1; > res = hid_read(handle, read, sizeof(read)); > if(res < 0) return -1; > if (memcmp(read, expected, 64)) return -1; > } > > return 0; > } > ``` > > I ported the above to Go using cgo like below. I link `hid.o`, the > same object file used in the C program, to make sure the same code is > running to rule out differences in compilation: > > ``` > package main > > import ( > "bytes" > "encoding/hex" > ) > > /* > #cgo darwin LDFLAGS: -framework IOKit -framework CoreFoundation > -framework AppKit hid.o > #include "hidapi.h" > #include "hidapi_darwin.h" > */ > import "C" > > func main() { > dev := C.hid_open(...) > if dev == nil { > panic("no device") > } > write := []byte("<some 64 byte msg>") > expected := []byte("<expected 64 bytes response>") > b := make([]byte, 64) > for { > written := C.hid_write(dev, (*C.uchar)(&write[0]), > C.size_t(len(write))) > if written < 0 { > panic("error write") > } > read := C.hid_read(dev, (*C.uchar)(&b[0]), C.size_t(len(b))) > if read < 0 { > panic("error read") > } > if !bytes.Equal(b, expected) { > panic("not equal") > } > } > } > ``` > > The Go program errors on hid_write with a "generic error" error code > non-deterministically in about 5% of the messages sent. The USB device > receives the message and responds normally however. > > I randomly tried adding `runtime.Gosched()` at the bottom of the > Go-loop, and the issue disappeared completely. I am very confused why > that would help, as the Go program has, as far as I can tell, no > threads and no goroutines (except main). > > Other things I have checked: > > - If I move the loop from Go to C by calling `C.run(dev)` (the looping > function defined above) from Go, there is no issue. > - LockOSThread: if the loop runs in a goroutine and that goroutine > switches OS threads, the issue reappears after some time (not > immediately after a thread switch - the error happens > non-deterministically) even if `runtime.Gosched()` is called. > `runtime.LockOSThread()` is needed to fix it in that case. Since the > goroutine is locked to an OS thread during the execution of a C > function call anyway, this indicates that either hidapi or the macOS > HID functions rely on thread-local vars across multiple C calls in > some way, which seems a bit crazy. > - In the above code (no goroutines), I checked that the OS thread ID > (pthread_self()) is constant for all calls, and yet the issue appears > unless runtime.Gosched() is called, which seems to contradict the > previous point > - I tried multiple Go versions between 1.16 and 1.20 and multiple > macOS SDK versions between 10.13 and 13.1, all with the same problem > (and same working fix). > - only macOS is affected - on linux and Windows, there is no similar > issue (these platforms use different C code to interact with the > device). > > Does anyone have any insight into why invoking the scheduler could be > necessary here or what could be going on in general? My worry is that > using `runtime.LockOSThread()` and `runtime.Gosched()` are not proper > fixes.
I didn't try to look at this in detail, but I did glance at the C code you are calling, and it uses pthread_mutex_lock and friends. In general pthread mutexes must be unlocked by the thread that locked them, so it is quite possible that LockOSThread is required for correct operation. I don't have an explanation for why Gosched would help, though. Ian -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAOyqgcXYYvAYzDzhduz-dtWP0Xqgo24DQMzp4zMhDRYp6YvH8Q%40mail.gmail.com.