That is a good hint, thanks. However, in the Go code above, there are no goroutines and everything runs in one thread as far as I can tell, but the issue appears anyway.
On Wed, 15 Feb 2023 at 22:05, Ian Lance Taylor <i...@golang.org> wrote: > > On Wed, Feb 15, 2023 at 8:42 AM 'Marko Bencun' via golang-nuts > <golang-nuts@googlegroups.com> wrote: > > > > I am running into a a weird error in C code that only materializes > > when calling it from Go using cgo. I hope you could help me understand > > this behaviour and why calling runtime.Gosched() seems to resolve the > > issue. I worry that this may not be the right fix, so I want to > > understand what is going on. > > > > I communicate to a USB HID device and get an error every once in a > > while, only when using cgo, in around 5% of the messages sent to the > > device, non-deterministically. The messages sent and received are all > > the same and work most of the time. > > > > The C functions I am calling are hid_write and hid_read from here: > > https://github.com/libusb/hidapi/blob/4ebce6b5059b086d05ca7e091ce04a5fd08ac3ac/mac/hid.c > > > > If I call hid_write and hid_read in a loop in a C program, everything > > works as expected: > > > > ``` > > #include "hidapi.h" > > #include "hidapi_darwin.h" > > > > int run(hid_device *handle) { > > int res; > > > > const uint8_t write[64] = "<some 64 byte msg>"; > > const uint8_t expected[64] = "<expected 64 byte response>"; > > uint8_t read[64] = {0}; > > > > while(1) { > > res = hid_write(handle, write, sizeof(write)); > > if(res < 0) return -1; > > res = hid_read(handle, read, sizeof(read)); > > if(res < 0) return -1; > > if (memcmp(read, expected, 64)) return -1; > > } > > > > return 0; > > } > > ``` > > > > I ported the above to Go using cgo like below. I link `hid.o`, the > > same object file used in the C program, to make sure the same code is > > running to rule out differences in compilation: > > > > ``` > > package main > > > > import ( > > "bytes" > > "encoding/hex" > > ) > > > > /* > > #cgo darwin LDFLAGS: -framework IOKit -framework CoreFoundation > > -framework AppKit hid.o > > #include "hidapi.h" > > #include "hidapi_darwin.h" > > */ > > import "C" > > > > func main() { > > dev := C.hid_open(...) > > if dev == nil { > > panic("no device") > > } > > write := []byte("<some 64 byte msg>") > > expected := []byte("<expected 64 bytes response>") > > b := make([]byte, 64) > > for { > > written := C.hid_write(dev, (*C.uchar)(&write[0]), > > C.size_t(len(write))) > > if written < 0 { > > panic("error write") > > } > > read := C.hid_read(dev, (*C.uchar)(&b[0]), C.size_t(len(b))) > > if read < 0 { > > panic("error read") > > } > > if !bytes.Equal(b, expected) { > > panic("not equal") > > } > > } > > } > > ``` > > > > The Go program errors on hid_write with a "generic error" error code > > non-deterministically in about 5% of the messages sent. The USB device > > receives the message and responds normally however. > > > > I randomly tried adding `runtime.Gosched()` at the bottom of the > > Go-loop, and the issue disappeared completely. I am very confused why > > that would help, as the Go program has, as far as I can tell, no > > threads and no goroutines (except main). > > > > Other things I have checked: > > > > - If I move the loop from Go to C by calling `C.run(dev)` (the looping > > function defined above) from Go, there is no issue. > > - LockOSThread: if the loop runs in a goroutine and that goroutine > > switches OS threads, the issue reappears after some time (not > > immediately after a thread switch - the error happens > > non-deterministically) even if `runtime.Gosched()` is called. > > `runtime.LockOSThread()` is needed to fix it in that case. Since the > > goroutine is locked to an OS thread during the execution of a C > > function call anyway, this indicates that either hidapi or the macOS > > HID functions rely on thread-local vars across multiple C calls in > > some way, which seems a bit crazy. > > - In the above code (no goroutines), I checked that the OS thread ID > > (pthread_self()) is constant for all calls, and yet the issue appears > > unless runtime.Gosched() is called, which seems to contradict the > > previous point > > - I tried multiple Go versions between 1.16 and 1.20 and multiple > > macOS SDK versions between 10.13 and 13.1, all with the same problem > > (and same working fix). > > - only macOS is affected - on linux and Windows, there is no similar > > issue (these platforms use different C code to interact with the > > device). > > > > Does anyone have any insight into why invoking the scheduler could be > > necessary here or what could be going on in general? My worry is that > > using `runtime.LockOSThread()` and `runtime.Gosched()` are not proper > > fixes. > > > I didn't try to look at this in detail, but I did glance at the C code > you are calling, and it uses pthread_mutex_lock and friends. In > general pthread mutexes must be unlocked by the thread that locked > them, so it is quite possible that LockOSThread is required for > correct operation. I don't have an explanation for why Gosched would > help, though. > > Ian -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAGgfmPM4-e4EqrpXdf-CDgAhODpj7K1KEY%2BhuCL1KHzD7vbtnw%40mail.gmail.com.