That's good to know, thanks. I did log the current OS thread ID using `pthread_self()` as well as with `pthread_threadid_np()` and the thread ID is constant for all hid_write and hid_read calls. The issue appears anyway, unless the scheduler is invoked.
On Wed, 15 Feb 2023 at 23:52, Ian Lance Taylor <i...@golang.org> wrote: > > On Wed, Feb 15, 2023 at 2:36 PM Marko Bencun <be...@shiftcrypto.ch> wrote: > > > > That is a good hint, thanks. However, in the Go code above, there are > > no goroutines and everything runs in one thread as far as I can tell, > > but the issue appears anyway. > > There are always goroutines, even if your program doesn't start them. > For example, the garbage collector normally runs four goroutines in > the background. The main goroutine can and will run on different > threads even without any actual "go" statements. > > Ian > > > > On Wed, 15 Feb 2023 at 22:05, Ian Lance Taylor <i...@golang.org> wrote: > > > > > > On Wed, Feb 15, 2023 at 8:42 AM 'Marko Bencun' via golang-nuts > > > <golang-nuts@googlegroups.com> wrote: > > > > > > > > I am running into a a weird error in C code that only materializes > > > > when calling it from Go using cgo. I hope you could help me understand > > > > this behaviour and why calling runtime.Gosched() seems to resolve the > > > > issue. I worry that this may not be the right fix, so I want to > > > > understand what is going on. > > > > > > > > I communicate to a USB HID device and get an error every once in a > > > > while, only when using cgo, in around 5% of the messages sent to the > > > > device, non-deterministically. The messages sent and received are all > > > > the same and work most of the time. > > > > > > > > The C functions I am calling are hid_write and hid_read from here: > > > > https://github.com/libusb/hidapi/blob/4ebce6b5059b086d05ca7e091ce04a5fd08ac3ac/mac/hid.c > > > > > > > > If I call hid_write and hid_read in a loop in a C program, everything > > > > works as expected: > > > > > > > > ``` > > > > #include "hidapi.h" > > > > #include "hidapi_darwin.h" > > > > > > > > int run(hid_device *handle) { > > > > int res; > > > > > > > > const uint8_t write[64] = "<some 64 byte msg>"; > > > > const uint8_t expected[64] = "<expected 64 byte response>"; > > > > uint8_t read[64] = {0}; > > > > > > > > while(1) { > > > > res = hid_write(handle, write, sizeof(write)); > > > > if(res < 0) return -1; > > > > res = hid_read(handle, read, sizeof(read)); > > > > if(res < 0) return -1; > > > > if (memcmp(read, expected, 64)) return -1; > > > > } > > > > > > > > return 0; > > > > } > > > > ``` > > > > > > > > I ported the above to Go using cgo like below. I link `hid.o`, the > > > > same object file used in the C program, to make sure the same code is > > > > running to rule out differences in compilation: > > > > > > > > ``` > > > > package main > > > > > > > > import ( > > > > "bytes" > > > > "encoding/hex" > > > > ) > > > > > > > > /* > > > > #cgo darwin LDFLAGS: -framework IOKit -framework CoreFoundation > > > > -framework AppKit hid.o > > > > #include "hidapi.h" > > > > #include "hidapi_darwin.h" > > > > */ > > > > import "C" > > > > > > > > func main() { > > > > dev := C.hid_open(...) > > > > if dev == nil { > > > > panic("no device") > > > > } > > > > write := []byte("<some 64 byte msg>") > > > > expected := []byte("<expected 64 bytes response>") > > > > b := make([]byte, 64) > > > > for { > > > > written := C.hid_write(dev, (*C.uchar)(&write[0]), > > > > C.size_t(len(write))) > > > > if written < 0 { > > > > panic("error write") > > > > } > > > > read := C.hid_read(dev, (*C.uchar)(&b[0]), C.size_t(len(b))) > > > > if read < 0 { > > > > panic("error read") > > > > } > > > > if !bytes.Equal(b, expected) { > > > > panic("not equal") > > > > } > > > > } > > > > } > > > > ``` > > > > > > > > The Go program errors on hid_write with a "generic error" error code > > > > non-deterministically in about 5% of the messages sent. The USB device > > > > receives the message and responds normally however. > > > > > > > > I randomly tried adding `runtime.Gosched()` at the bottom of the > > > > Go-loop, and the issue disappeared completely. I am very confused why > > > > that would help, as the Go program has, as far as I can tell, no > > > > threads and no goroutines (except main). > > > > > > > > Other things I have checked: > > > > > > > > - If I move the loop from Go to C by calling `C.run(dev)` (the looping > > > > function defined above) from Go, there is no issue. > > > > - LockOSThread: if the loop runs in a goroutine and that goroutine > > > > switches OS threads, the issue reappears after some time (not > > > > immediately after a thread switch - the error happens > > > > non-deterministically) even if `runtime.Gosched()` is called. > > > > `runtime.LockOSThread()` is needed to fix it in that case. Since the > > > > goroutine is locked to an OS thread during the execution of a C > > > > function call anyway, this indicates that either hidapi or the macOS > > > > HID functions rely on thread-local vars across multiple C calls in > > > > some way, which seems a bit crazy. > > > > - In the above code (no goroutines), I checked that the OS thread ID > > > > (pthread_self()) is constant for all calls, and yet the issue appears > > > > unless runtime.Gosched() is called, which seems to contradict the > > > > previous point > > > > - I tried multiple Go versions between 1.16 and 1.20 and multiple > > > > macOS SDK versions between 10.13 and 13.1, all with the same problem > > > > (and same working fix). > > > > - only macOS is affected - on linux and Windows, there is no similar > > > > issue (these platforms use different C code to interact with the > > > > device). > > > > > > > > Does anyone have any insight into why invoking the scheduler could be > > > > necessary here or what could be going on in general? My worry is that > > > > using `runtime.LockOSThread()` and `runtime.Gosched()` are not proper > > > > fixes. > > > > > > > > > I didn't try to look at this in detail, but I did glance at the C code > > > you are calling, and it uses pthread_mutex_lock and friends. In > > > general pthread mutexes must be unlocked by the thread that locked > > > them, so it is quite possible that LockOSThread is required for > > > correct operation. I don't have an explanation for why Gosched would > > > help, though. > > > > > > Ian -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAGgfmPPxFoY0aHxLAfZOogDipx%3DYExrz1J_41Hghdw-3529WCg%40mail.gmail.com.