That is a good hint, thanks. However, in the Go code above, there are
no goroutines and everything runs in one thread as far as I can tell,
but the issue appears anyway.

On Wed, 15 Feb 2023 at 22:05, Ian Lance Taylor <i...@golang.org> wrote:
>
> On Wed, Feb 15, 2023 at 8:42 AM 'Marko Bencun' via golang-nuts
> <golang-nuts@googlegroups.com> wrote:
> >
> > I am running into a a weird error in C code that only materializes
> > when calling it from Go using cgo. I hope you could help me understand
> > this behaviour and why calling runtime.Gosched() seems to resolve the
> > issue. I worry that this may not be the right fix, so I want to
> > understand what is going on.
> >
> > I communicate to a USB HID device and get an error every once in a
> > while, only when using cgo, in around 5% of the messages sent to the
> > device, non-deterministically. The messages sent and received are all
> > the same and work most of the time.
> >
> > The C functions I am calling are hid_write and hid_read from here:
> > https://github.com/libusb/hidapi/blob/4ebce6b5059b086d05ca7e091ce04a5fd08ac3ac/mac/hid.c
> >
> > If I call hid_write and hid_read in a loop in a C program, everything
> > works as expected:
> >
> > ```
> > #include "hidapi.h"
> > #include "hidapi_darwin.h"
> >
> > int run(hid_device *handle) {
> >     int res;
> >
> >     const uint8_t write[64] = "<some 64 byte msg>";
> >     const uint8_t expected[64] = "<expected 64 byte response>";
> >     uint8_t read[64] = {0};
> >
> > while(1) {
> >         res = hid_write(handle, write, sizeof(write));
> >         if(res < 0) return -1;
> >         res = hid_read(handle, read, sizeof(read));
> >         if(res < 0) return -1;
> >         if (memcmp(read, expected, 64)) return -1;
> > }
> >
> > return 0;
> > }
> > ```
> >
> > I ported the above to Go using cgo like below. I link `hid.o`, the
> > same object file used in the C program, to make sure the same code is
> > running to rule out differences in compilation:
> >
> > ```
> > package main
> >
> > import (
> >     "bytes"
> >     "encoding/hex"
> > )
> >
> > /*
> > #cgo darwin LDFLAGS: -framework IOKit -framework CoreFoundation
> > -framework AppKit hid.o
> > #include "hidapi.h"
> > #include "hidapi_darwin.h"
> > */
> > import "C"
> >
> > func main() {
> >     dev := C.hid_open(...)
> >     if dev == nil {
> >         panic("no device")
> >     }
> >     write := []byte("<some 64 byte msg>")
> >     expected := []byte("<expected 64 bytes response>")
> >     b := make([]byte, 64)
> >     for {
> >         written := C.hid_write(dev, (*C.uchar)(&write[0]), 
> > C.size_t(len(write)))
> >         if written < 0 {
> >             panic("error write")
> >         }
> >         read := C.hid_read(dev, (*C.uchar)(&b[0]), C.size_t(len(b)))
> >         if read < 0 {
> >             panic("error read")
> >         }
> >         if !bytes.Equal(b, expected) {
> >             panic("not equal")
> >         }
> >     }
> > }
> > ```
> >
> > The Go program errors on hid_write with a "generic error" error code
> > non-deterministically in about 5% of the messages sent. The USB device
> > receives the message and responds normally however.
> >
> > I randomly tried adding `runtime.Gosched()` at the bottom of the
> > Go-loop, and the issue disappeared completely. I am very confused why
> > that would help, as the Go program has, as far as I can tell, no
> > threads and no goroutines (except main).
> >
> > Other things I have checked:
> >
> > - If I move the loop from Go to C by calling `C.run(dev)` (the looping
> > function defined above) from Go, there is no issue.
> > - LockOSThread: if the loop runs in a goroutine and that goroutine
> > switches OS threads, the issue reappears after some time (not
> > immediately after a thread switch - the error happens
> > non-deterministically) even if `runtime.Gosched()` is called.
> > `runtime.LockOSThread()` is needed to fix it in that case. Since the
> > goroutine is locked to an OS thread during the execution of a C
> > function call anyway, this indicates that either hidapi or the macOS
> > HID functions rely on thread-local vars across multiple C calls in
> > some way, which seems a bit crazy.
> > - In the above code (no goroutines), I checked that the OS thread ID
> > (pthread_self()) is constant for all calls, and yet the issue appears
> > unless runtime.Gosched() is called, which seems to contradict the
> > previous point
> > - I tried multiple Go versions between 1.16 and 1.20 and multiple
> > macOS SDK versions between 10.13 and 13.1, all with the same problem
> > (and same working fix).
> > - only macOS is affected - on linux and Windows, there is no similar
> > issue (these platforms use different C code to interact with the
> > device).
> >
> > Does anyone have any insight into why invoking the scheduler could be
> > necessary here or what could be going on in general? My worry is that
> > using `runtime.LockOSThread()` and `runtime.Gosched()` are not proper
> > fixes.
>
>
> I didn't try to look at this in detail, but I did glance at the C code
> you are calling, and it uses pthread_mutex_lock and friends.  In
> general pthread mutexes must be unlocked by the thread that locked
> them, so it is quite possible that LockOSThread is required for
> correct operation.  I don't have an explanation for why Gosched would
> help, though.
>
> Ian

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAGgfmPM4-e4EqrpXdf-CDgAhODpj7K1KEY%2BhuCL1KHzD7vbtnw%40mail.gmail.com.

Reply via email to