That's good to know, thanks. I did log the current OS thread ID using
`pthread_self()` as well as with `pthread_threadid_np()` and the
thread ID is constant for all hid_write and hid_read calls. The issue
appears anyway, unless the scheduler is invoked.

On Wed, 15 Feb 2023 at 23:52, Ian Lance Taylor <i...@golang.org> wrote:
>
> On Wed, Feb 15, 2023 at 2:36 PM Marko Bencun <be...@shiftcrypto.ch> wrote:
> >
> > That is a good hint, thanks. However, in the Go code above, there are
> > no goroutines and everything runs in one thread as far as I can tell,
> > but the issue appears anyway.
>
> There are always goroutines, even if your program doesn't start them.
> For example, the garbage collector normally runs four goroutines in
> the background.  The main goroutine can and will run on different
> threads even without any actual "go" statements.
>
> Ian
>
>
> > On Wed, 15 Feb 2023 at 22:05, Ian Lance Taylor <i...@golang.org> wrote:
> > >
> > > On Wed, Feb 15, 2023 at 8:42 AM 'Marko Bencun' via golang-nuts
> > > <golang-nuts@googlegroups.com> wrote:
> > > >
> > > > I am running into a a weird error in C code that only materializes
> > > > when calling it from Go using cgo. I hope you could help me understand
> > > > this behaviour and why calling runtime.Gosched() seems to resolve the
> > > > issue. I worry that this may not be the right fix, so I want to
> > > > understand what is going on.
> > > >
> > > > I communicate to a USB HID device and get an error every once in a
> > > > while, only when using cgo, in around 5% of the messages sent to the
> > > > device, non-deterministically. The messages sent and received are all
> > > > the same and work most of the time.
> > > >
> > > > The C functions I am calling are hid_write and hid_read from here:
> > > > https://github.com/libusb/hidapi/blob/4ebce6b5059b086d05ca7e091ce04a5fd08ac3ac/mac/hid.c
> > > >
> > > > If I call hid_write and hid_read in a loop in a C program, everything
> > > > works as expected:
> > > >
> > > > ```
> > > > #include "hidapi.h"
> > > > #include "hidapi_darwin.h"
> > > >
> > > > int run(hid_device *handle) {
> > > >     int res;
> > > >
> > > >     const uint8_t write[64] = "<some 64 byte msg>";
> > > >     const uint8_t expected[64] = "<expected 64 byte response>";
> > > >     uint8_t read[64] = {0};
> > > >
> > > > while(1) {
> > > >         res = hid_write(handle, write, sizeof(write));
> > > >         if(res < 0) return -1;
> > > >         res = hid_read(handle, read, sizeof(read));
> > > >         if(res < 0) return -1;
> > > >         if (memcmp(read, expected, 64)) return -1;
> > > > }
> > > >
> > > > return 0;
> > > > }
> > > > ```
> > > >
> > > > I ported the above to Go using cgo like below. I link `hid.o`, the
> > > > same object file used in the C program, to make sure the same code is
> > > > running to rule out differences in compilation:
> > > >
> > > > ```
> > > > package main
> > > >
> > > > import (
> > > >     "bytes"
> > > >     "encoding/hex"
> > > > )
> > > >
> > > > /*
> > > > #cgo darwin LDFLAGS: -framework IOKit -framework CoreFoundation
> > > > -framework AppKit hid.o
> > > > #include "hidapi.h"
> > > > #include "hidapi_darwin.h"
> > > > */
> > > > import "C"
> > > >
> > > > func main() {
> > > >     dev := C.hid_open(...)
> > > >     if dev == nil {
> > > >         panic("no device")
> > > >     }
> > > >     write := []byte("<some 64 byte msg>")
> > > >     expected := []byte("<expected 64 bytes response>")
> > > >     b := make([]byte, 64)
> > > >     for {
> > > >         written := C.hid_write(dev, (*C.uchar)(&write[0]), 
> > > > C.size_t(len(write)))
> > > >         if written < 0 {
> > > >             panic("error write")
> > > >         }
> > > >         read := C.hid_read(dev, (*C.uchar)(&b[0]), C.size_t(len(b)))
> > > >         if read < 0 {
> > > >             panic("error read")
> > > >         }
> > > >         if !bytes.Equal(b, expected) {
> > > >             panic("not equal")
> > > >         }
> > > >     }
> > > > }
> > > > ```
> > > >
> > > > The Go program errors on hid_write with a "generic error" error code
> > > > non-deterministically in about 5% of the messages sent. The USB device
> > > > receives the message and responds normally however.
> > > >
> > > > I randomly tried adding `runtime.Gosched()` at the bottom of the
> > > > Go-loop, and the issue disappeared completely. I am very confused why
> > > > that would help, as the Go program has, as far as I can tell, no
> > > > threads and no goroutines (except main).
> > > >
> > > > Other things I have checked:
> > > >
> > > > - If I move the loop from Go to C by calling `C.run(dev)` (the looping
> > > > function defined above) from Go, there is no issue.
> > > > - LockOSThread: if the loop runs in a goroutine and that goroutine
> > > > switches OS threads, the issue reappears after some time (not
> > > > immediately after a thread switch - the error happens
> > > > non-deterministically) even if `runtime.Gosched()` is called.
> > > > `runtime.LockOSThread()` is needed to fix it in that case. Since the
> > > > goroutine is locked to an OS thread during the execution of a C
> > > > function call anyway, this indicates that either hidapi or the macOS
> > > > HID functions rely on thread-local vars across multiple C calls in
> > > > some way, which seems a bit crazy.
> > > > - In the above code (no goroutines), I checked that the OS thread ID
> > > > (pthread_self()) is constant for all calls, and yet the issue appears
> > > > unless runtime.Gosched() is called, which seems to contradict the
> > > > previous point
> > > > - I tried multiple Go versions between 1.16 and 1.20 and multiple
> > > > macOS SDK versions between 10.13 and 13.1, all with the same problem
> > > > (and same working fix).
> > > > - only macOS is affected - on linux and Windows, there is no similar
> > > > issue (these platforms use different C code to interact with the
> > > > device).
> > > >
> > > > Does anyone have any insight into why invoking the scheduler could be
> > > > necessary here or what could be going on in general? My worry is that
> > > > using `runtime.LockOSThread()` and `runtime.Gosched()` are not proper
> > > > fixes.
> > >
> > >
> > > I didn't try to look at this in detail, but I did glance at the C code
> > > you are calling, and it uses pthread_mutex_lock and friends.  In
> > > general pthread mutexes must be unlocked by the thread that locked
> > > them, so it is quite possible that LockOSThread is required for
> > > correct operation.  I don't have an explanation for why Gosched would
> > > help, though.
> > >
> > > Ian

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAGgfmPPxFoY0aHxLAfZOogDipx%3DYExrz1J_41Hghdw-3529WCg%40mail.gmail.com.

Reply via email to