On Wed, Feb 15, 2023 at 8:42 AM 'Marko Bencun' via golang-nuts
<golang-nuts@googlegroups.com> wrote:
>
> I am running into a weird error in C code that only materializes
> when calling it from Go using cgo. I hope you could help me understand
> this behaviour and why calling runtime.Gosched() seems to resolve the
> issue. I worry that this may not be the right fix, so I want to
> understand what is going on.
>
> I communicate with a USB HID device and get an error every once in a
> while, only when using cgo, in around 5% of the messages sent to the
> device, non-deterministically. The messages sent and received are all
> the same and work most of the time.
>
> The C functions I am calling are hid_write and hid_read from here:
> https://github.com/libusb/hidapi/blob/4ebce6b5059b086d05ca7e091ce04a5fd08ac3ac/mac/hid.c
>
> If I call hid_write and hid_read in a loop in a C program, everything
> works as expected:
>
> ```
> #include <stdint.h>
> #include <string.h>
> #include "hidapi.h"
> #include "hidapi_darwin.h"
>
> int run(hid_device *handle) {
>     int res;
>
>     const uint8_t write[64] = "<some 64 byte msg>";
>     const uint8_t expected[64] = "<expected 64 byte response>";
>     uint8_t read[64] = {0};
>
>     while (1) {
>         res = hid_write(handle, write, sizeof(write));
>         if (res < 0) return -1;
>         res = hid_read(handle, read, sizeof(read));
>         if (res < 0) return -1;
>         if (memcmp(read, expected, sizeof(expected))) return -1;
>     }
>
>     return 0;
> }
> ```
>
> I ported the above to Go using cgo like below. I link `hid.o`, the
> same object file used in the C program, to make sure the same code is
> running to rule out differences in compilation:
>
> ```
> package main
>
> import (
>     "bytes"
> )
>
> /*
> #cgo darwin LDFLAGS: -framework IOKit -framework CoreFoundation -framework AppKit hid.o
> #include "hidapi.h"
> #include "hidapi_darwin.h"
> */
> import "C"
>
> func main() {
>     dev := C.hid_open(...)
>     if dev == nil {
>         panic("no device")
>     }
>     write := []byte("<some 64 byte msg>")
>     expected := []byte("<expected 64 bytes response>")
>     b := make([]byte, 64)
>     for {
>         written := C.hid_write(dev, (*C.uchar)(&write[0]), C.size_t(len(write)))
>         if written < 0 {
>             panic("error write")
>         }
>         read := C.hid_read(dev, (*C.uchar)(&b[0]), C.size_t(len(b)))
>         if read < 0 {
>             panic("error read")
>         }
>         if !bytes.Equal(b, expected) {
>             panic("not equal")
>         }
>     }
> }
> ```
>
> The Go program errors on hid_write with a "generic error" error code
> non-deterministically in about 5% of the messages sent. The USB device,
> however, receives the message and responds normally.
>
> I randomly tried adding `runtime.Gosched()` at the bottom of the
> Go-loop, and the issue disappeared completely. I am very confused why
> that would help, as the Go program has, as far as I can tell, no
> threads and no goroutines (except main).
>
> Other things I have checked:
>
> - If I move the loop from Go to C by calling `C.run(dev)` (the looping
> function defined above) from Go, there is no issue.
> - LockOSThread: if the loop runs in a goroutine and that goroutine
> switches OS threads, the issue reappears after some time (not
> immediately after a thread switch - the error happens
> non-deterministically) even if `runtime.Gosched()` is called.
> `runtime.LockOSThread()` is needed to fix it in that case. Since the
> goroutine is locked to an OS thread during the execution of a C
> function call anyway, this indicates that either hidapi or the macOS
> HID functions rely on thread-local vars across multiple C calls in
> some way, which seems a bit crazy.
> - In the above code (no goroutines), I checked that the OS thread ID
> (pthread_self()) is constant for all calls, and yet the issue appears
> unless runtime.Gosched() is called, which seems to contradict the
> previous point.
> - I tried multiple Go versions between 1.16 and 1.20 and multiple
> macOS SDK versions between 10.13 and 13.1, all with the same problem
> (and same working fix).
> - Only macOS is affected; on Linux and Windows there is no similar
> issue (these platforms use different C code to interact with the
> device).
>
> Does anyone have any insight into why invoking the scheduler could be
> necessary here or what could be going on in general? My worry is that
> using `runtime.LockOSThread()` and `runtime.Gosched()` are not proper
> fixes.


I didn't try to look at this in detail, but I did glance at the C code
you are calling, and it uses pthread_mutex_lock and friends.  In
general pthread mutexes must be unlocked by the thread that locked
them, so it is quite possible that LockOSThread is required for
correct operation.  I don't have an explanation for why Gosched would
help, though.

Ian
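Following up on the point that LockOSThread may be required: one common Go pattern is to give the device to a single goroutine that calls `runtime.LockOSThread()` once and then performs every write and read itself, while other goroutines send it requests over a channel. The sketch below is a minimal illustration under stated assumptions, not a fix confirmed for hidapi. The `hidDevice` interface is hypothetical and stands in for thin wrappers around `C.hid_write` and `C.hid_read`. `echoDevice` is a fake that exists only so the sketch runs without hardware. The sketch does not explain why `runtime.Gosched()` helped in the single-goroutine case.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// hidDevice is a HYPOTHETICAL stand-in for wrappers around C.hid_write and
// C.hid_read; it is not part of hidapi or cgo.
type hidDevice interface {
	Write(p []byte) (int, error)
	Read(p []byte) (int, error)
}

// request asks the worker to perform one write/read round trip.
type request struct {
	msg   []byte
	reply chan result
}

type result struct {
	resp []byte
	err  error
}

// serveOnLockedThread owns dev for its whole lifetime. Because the goroutine
// is wired to one OS thread, every call into the device (and any
// thread-affine state inside the C library, such as pthread mutexes) happens
// on that same thread.
func serveOnLockedThread(dev hidDevice, reqs <-chan request) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	for r := range reqs {
		if _, err := dev.Write(r.msg); err != nil {
			r.reply <- result{err: err}
			continue
		}
		buf := make([]byte, 64)
		n, err := dev.Read(buf)
		r.reply <- result{resp: buf[:n], err: err}
	}
}

// echoDevice is a FAKE device for demonstration only: Read returns the last
// message passed to Write.
type echoDevice struct{ last []byte }

func (d *echoDevice) Write(p []byte) (int, error) {
	d.last = append(d.last[:0], p...)
	return len(p), nil
}

func (d *echoDevice) Read(p []byte) (int, error) {
	if d.last == nil {
		return 0, errors.New("nothing written")
	}
	return copy(p, d.last), nil
}

func main() {
	reqs := make(chan request)
	go serveOnLockedThread(&echoDevice{}, reqs)

	reply := make(chan result)
	for _, msg := range []string{"ping", "pong"} {
		reqs <- request{msg: []byte(msg), reply: reply}
		r := <-reply
		if r.err != nil {
			fmt.Println("error:", r.err)
			continue
		}
		fmt.Printf("%s\n", r.resp) // prints "ping", then "pong"
	}
	close(reqs)
}
```

The design choice here is that the thread lock spans many calls rather than a single one. Each cgo call already runs on one thread, but consecutive calls from an unlocked goroutine may land on different threads. A long-lived locked owner goroutine rules that out for the whole session.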

To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOyqgcXYYvAYzDzhduz-dtWP0Xqgo24DQMzp4zMhDRYp6YvH8Q%40mail.gmail.com.