Dear Gophers

I am running into a a weird error in C code that only materializes
when calling it from Go using cgo. I hope you could help me understand
this behaviour and why calling runtime.Gosched() seems to resolve the
issue. I worry that this may not be the right fix, so I want to
understand what is going on.
I communicate to a USB HID device and get an error every once in a
while, only when using cgo, in around 5% of the messages sent to the
device, non-deterministically. The messages sent and received are all
the same and work most of the time.

The C functions I am calling are hid_write and hid_read from here:
https://github.com/libusb/hidapi/blob/4ebce6b5059b086d05ca7e091ce04a5fd08ac3ac/mac/hid.c

If I call hid_write and hid_read in a loop in a C program, everything
works as expected:

```
#include "hidapi.h"
#include "hidapi_darwin.h"

int run(hid_device *handle) {
    int res;

    const uint8_t write[64] = "<some 64 byte msg>";
    const uint8_t expected[64] = "<expected 64 byte response>";
    uint8_t read[64] = {0};

while(1) {
        res = hid_write(handle, write, sizeof(write));
        if(res < 0) return -1;
        res = hid_read(handle, read, sizeof(read));
        if(res < 0) return -1;
        if (memcmp(read, expected, 64)) return -1;
}

return 0;
}
```

I ported the above to Go using cgo like below. I link `hid.o`, the
same object file used in the C program, to make sure the same code is
running to rule out differences in compilation:

```
package main

import (
    "bytes"
    "encoding/hex"
)

/*
#cgo darwin LDFLAGS: -framework IOKit -framework CoreFoundation
-framework AppKit hid.o
#include "hidapi.h"
#include "hidapi_darwin.h"
*/
import "C"

func main() {
    dev := C.hid_open(...)
    if dev == nil {
        panic("no device")
    }
    write := []byte("<some 64 byte msg>")
    expected := []byte("<expected 64 bytes response>")
    b := make([]byte, 64)
    for {
        written := C.hid_write(dev, (*C.uchar)(&write[0]), C.size_t(len(write)))
        if written < 0 {
            panic("error write")
        }
        read := C.hid_read(dev, (*C.uchar)(&b[0]), C.size_t(len(b)))
        if read < 0 {
            panic("error read")
        }
        if !bytes.Equal(b, expected) {
            panic("not equal")
        }
    }
}
```

The Go program errors on hid_write with a "generic error" error code
non-deterministically in about 5% of the messages sent. The USB device
receives the message and responds normally however.

I randomly tried adding `runtime.Gosched()` at the bottom of the
Go-loop, and the issue disappeared completely. I am very confused why
that would help, as the Go program has, as far as I can tell, no
threads and no goroutines (except main).

Other things I have checked:

- If I move the loop from Go to C by calling `C.run(dev)` (the looping
function defined above) from Go, there is no issue.
- LockOSThread: if the loop runs in a goroutine and that goroutine
switches OS threads, the issue reappears after some time (not
immediately after a thread switch - the error happens
non-deterministically) even if `runtime.Gosched()` is called.
`runtime.LockOSThread()` is needed to fix it in that case. Since the
goroutine is locked to an OS thread during the execution of a C
function call anyway, this indicates that either hidapi or the macOS
HID functions rely on thread-local vars across multiple C calls in
some way, which seems a bit crazy.
- In the above code (no goroutines), I checked that the OS thread ID
(pthread_self()) is constant for all calls, and yet the issue appears
unless runtime.Gosched() is called, which seems to contradict the
previous point
- I tried multiple Go versions between 1.16 and 1.20 and multiple
macOS SDK versions between 10.13 and 13.1, all with the same problem
(and same working fix).
- only macOS is affected - on linux and Windows, there is no similar
issue (these platforms use different C code to interact with the
device).

Does anyone have any insight into why invoking the scheduler could be
necessary here or what could be going on in general? My worry is that
using `runtime.LockOSThread()` and `runtime.Gosched()` are not proper
fixes.

Best,
Marko

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAGgfmPPpi9SXtuKn%3DBmVzM%3D8_%3Dc6-F7G-k41wyUGEQ9UBHs-rQ%40mail.gmail.com.

Reply via email to