I believe the problem is that your readFull can return after a partial read when an error occurs, e.g. a timeout. When that happens, the next call reads into the start of the same buffer and overwrites the partial result, so you lose data.
If you apply something like the following, which returns the bytes read and continues the read from that offset, you should be good:

--- main.go.timeout-race 2024-11-18 14:07:03
+++ main.go 2024-11-18 17:24:09
@@ -25,7 +25,7 @@ func service(conn net.Conn) {
 func service(conn net.Conn) {
 	cliGoroNumBytes := make([]byte, 8)
-	err := readFull(conn, cliGoroNumBytes, nil)
+	_, err := readFull(conn, cliGoroNumBytes, nil)
 	if err != nil {
 		panic(err)
 	}
@@ -115,8 +115,10 @@ func client(id int) {
 	// need to periodically check for other events, e.g. shutdown, pause, etc.
 	//
 	// When we do so, we observe data loss.
+	var offset int
 	for {
-		err := readFull(conn, buff, &timeout)
+		n, err := readFull(conn, buff[offset:], &timeout)
+		offset += n
 		if err != nil {
 			//fmt.Printf("err = '%v'; current i=%v; prev j=%v\n", err, i, j)
 			r := err.Error()
@@ -153,8 +155,7 @@ func readFull(conn net.Conn, buf []byte, timeout *time
 var zeroTime = time.Time{}

 // readFull reads exactly len(buf) bytes from conn
-func readFull(conn net.Conn, buf []byte, timeout *time.Duration) error {
-
+func readFull(conn net.Conn, buf []byte, timeout *time.Duration) (int, error) {
 	if timeout != nil && *timeout > 0 {
 		conn.SetReadDeadline(time.Now().Add(*timeout))
 	} else {
@@ -172,13 +173,13 @@ func readFull(conn net.Conn, buf []byte, timeout *time
 			if err != nil {
 				panic(err)
 			}
-			return nil
+			return total, nil
 		}
 		if err != nil {
-			return err
+			return total, err
 		}
 	}
-	return nil
+	return total, nil
 }

 func startClients() {

On Mon, 18 Nov 2024 at 06:01, Jason E.
Aten <j.e.a...@gmail.com> wrote:

> If you have a macOS or Windows computer
> handy and you want to improve the TCP
> read-deadline handling in Go -- and you know
> something about how the runtime handles TCP connections
> and read-deadline-timeouts -- your help would be welcome
> in getting to the bottom of this runtime bug I recently
> encountered:
>
> https://github.com/golang/go/issues/70395
>
> I have reliable reproducers for it shown there
> on the ticket (also see below).
>
> I can see that 8 or 12 bytes are being
> skipped and data lost from the TCP receive buffers
> on a socket read that occurs at some point after one
> or more read deadline time outs. (On windows I can
> see the strange count of 15 bytes being lost.)
>
> It is clearly racy since the lost data occurs only
> after several thousand good reads. I actually
> don't think the "loaded" part is necessary after all;
> I just think you need alot of reads in order to
> encounter a bad one. Thus I have 50 TCP clients
> reading at once from a TCP server. Generally
> I see a bad read within a few seconds, but it
> can take a minute sometimes too.
>
> This happens on darwin and on windows, but not
> on linux. Reproducer code in one file is here:
>
> wget
> https://github.com/glycerine/rpc25519/blob/master/attic/darwin_word_shift.go
>
> Just `go run darwin_word_shift.go` to reproduce it: if
> it panics, you have seen a bad read.
>
> Thanks and happy debugging!
>
> Jason
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/golang-nuts/532c611a-6d72-4d3f-88df-89b2dee15b10n%40googlegroups.com