[Lldb-commits] [lldb] [lldb] Multithreading lldb-server works on Windows now; fixed gdb port mapping (PR #100670)

Pavel Labath via lldb-commits Mon, 29 Jul 2024 02:09:25 -0700

================
@@ -324,6 +324,18 @@ Status PipePosix::ReadWithTimeout(void *buf, size_t size,
         bytes_read += result;
         if (bytes_read == size || result == 0)
           break;
+
+        // This is the workaround for the following bug in Linux multithreading
+        // select() https://bugzilla.kernel.org/show_bug.cgi?id=546
+        // ReadWithTimeout() with a non-zero timeout is used only to
+        // read the port number from the gdbserver pipe
+        // in GDBRemoteCommunication::StartDebugserverProcess().
+        // The port number may be "1024\0".."65535\0".
+        if (timeout.count() > 0 && size == 6 && bytes_read == 5 &&
+            static_cast<char *>(buf)[4] == '\0') {
+          break;
+        }
----------------
labath wrote:


I'm sorry, but I find this extremely hard to believe. Pipes and the ability to
read them until EOF are as old as UNIX. We're not doing anything special here.
It's going to take a lot more than a reference to a 20-year-old
`REJECTED INVALID` bug to convince me this is a kernel issue.

Also note that the situation mentioned in that bug is different from what (I 
think) you're describing here. In their case, the pipe is NOT closed on the 
*other* side. The pipe is closed on the side that's doing the `select`ing:

```
Consider a multithreaded program (pthreads). One thread reads from
a file descriptor (typically a TCP or UDP socket). At some time,
another thread decides to close(2) the file descriptor while the
first thread is blocked, waiting for input data.
```

In other words, this is your basic race in the application code, and it's the
application's (our) responsibility to fix it. While I'm not a kernel
maintainer, I think I have a pretty good idea why they said the application is
buggy and why they didn't want to fix it: fixing it probably would not make
the application correct.

The problem there is that the application calls `select` (or one of its
friends) and `close` on two threads with no synchronization between them. If
`select` happens to run first, then it's not completely unreasonable to expect
that `close` will terminate that `select`, and the kernel could in theory make
sure it does that (apparently, some operating systems do just that).

The problem is what happens if `select` does not run first. What if `close`
does? In this case, `select` will return an error (as the bug reporter
expects), but ***only*** if the FD hasn't been reused in the meantime. And
since Linux always assigns the lowest available FD (I think POSIX mandates
that), it's very likely that the very next operation (perhaps on a third
thread) that creates an FD will get the same FD we've just closed. If that
happens, then `select` will NOT return an error (the kernel has no way to know
that it's referring to the old FD) and will happily start listening on the new
FD. Since you usually aren't able to control all operations that could
possibly create a new FD, this kind of pattern would be buggy except in
extremely limited circumstances.

https://github.com/llvm/llvm-project/pull/100670
_______________________________________________
lldb-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
