Hi Samuel, Sep 6, 2025, 22:40 by samuel.thiba...@gnu.org:
> > >> >> this was also happening before he previous change already and looks >> >> expected (although I admit it is a bit weird)? >> >> >> > >> > If there is nothing to read, it's indeed normal to get EWOULDBLOCK on a >> > O_NONBLOCK file descriptor. >> > >> >> After a start_input with empty input_buffer and nowait dev_read returns >> >> EWOULDBLOCK as no messages are (immediately) available. >> >> >> > >> > Does the device_read_request_inband() call inside start_input() return >> > D_WOULD_BLOCK? Does device_read_reply_inband get called with errorcode >> > == D_WOULD_BLOCK? >> >> No. I don't think so. >> > > "think" is not enough to be able to debug things. > >> From what I have tried with kmsg device_read_request_inband() always >> succeeds.device_read_reply_inband gets called with errorcode== D_WOULD_BLOCK >> only when kmsg currently has no more messages. >> > > Ok, but the question is how we can manage to somehow block, waiting for > more messages. Actually I'm surprised that device_read_request_inband > somehow gets called with D_WOULD_BLOCK. It seems that start_input passes > D_NOWAIT to device_read_request_inband. That looks odd to me: we'd > rather as well just call device_read_request_inband(0), and let it > return an answer only when it has some data, so that select() can wait > for it. I guess that's the central knot of our issues. > > (and dev_read will still be able to return EWOULDBLOCK if no data was > received yet). > >> >> After that it is readable >> > >> > After what? From what I read above dev_read was returning EWOULDBLOCK. >> > >> Currently you have to trigger the first device_read_request manually with a >> read >> > > What is "you"? > >> and ignore the error. >> > > What error is that? > >> Then things work until the input_buffer is empty again. >> This is repeatedly reading with O_NONBLOCK as discussed above. > If one just starts reading without select and O_NONBLOCK the initial read > (any read when input_buffer is empty) will always EWOULDBLOCK. > Which I guess is again the same blocking question raised above. > >> >> until the "end" >> >> >> > By "end", do you mean the current end of file? >> >> Yes, device_read_reply_inband returning 0 data + D_WOULD_BLOCK when >> currently no more messages are available. >> > > Ok. And in that case we would want to just somehow block. > >> >> where it now also EWOULDBLOCKs as opposed to D_WOULD_BLOCK >> > >> > What do you mean by "as opposed?" Normally userland should only ever see >> > EWOULDBLOCK, and we would convert D_WOULD_BLOCK into EWOULDBLOCK. Where >> > do you get D_WOULD_BLOCK vs EWOULDBLOCK exactly? >> >> This is fixed by the previous patch now. >> What I was trying to explain that if device_read_reply_inband was returning >> D_WOULD_BLOCK inappropriately it would have been returned by read earlier >> than the current end of file. >> > > Ok. > >> >> (unless one exactly manages to empty the input_buffer, i.e. by reading in >> >> chunks of 1). >> > >> > Do you mean that if one call io_read() with more than 1, we might get >> > stuck in a situation where we don't manage to empty the pending >> > characters? >> >> Currently there is a problem when the input_buffer is empty. >> >> This definitely happens at the start or end, but I think could also happen >> in the middle somewhere by accident. At least I ran into problems trying to >> read 1 char at a time manually. >> >> >> >> If there was a D_WOULD_BLOCK >> >> >> > >> > Where do you mean there could be a D_WOULD_BLOCK? >> > >> >> it would have failed with D_WOULD_BLOCK much earlier than the "end". >> >> >> > >> > AIUI we wouldn't want to get any D_WOULD_BLOCK before the actual end of >> > file? >> >> Yes, i think this is exactly what is happening. I think there is no problem >> there. >> > > Ok, but above you were saying "in the middle somewhere by accident"? > >> >> Should this be before or after the pthread_hurd_cond_timedwait_np? >> > >> > By: >> > >> >> > device_read_reply_inband get called with errorcode == D_WOULD_BLOCK? >> >> > I wonder if in that case we'd be supposed to call start_input again, >> > >> > I didn't mean to add calling start_input() in io_select_common, >> > but in device_read_reply_inband(), in the case where errorcode == >> > D_WOULD_BLOCK. Because we are consuming the opportunity of reading data >> > from the device, so we'd have to trigger another one by calling >> > start_input() to make another device_read RPC call. >> >> I see, I misunderstood your message. >> >> Wouldn't this still be incomplete in the case when no read has happened yet? >> > > What would be incomplete? What problem would be posed *exactly*? > Additional start_input's in device_read_reply_inband would not help because it needs another start_input somewhere else for the first start_input/ device_read_request_inband. > The problem I can see is that if we call select() before read(), > start_input will never have been called yet. select() should probably > be made to call start_input too when SELECT_READ is requested and the > buffer is not readable. > >> What about the case when the input buffer becomes empty somewhere in the >> middle of the available messages? >> > > Why would it be empty if there is available messages? > This should not happen but I encountered hangs when repeatedly reading in chunks of 1. But this has not been a problem else. My current WIP diff: --8<---------------cut here---------------start------------->8--- diff --git a/trans/streamio.c b/trans/streamio.c index c6e7229e..d179453b 100644 --- a/trans/streamio.c +++ b/trans/streamio.c @@ -190,6 +190,8 @@ error_t dev_write (const void *buf, size_t len, size_t *amount, int nowait); will wait for any activity to cease. */ error_t dev_sync (int wait); +static error_t start_input (int nowait); +static error_t start_output (int nowait); static struct argp_option options[] = @@ -572,9 +574,9 @@ io_select_common (struct trivfs_protid *cred, available = 0; + pthread_mutex_lock (&global_lock); while (1) { - pthread_mutex_lock (&global_lock); if ((*type & SELECT_READ) && buffer_readable (input_buffer)) available |= SELECT_READ; if ((*type & SELECT_WRITE) && @@ -588,13 +590,26 @@ io_select_common (struct trivfs_protid *cred, pthread_mutex_unlock (&global_lock); return 0; } - - if (cred->po->openmodes & O_NONBLOCK) + if (*type & SELECT_READ) { - pthread_mutex_unlock (&global_lock); - return EWOULDBLOCK; + err = start_input(0); + if (err) + { + *type = 0; + pthread_mutex_unlock (&global_lock); + return err; + } + } + if ((*type & SELECT_WRITE) && (cred->po->openmodes & O_WRITE)) + { + err = start_output(0); + if (err) + { + *type = 0; + pthread_mutex_unlock (&global_lock); + return err; + } } - ports_interrupt_self_on_port_death (cred, reply); err = pthread_hurd_cond_timedwait_np (&select_alert, &global_lock, tsp); if (err) --8<---------------cut here---------------end--------------->8--- I have no idea if the write part works and I just hope that it is symmetrical. The condition in io_select_common explicitly checks for O_WRITE I am not sure if this is needed again because if it is not set select will return before. It does not work at all if the pthread_mutex_lock is inside the while block. All the timeout related things I tried seem to work as expected. Finally the shepherd's kernel log also works with this if I additionally add this to mach diff --git a/device/kmsg.c b/device/kmsg.c index e5b518e6..4efa0046 100644 --- a/device/kmsg.c +++ b/device/kmsg.c @@ -31,7 +31,7 @@ the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */ #include <device/kmsg.h> -#define KMSGBUFSIZE (4096) /* XXX */ +#define KMSGBUFSIZE (8192) /* XXX */ /* Simple array for buffering messages */ static char kmsg_buffer[KMSGBUFSIZE]; to not get truncated messages (not really ideal) > Samuel > Y.