On 2024-03-09 08:30, Zachary Santer wrote:
> 'stdbuf --output=L' will line-buffer the command's output stream.
> Pretty useful, but that's looking for newlines. Filenames should be
> passed between utilities in a null-terminated fashion, because the
> null byte is the only byte that can't appear within one.
>
> If I want to buffer output data on null bytes, the closest I can get
> is 'stdbuf --output=0', which doesn't buffer at all. This is pretty
> inefficient.
>
> 0 means unbuffered, and Z is already taken for, I guess, zebibytes.
> --output=N, then?
>
> Would this require a change to libc implementations, or is it possible now?
Yes, it would require a libc change, because stdbuf works by changing
stdio stream buffering modes. There is no null byte flush mode in
standard C, nor as a GNU extension.

The null byte flush mode idea is interesting, separately from whether
it is controlled by stdbuf.

I would say that if it is implemented, the programs which require it
should all make provisions to set it up themselves.

stdbuf is a hack/workaround for programs that ignore the issue of
buffering. Specifically, programs which write information to one of
the standard streams, where that information is needed in a timely
way. Standard input and standard output become fully buffered when
not connected to a terminal; standard error is not fully buffered by
default.

Programs can have the issue for other streams, like log files that
they explicitly open. stdbuf won't fix that.

The main reasons for wanting messages sent without delay are so that
information is available in real time, so that a user sees an
important prompt on the terminal before being asked for input, or so
that a log message is flushed before a crash occurs. Or so that log
messages from multiple sources are "chronologically clustered" with a
granularity fine enough that they can be correlated.

There can be a performance issue also, though!

Suppose we run "find" to find certain files over a large file tree.
It finds only a small number of files: all the file paths identified
fit into a single buffer which, when the output goes to a pipe, is
not flushed until the program terminates.

We pipe this to some program which does some processing on those
files. We would like the processing to start as soon as the first
file has been identified, not when find is done!

It could be that find discovers all the relevant files early in its
execution and then spends a minute finding nothing else. That minute
is added to the processing time of the files that were found.

That is the compelling reason for wanting file names to be flushed
individually, whether they are newline terminated or null terminated.