Resurrecting old bug thread because this hasn't yet been resolved. This report was about an unexpected behavior of fflush(). The below was expected to print() once per second, but instead it looks like it printed all at once after all the sleeps:
$ for i in `seq 10` ; do echo x ; sleep 1 ; done | mawk '{ print $1 ; fflush() }' | ts Aug 18 12:17:14 x Aug 18 12:17:14 x Aug 18 12:17:14 x Aug 18 12:17:14 x Aug 18 12:17:14 x Aug 18 12:17:14 x Aug 18 12:17:14 x Aug 18 12:17:14 x Aug 18 12:17:14 x Aug 18 12:17:14 x I ran a few tests, and it looks like fflush() isn't doing anything wrong, but rather the problem is that mawk buffers the INPUT in addition to the output. I did this: $ date; \ for i in `seq 5` ; do \ echo x; \ sleep 1; \ done | ~/projects/ltrace/ltrace -t -x '*@libc.so.6' -o /tmp/ltrace.out \ mawk '{ print $1; fflush() }' | ts; \ egrep 'flush' /tmp/ltrace.out | grep -v resumed Sat Oct 14 21:48:42 PDT 2017 Oct 14 21:48:47 x Oct 14 21:48:47 x Oct 14 21:48:47 x Oct 14 21:48:47 x Oct 14 21:48:47 x 21:48:47 fflush( <unfinished ...> 21:48:47 ffl...@libc.so.6( <unfinished ...> 21:48:47 fflush( <unfinished ...> 21:48:47 ffl...@libc.so.6( <unfinished ...> 21:48:47 fflush( <unfinished ...> 21:48:47 ffl...@libc.so.6( <unfinished ...> 21:48:47 fflush( <unfinished ...> 21:48:47 ffl...@libc.so.6( <unfinished ...> 21:48:47 fflush( <unfinished ...> 21:48:47 ffl...@libc.so.6( <unfinished ...> I.e. we started at 21:48:42, and I'd expect prints at :42, :43, :44, ... We don't get those. Ltrace DOES show the fflush() libc calls, but those all happen at the end, not as the data comes in. This is wrong, and implies that the problem is on the data READING end, not the WRITING end. And I can prove it even more conclusively: $ date; \ for i in `seq 5` ; do echo x ; sleep 1 ; done | mawk '{ system("date") }' Sat Oct 14 21:53:05 PDT 2017 Sat Oct 14 21:53:10 PDT 2017 Sat Oct 14 21:53:10 PDT 2017 Sat Oct 14 21:53:10 PDT 2017 Sat Oct 14 21:53:10 PDT 2017 Sat Oct 14 21:53:10 PDT 2017 I would expect mawk to read the data a line at a time and to invoke the 'date' command each time. Instead we get a bunch of dates at the end, meaning that mawk is buffering the input. Finally, looking at the syscalls I get this: $ for i in `seq 5` ; do echo x ; sleep 1 ; done | strace -f -e read,execve,clone -t mawk '{ system("date") }' 21:56:48 execve("/usr/bin/mawk", ["mawk", "{ system(\"date\") }"], [/* 40 vars */]) = 0 21:56:48 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200V\0\0\0\0\0\0"..., 832) = 832 21:56:48 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\4\2\0\0\0\0\0"..., 832) = 832 21:56:48 read(0, "x\n", 4096) = 2 21:56:48 read(0, "x\n", 4094) = 2 21:56:49 read(0, "x\n", 4092) = 2 21:56:50 read(0, "x\n", 4090) = 2 21:56:51 read(0, "x\n", 4088) = 2 21:56:52 read(0, "", 4086) = 0 21:56:53 clone(strace: Process 16988 attached child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5b16f029d0) = 16988 [pid 16988] 21:56:53 execve("/bin/sh", ["/bin/sh", "-c", "date"], [/* 40 vars */]) = 0 [pid 16988] 21:56:53 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\4\2\0\0\0\0\0"..., 832) = 832 [pid 16988] 21:56:53 clone(strace: Process 16989 attached child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f292bae69d0) = 16989 [pid 16989] 21:56:53 execve("/bin/date", ["date"], [/* 40 vars */]) = 0 .... I.e. mawk is reading the data when it comes in (the read() syscalls are not delayed), but it must be chunking the data in userspace in some way before doing any processing. I'm guessing this is a performance optimization. When using gawk this doesn't happen, and the only buffering to worry about is the output buffering, which can be disabled with fflush(). If this extra layer of buffering isn't a bug (and I'm assuming it's not), then there really isn't anything to do, but we REALLY should add a note in the manpage, maybe next to where fflush() is documented. Because this layer of buffering breaks the thing that users like me try to achive with fflush(). Thanks!