-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[Adding the Austin Group]

According to Bruno Haible on 3/6/2008 3:46 PM:
| Do you know the wording that the newest POSIX has about this?

I think that Interp 002 is incomplete in the face of ungetc.

http://www.opengroup.org/austin/interps/uploads/40/6806/AI-002.txt

It looks like the intent of POSIX is to specify exactly what happens to the
underlying file description when a stream is flushed, particularly since
other processes can observe the results when the file description is
duplicated across process boundaries.

|
| The following test program, run on various platforms, gives inconclusive
| results.
|
| ========================== foo.c =========================
| #include <stdio.h>
| int
| main (int argc, char **argv)
| {
|   /* Check that fflush after a non-backup ungetc() call discards the ungetc
|      buffer.  */
|   int c;
|
|   c = fgetc (stdin);
|   printf ("c = '%c'\n", c);
|
|   c = fgetc (stdin);
|   printf ("c = '%c'\n", c);
|
|   c = ungetc ('@', stdin);
|   printf ("ungetc result = '%c'\n", c);
|
|   fflush (stdin);
|
|   c = fgetc (stdin);
|   printf ("c = '%c'\n", c);
|
|   c = fgetc (stdin);
|   printf ("c = '%c'\n", c);
|
|   return 0;
| }
| =============================================================
|
| $ gcc foo.c
| $ ./a.out < foo.c
| $ cat foo.c | ./a.out
|
| On glibc-2.3.6: Different results.
| When reading from the regular file:
| c = '#'
| c = 'i'
| ungetc result = '@'
| c = 'n'
| c = 'c'

Or in other words, fflush changed the stream offset from 1 to 2,
discarding the ungetc data.

| When reading from the pipe:
| c = '#'
| c = 'i'
| ungetc result = '@'
| c = '@'
| c = 'n'

Or in other words, fflush failed, leaving the ungetc data intact.

|
| On MacOS X: twice
| c = '#'
| c = 'i'
| ungetc result = '@'
| c = '@'
| c = 'n'

Or in other words, regardless of whether fflush succeeds, ungetc data was
left intact.

|
| On HP-UX 11:
| When reading from the regular file:
| c = '#'
| c = 'i'
| ungetc result = '@'
| c = 'i'
| c = 'n'

Or in other words, the stream position was left intact at 1, but the
ungetc data was lost.

| When reading from the pipe:
| c = '#'
| c = 'i'
| ungetc result = '@'
| c = <EOF>
| c = <EOF>

Bug.  C99 is quite clear that implementations shall provide at least one
byte of ungetc buffering for all streams, and that it cannot fail if there
was a prior fgetc.

Is the intent of Interp 002 to make this behavior portable?  Or are we
resigned to documenting that fflush after ungetc produces unspecified
results?

Next, consider this example:

$ echo 'm4exit-hello' > file
$ ( m4; cat ) < file

According to POSIX, m4 MUST leave the underlying file at the next
unprocessed byte, such that cat must pick up where m4 left off.  So I
claim this MUST print "-hello".

Now, when implementing m4, you MUST read the byte '-' from the file, to
decide whether the user is invoking "m4exit" or "m4exit(1)", for example.
But if m4 is implemented with streams, it is much simpler conceptually to
call ungetc('-', stdin) when it is determined that the input is not '(',
dispatch to the 'm4exit' handler, and then rely on the auto-fflush()
behavior of exit().  In this case, it then makes more sense for fflush()
to preserve the current stream offset, rather than the offset that was
present prior to the ungetc().  So for seekable input, this would argue
against glibc's implementation, and for either MacOS or HP-UX.

Without these semantics, m4 would have to resort to an explicit fseek to
the current stream position, since the auto-fflush of exit() would leave
the underlying file description at the wrong offset.  (At any rate, GNU m4
already has to do some explicit operations in an atexit hook on glibc
systems, where the exit() behavior violates POSIX because the fflush is
not automatic, but that is beside the point of this discussion.)

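To make the explicit-repositioning workaround concrete, here is a minimal
sketch for seekable input.  It is not taken from GNU m4; the names
sync_fd_to_stream and demo_handoff are hypothetical.  I am also assuming
ftello plus lseek on the descriptor rather than the fseek mentioned above,
because what fseek and fflush do to the descriptor offset is exactly the
part that differs between implementations.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from m4): make the descriptor offset match
   the stream's logical position, so that a later reader of the same
   open file description starts at the next unprocessed byte.
   Returns 0 on success, -1 if the stream is not seekable.  */
static int
sync_fd_to_stream (FILE *fp)
{
  off_t pos = ftello (fp);      /* logical position; ungetc moved it back */
  if (pos < 0)
    return -1;
  return lseek (fileno (fp), pos, SEEK_SET) < 0 ? -1 : 0;
}

/* Hypothetical demo: consume "m4exit", peek at the next byte, push it
   back, sync, then read the descriptor directly, the way a following
   process such as cat would.  Stores what that reader sees in OUT.
   Returns 0 on success, -1 on any failure.  */
static int
demo_handoff (char *out, size_t outsize)
{
  assert (outsize > 0);

  FILE *fp = tmpfile ();
  if (fp == NULL)
    return -1;
  fputs ("m4exit-hello\n", fp);
  rewind (fp);

  for (int i = 0; i < 6; i++)   /* read "m4exit" */
    fgetc (fp);
  int c = fgetc (fp);           /* peek: is an argument list next? */
  if (c != '(')
    ungetc (c, fp);             /* not ours; give it back */

  if (sync_fd_to_stream (fp) < 0)
    {
      fclose (fp);
      return -1;
    }
  ssize_t n = read (fileno (fp), out, outsize - 1);
  fclose (fp);
  if (n < 0)
    return -1;
  out[n] = '\0';
  return 0;
}
```

With a 32-byte buffer, demo_handoff should leave "-hello\n" in it, which is
the interprocess result argued for above.  On a pipe ftello fails, the
helper returns -1, and nothing can be repositioned.
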
The remaining question is what to do about the ungetc buffer from a
single-process standpoint.  The above example of interprocess behavior
pushed back what was read.  But if a different byte is pushed back, POSIX
is clear that the underlying file is not altered to contain that new byte.
Therefore, in interprocess communications, the second process would read
the original byte (as was done in HP-UX) rather than the ungetc buffer (as
was done in MacOS).

But for a single process, the wording for ungetc is clear that only a
successful file positioning function discards the ungetc buffer, and does
not list fflush as a file positioning function (and while fflush on a
write stream may change the position as data is flushed, nothing in
fflush, not even with Interp 002, mentions changing the stream offset for
read streams).  So here, it seems like MacOS (or glibc's pipe) behavior is
better.  Furthermore, if fflush leaves the ungetc buffer intact, you can
still do fseek(stdin,0,SEEK_CUR) to reread the real file contents rather
than the ungetc buffer.

My concern is that this statement, added by Interp 002, is ambiguous: "the
file offset of the underlying open file description shall be adjusted so
that the next operation on the open file description deals with the byte
after the last one read".  Does reading the ungetc buffer count as an
operation on the open file description, or is the next operation on the
open file description deferred until after the ungetc buffer is exhausted?
In other words, when calling fflush immediately after ungetc, is the
offset of the file description set to the current stream position (in the
example above, 1, as in MacOS) or to the stream position where the ungetc
buffer ends (2, as in glibc)?

- --
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH0Nl484KuGfSFAYARAoEpAKCrW8pm2LXtkuMope2lUe8aanxaSgCgofxC
iexFnfiVRlgIZ58EfBCez7w=
=7wN/
-----END PGP SIGNATURE-----