POSIX defines a text file as:

    3.397 Text File

    A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

Notice there's no mention of ASCII, so bytes 0x80 to 0xFF are valid. For sbase we want UTF-8 support. Should we assume/enforce only valid UTF-8? Doing so makes a lot of coding easier and less sucky, but means that some POSIX text files will not be sbase text files: any file in which bytes 0x80 to 0xFF do not form valid UTF-8 sequences. In that case, which is more important: strict POSIX compliance, or code that sucks less? -emg