Hi,

> Well, no. For example, Python source code is not typically concatenated,
> nor is source code in any other language.

We did it with C++ files in order to have only one compilation unit, to
reduce compilation time over the network. Also, every language with some
kind of "include" directive will have to take care of it. I guess a
Unicode-aware C preprocessor already does.

> As for the "super-cat": there is actually no problem with putting U+FFFE
> in the middle of some document - applications are supposed to filter it
> out. The precise processing instructions in the Unicode standard vary
> from Unicode version to Unicode version, but essentially, you are
> supposed to ignore the BOM if you see it.

OK. I'm reassured.

> A Unicode string is a sequence of integers. The numbers are typically
> represented as base-2, but the details depend on the C compiler.
> It is specifically *not* UTF-16, big or little endian (i.e. a single
> number is *not* a sequence of bytes). It may be UCS-2 or UCS-4,
> depending on a compile-time choice (which can be determined by looking
> at sys.maxunicode, which in turn can be either 65535 or 1114111).
>
> The programming interface to the individual characters is formed by
> the unichr and ord builtin functions, which expect and return integers
> between 0 and sys.maxunicode.

OK. I guess that Python can be configured, when Python itself is compiled,
to represent Unicode strings internally with a fixed 2 or 4 bytes per
character (UCS-2 or UCS-4).

Thank you

Francis Girard

--
http://mail.python.org/mailman/listinfo/python-list
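
P.S. For the archives, here is a minimal sketch of the points quoted above. It is only an illustration, not anything authoritative. It reports which internal width the running interpreter was built with, round-trips a code point through unichr and ord, and drops BOMs from concatenated text. The names to_char, build_width and strip_boms are made up for this example. I am assuming the BOM code point is U+FEFF (U+FFFE, as written in the quote, is its byte-swapped form), and that chr stands in for unichr on Python 3. As far as I know, Python 3.3 and later always report sys.maxunicode as 1114111, so the narrow case only shows up on older builds.

```python
import sys

# unichr only exists on Python 2; on Python 3, chr does the same job.
try:
    to_char = unichr
except NameError:
    to_char = chr

BOM = u"\ufeff"  # assumed BOM code point (ZERO WIDTH NO-BREAK SPACE)


def build_width():
    """Return 'UCS-2' for a narrow build, 'UCS-4' for a wide one."""
    return "UCS-2" if sys.maxunicode == 0xFFFF else "UCS-4"


def strip_boms(text):
    """Remove every BOM from text, e.g. after concatenating several files."""
    return text.replace(BOM, u"")


if __name__ == "__main__":
    print("sys.maxunicode = %d (%s)" % (sys.maxunicode, build_width()))
    euro = to_char(0x20AC)        # integer -> one-character string
    print(ord(euro) == 0x20AC)    # and back to the same integer
    joined = BOM + u"first file\n" + BOM + u"second file\n"
    print(repr(strip_boms(joined)))
```

Removing every U+FEFF is the bluntest reading of "ignore the BOM if you see it". Text that uses U+FEFF deliberately as a zero-width no-break space would lose it too, so a stricter filter may be wanted in that case.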