On Thu, Oct 16, 2008 at 05:40:51PM +0200, Georg Baum wrote:
> Andre Poenitz wrote:
>
> > Or slurp in the contents of the .tex file and try various encodings
> > until we find one that "does the trick", possibly after cutting it
> > into parts for which we know that the encoding stays constant.
>
> Yes. Note that this can become quite tricky, though: Some variable width
> encodings (shift-jis and big5 are examples supported by the CJK package)
> are not as nice as utf8, and do not guarantee that the second or third byte
> of a code point does not contain any byte from the ASCII range. That means
> that you cannot preparse the file (for cutting it into pieces) without
> knowing the encoding of the currently read piece.
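To make the quoted point concrete, here is a small Python sketch (illustration only, not LyX or tex2lyx code). The helper name `backslash_offsets` is hypothetical. It scans raw bytes for 0x5C, the TeX backslash, without knowing the encoding, which is what a naive preparser would have to do. In utf8 only the real backslash is found; in shift-jis and big5 the trail byte of some characters is also 0x5C.

```python
# Illustration only: look for the TeX backslash (0x5C) in undecoded bytes.
# backslash_offsets is a hypothetical helper, not part of any real parser.

def backslash_offsets(raw):
    """Return the byte offsets of every 0x5C byte in raw."""
    return [i for i, b in enumerate(raw) if b == 0x5C]

# The same tiny TeX fragment, "\section{X}", with X being one CJK character.
# The ASCII part is written as bytes; only the CJK character is encoded.
utf8 = b"\\section{" + "ソ".encode("utf-8") + b"}"
sjis = b"\\section{" + "ソ".encode("shift_jis") + b"}"   # ソ is 0x83 0x5C
big5 = b"\\section{" + "許".encode("big5") + b"}"        # 許 is 0xB3 0x5C

print(backslash_offsets(utf8))  # only the real backslash at offset 0
print(backslash_offsets(sjis))  # also hits the trail byte of ソ
print(backslash_offsets(big5))  # also hits the trail byte of 許
```

The extra hit in the last two cases is a false positive: it is the second byte of a two-byte character, not a control sequence. A scanner can only tell the difference if it already knows which encoding the current piece is in.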

I guess we can come up with a mechanism that puts a file's contents in a
docstring. If that needs a pretty specialized parser, well, so be it.

It's a neatly isolated problem to solve, and once it is solved, we should be
mostly free of further encoding issues on the input side...

Andre'
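
The "try various encodings until one does the trick" idea from the quoted text could be sketched roughly as follows, in Python for brevity. This is a minimal sketch under stated assumptions: the function name `decode_with_first_match` and the candidate list are hypothetical, and the input is assumed to be a chunk whose encoding is already known to stay constant.

```python
# Minimal sketch: strict trial decoding of one chunk of constant encoding.
# decode_with_first_match and the candidate order are assumptions made
# for illustration, not an existing API.

def decode_with_first_match(raw, candidates):
    """Return (encoding_name, text) for the first candidate that decodes raw."""
    for name in candidates:
        try:
            # bytes.decode is strict by default and raises on invalid input.
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding decodes this chunk")

name, text = decode_with_first_match(b"\x83\x5c", ["utf-8", "shift_jis"])
print(name, text)
```

Two caveats apply. The order of the candidates matters, and a successful decode does not prove the guess is right: an encoding such as latin-1 accepts every byte sequence, so it is only useful as the last entry in the list.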