On Thu, Oct 16, 2008 at 05:40:51PM +0200, Georg Baum wrote:
> Andre Poenitz wrote:
> 
> > Or slurp in the contents of the .tex file and try various encodings
> > until we find one that "does the trick", possibly after cutting it
> > into parts for which we know that the encoding stays constant.
> 
> Yes. Note that this can become quite tricky, though: Some variable width
> encodings (shift-jis and big5 are examples supported by the CJK package)
> are not as nice as utf8, and do not guarantee that the second or third byte
> of a code point does not contain any byte from the ASCII range. That means
> that you cannot preparse the file (for cutting it into pieces) without
> knowing the encoding of the currently read piece.
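
To make the point above concrete, here is a minimal Python sketch (not LyX code; the helper name is made up for illustration). It collects the ASCII-range bytes that show up as non-leading bytes of multi-byte characters. In Shift-JIS the katakana letter U+30BD encodes as 0x83 0x5C, so its second byte is a plain backslash, while UTF-8 never produces such bytes.

```python
def ascii_trail_bytes(text, encoding):
    """Collect ASCII-range bytes found after the first byte of multi-byte characters.

    Hypothetical helper, for illustration only.
    """
    hits = bytearray()
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:
            # Skip the lead byte; keep any trailing byte below 0x80.
            hits.extend(b for b in encoded[1:] if b < 0x80)
    return bytes(hits)


sample = "\u30bd"  # KATAKANA LETTER SO

# Shift-JIS: the trailing byte is 0x5C, i.e. a backslash.
print(ascii_trail_bytes(sample, "shift_jis"))

# UTF-8: continuation bytes are always >= 0x80, so nothing is collected.
print(ascii_trail_bytes(sample, "utf-8"))

# A naive byte-level scan for TeX's escape character would trip here.
print(b"\\" in sample.encode("shift_jis"))
```

So a byte-level search for "\" in a Shift-JIS file can match in the middle of a character, which is why cutting the file into pieces needs to know the encoding first.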

I guess we can come up with a mechanism that puts a file's contents in a
docstring. If that needs a pretty specialized parser, well, so be it.
It's a neatly isolated problem to solve, and if solved, we should be
mostly free of further encoding issues on the input side...
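
Just to illustrate the "try various encodings" idea, a rough sketch in Python rather than anything resembling actual LyX code. The function name and the candidate list are assumptions made up for this example, and a clean decode only shows that the bytes are valid in that encoding, not that the guess is right.

```python
def guess_encoding(data, candidates=("utf-8", "shift_jis", "big5", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes cleanly.

    Hypothetical helper: the candidate order is an assumption, and
    latin-1 accepts any byte sequence, so it only makes sense as a
    last resort at the end of the list.
    """
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    return None, None


enc, text = guess_encoding("Grüße".encode("utf-8"))
print(enc, text)
```

For a file that switches encodings, this would have to be applied per piece, with the cutting done by something that already knows the encoding of the piece it is reading.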

Andre'
