>> i'm not sure what the hard part is.  just front the normal input function
>> with one that calls chartorune and rejects anything above codepoint 255.
>> that can't be more than 10 lines of code. [...]

> Yes, "casting" to byte can do it, and this is almost trivial since the input
> is buffered and handled via libweb (in kerTeX). But this will disallow
> use of TeX for non ASCII, non latin1... It seems to me better to document,
> and let user convert his files via tcs(1) to feed TeX. [...]

I found this text in TeX by Topic[1] that seems to support Quanstrom's
idea. It describes how TeX reads input: one line at a time (following
what the system defines as lines). For each line it first removes
trailing spaces, then (possibly) adds a return at the end of the line,
and then converts the characters to its own codes, because "computers
may also differ in the character encoding (the most common schemes are
ASCII and EBCDIC), so TeX converts the characters that are read from
the file to its own character codes. These codes are then used
exclusively [...]"

So it seems an encoding-specific transformation is expected to be
applied to TeX input. Removing trailing spaces, at least, can't be done
without understanding utf-8.

(I warn, though, that I have no expertise in this subject.)

Best, Maurício

[1] http://eijkhout.net/texbytopic/texbytopic.html. I got a ready-to-use
PDF at http://tex.loria.fr/general/texbytopic.pdf. What I describe is
in section 2.2.
