On Fri, Jun 24, 2011 at 11:05:23PM +0000, Mauricio CA wrote:
> 
> I found this text in TeX by Topic[1] that seems to support Quanstrom's
> idea. It describes how TeX reads input, and says it's done one line at
> a time (where it follows what the system defines as lines) and then for
> each line it first removes trailing spaces; then (possibly) adds a return
> to the end of the line; and then, since "computers may also differ in
> the character encoding (the most common schemes are ASCII and EBCDIC),
> so TeX converts the characters that are read from the file to its own
> character codes. These codes are then used exclusively [...]"
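
The per-line processing described in the quote can be sketched roughly as follows. This is a minimal Python illustration, assuming a Unix host whose bytes map to themselves. The names `XORD`, `END_LINE_CHAR` and `tex_input_line` are hypothetical stand-ins for TeX's own xord[] array and \endlinechar parameter; this is not code taken from TeX.

```python
# Hypothetical external-to-internal table. TeX builds its xord[] array at
# compile time; here we assume an ASCII host, so the mapping is the identity.
XORD = list(range(256))

# IniTeX sets \endlinechar to carriage return (^^M, code 13).
END_LINE_CHAR = 13


def tex_input_line(raw: bytes, xord=XORD, end_line_char=END_LINE_CHAR):
    """Return the internal character codes TeX would see for one input line."""
    line = raw.rstrip(b"\n")         # drop the system's (Unix) line terminator
    line = line.rstrip(b" ")         # TeX removes trailing spaces
    codes = [xord[b] for b in line]  # map external bytes to internal codes
    if 0 <= end_line_char <= 255:    # an out-of-range \endlinechar adds nothing
        codes.append(end_line_char)
    return codes
```

So a line "abc" followed by three spaces comes out as the codes of a, b, c plus 13, and setting the end-of-line character out of range suppresses the appended return.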

This is simply an extract of what is explained partly in The TeXbook
and partly in TeX: The Program, two of the five volumes of D.E. Knuth's
Computers & Typesetting series.

The initial character mapping happens, shall we say, at the "system"
level. But in the code it is limited to the ASCII (7-bit) range (and
even though virtex(1) is almost bare metal, it can only be bootstrapped
with ASCII macro commands). Furthermore, TeX is "8-bit clean", that is,
for "text" it uses only 8 bits, both for input and as CIDs in fonts.

This mapping is defined at compilation time, but can also be remapped
via macro commands.

So casting UTF-8 into 8 bits:
        - is useless for ASCII (by definition);
        - will work only for Latin-1 input.
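
To make the two cases concrete, here is a minimal Python sketch of the kind of recoding one would otherwise do with tcs(1); the function name `utf8_to_latin1` is hypothetical. ASCII passes through byte for byte, characters up to U+00FF get a single Latin-1 byte, and anything beyond has no 8-bit Latin-1 representation at all.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Recode UTF-8 text to ISO 8859-1 (Latin-1) for an 8-bit TeX.

    Raises ValueError listing every code point above U+00FF, since
    Latin-1 has no byte for those characters.
    """
    text = data.decode("utf-8")
    unmappable = sorted({ch for ch in text if ord(ch) > 0xFF})
    if unmappable:
        names = " ".join(f"U+{ord(ch):04X}" for ch in unmappable)
        raise ValueError("no Latin-1 byte for: " + names)
    return text.encode("latin-1")
```

For example, "é" is two bytes in UTF-8 (0xC3 0xA9) but one byte (0xE9) in Latin-1; fed raw to an 8-bit TeX, the UTF-8 form would be read as two separate characters.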

Extending TeX to wydes (runes) would be relatively easy, superficially,
for input and output (because D.E.K. organized the code so that these
parts can easily be changed), but it will not work with TeX fonts: all
the font machinery would have to be changed.

Furthermore, this will not work as is for the whole Unicode range,
since TeX is "left-to-right" (though what is fundamental is that, all
in all, with the possible exception of Frege's ideography, all
languages seem to be linear; so a switch in TeX for the width and
height of the computed boxes, plus hints for DVI drivers to flip/mirror,
could achieve the task). So this too has to be adapted (hence the
suggestion of XeTeX).
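
The "flip/mirror" idea can be illustrated with a small sketch, assuming a simplified model where each box in a line has only a width. Boxes are laid out left to right as TeX does, then a hypothetical driver-side step reflects each box within the line width W, giving x' = W - x - w. The function names are made up for illustration; this is not how TeX--XeT, XeTeX, or any DVI driver actually implements bidirectional text.

```python
def layout_ltr(widths):
    """Left edge of each box when boxes are set left to right from 0."""
    xs, x = [], 0
    for w in widths:
        xs.append(x)
        x += w
    return xs


def mirror(xs, widths, line_width):
    """Reflect each box within the line: new left edge = W - x - w."""
    return [line_width - x - w for x, w in zip(xs, widths)]
```

With widths 3, 5 and 2 in a line of width 10, the left edges 0, 3 and 8 become 7, 2 and 0: the first box ends up rightmost, while the boxes themselves (and the TeX computation that produced them) are unchanged.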

So for now, TeX is kept 8-bit. I make no assumption about the encoding:
the user has to feed an 8-bit encoding to TeX. ASCII users have nothing
to change; others who want to use another 8-bit encoding directly
(e.g., accented letters as Latin-1 codes) have to tcs(1) the file
first.

The only thing I will change is the set of fonts available.

For historical reasons, the fonts derived from the standard PostScript
ones were in the "EC" encoding, aka Cork, which places mainly Latin-1
characters in the 128-255 range, but not following the Latin-1 layout
(the Cork encoding was defined in 1990).

A macro set should install the fonts it expects.

kerTeX shall be usable to its full extent (relative to its present
state) with the data kerTeX provides, here fonts. And to avoid shipping
non-D.E.K. fonts under the same (cryptic) names as those commonly found
in other TeX distributions, kerTeX will use a Unix feature, the
directory hierarchy, to express the dependencies: not an initial letter
for the font foundry, but a subdirectory: adobe/ etc.
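
The difference between the two naming schemes can be sketched as below. The supplier letters come from Karl Berry's fontname scheme used by other TeX distributions (e.g., "p" for Adobe, as in ptmr8t). The directory names and the `font_path` helper are hypothetical and do not describe kerTeX's actual layout; that is what the distribution's own directory tree and conf/KERTEX.post-install show.

```python
from pathlib import PurePosixPath

# Subset of Berry's supplier letters. Mapping them to directory names is a
# hypothetical illustration, not kerTeX's real file layout.
SUPPLIERS = {"p": "adobe", "b": "bitstream", "m": "monotype", "u": "urw"}


def font_path(berry_name: str) -> PurePosixPath:
    """Turn a supplier-prefixed font name into supplier-dir/rest-of-name."""
    supplier = SUPPLIERS.get(berry_name[:1])
    if supplier is None:
        raise KeyError(f"unknown supplier letter in {berry_name!r}")
    return PurePosixPath(supplier) / berry_name[1:]
```

The information carried by the cryptic first letter thus moves into the path, where ls(1) makes it visible.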

This does not prevent anyone from generating other flavours, especially
since everything is shown and explained by the directory layout and the
conf/KERTEX.post-install Bourne shell script.
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
