> > perhaps you mean the subset of unicode corresponding to the codepoints
> > encoded by latin1 encoded in utf-8.  the system character set is utf-8,
> > and latin1 is not a compatible encoding.  utf-8 is assumed everywhere except
> > when the data is inbound, and explicitly tagged as having a different
> > character set.  programs like upas/fs and webfs do the conversion at the
> > border.
> > 
> > there's really no reason for latin1 in 2011.
> 
> There is a reason here: for now, TeX is 8 bits and that's all. So, if
> allowing the use of, at least, all 8 bits means something, it shall
> be latin1. This does not prevent anybody from using whatever character
> set they want; but as a default, and _for now_, it's better than nothing;
> and significantly better than some random character set that no tcs(1)
> will know how to deal with.
> 
> To accept directly utf-8 as input will not be addressed for the 1.0
> release of kerTeX.

i think you've missed my point.  latin1 is an encoding,
utf-8 is an encoding.  if tex is so backwards that it can't
accept a character wider than 8 bits, then it would be reasonable,
and consistent with the rest of the plan 9 system, to
read utf-8 runes (i.e. not latin1) in and then reject runes
with a codepoint above 255.
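
a rough sketch of that border check, in portable c rather than
plan 9 c.  the name utf8to8bit and the error codes are made up for
illustration; nothing here is taken from tex or kerTeX.  it decodes
utf-8, hands back one byte per rune, and refuses any rune above
U+00FF instead of guessing at it.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical error codes, for illustration only */
enum { Eok = 0, Ebad = -1, Ewide = -2 };

/*
 * decode utf-8 from in[0..n) into one byte per rune in out.
 * out must have room for n bytes.  on success the rune count
 * is stored in *nout.  returns Ewide when a lead byte announces
 * a rune above U+00FF, Ebad on malformed input.
 */
int
utf8to8bit(const unsigned char *in, size_t n, unsigned char *out, size_t *nout)
{
	size_t i, o;
	unsigned char c;

	assert(in != NULL && out != NULL && nout != NULL);
	o = 0;
	for(i = 0; i < n; i++){
		c = in[i];
		if(c < 0x80){
			/* ascii passes through unchanged */
			out[o++] = c;
			continue;
		}
		if(c < 0xC2 || c > 0xF4)
			return Ebad;	/* stray continuation, overlong lead, or invalid byte */
		if(c > 0xC3)
			return Ewide;	/* lead byte of a rune above U+00FF */
		/* c is 0xC2 or 0xC3: a two-byte rune in U+0080..U+00FF */
		if(i+1 >= n || (in[i+1] & 0xC0) != 0x80)
			return Ebad;	/* truncated or bad continuation byte */
		out[o++] = (unsigned char)(((c & 0x1F) << 6) | (in[i+1] & 0x3F));
		i++;
	}
	*nout = o;
	return Eok;
}
```

if the 8-bit limit is ever lifted, only the Ewide branch has to
change; input files that were valid before stay valid.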

then, if tex is fixed to accept larger codepoints, one can
remove this limit.  if latin1 is used, then it can not be retrofitted
in a way that is compatible with older tex input.

nobody cares what font encoding tex uses internally.  the
real issue is the input to tex.  i sure would be very reluctant
to load anything on my system that will mangle utf-8, especially
for codepoints <256.  that's the path to wchar_t.

- erik
