> > perhaps you mean the subset of unicode corresponding to the codepoints
> > encoded by latin1, encoded in utf-8. the system character set is utf-8,
> > and latin1 is not a compatible encoding. utf-8 is assumed everywhere except
> > when the data is inbound and explicitly tagged as having a different
> > character set. programs like upas/fs and webfs do the conversion at the
> > border.
> >
> > there's really no reason for latin1 in 2011.
>
> There is a reason here: for now, TeX is 8 bits and that's all. So, if
> using all 8 bits means anything, it has to be latin1. This does not
> prevent anybody from using whatever character set they want; but as a
> default, and _for now_, it's better than nothing, and significantly
> better than some random character set that no tcs(1) will know how to
> deal with.
>
> Accepting utf-8 directly as input will not be addressed for the 1.0
> release of kerTeX.
i think you've missed my point. latin1 is an encoding; utf-8 is an
encoding. if tex is so backwards that it can't accept a character wider
than 8 bits, then it would be reasonable to behave like the rest of the
plan 9 system: read utf-8 runes (i.e. not latin1) and then reject runes
with a codepoint above 255. then, if tex is fixed to accept larger
codepoints, one can simply remove this limit. if latin1 is used, it
cannot be retrofitted in a way that is compatible with older tex input,
since latin1 bytes above 0x7f don't mean the same thing in utf-8.

nobody cares what font encoding tex uses internally. the real issue is
the input to tex. i sure would be very reluctant to load anything on my
system that will mangle utf-8, especially for codepoints < 256. that's
the path to wchar_t.

- erik