On 4/30/2010 1:56 AM, José Carlos Santos wrote:
> Since no sophisticated solution appeared (or occurred to me), I shall do that. But I think it is a flaw.

You don't need a sophisticated solution when the simple one is the only correct one. In the old days we had to worry about which character set we were using, because different sets gave you different characters at identical codepoints. In the old, dark days of not-a-lot-of-memory and "there are other languages?" land, resolving chr(128) could give you the euro symbol, or it might be the first byte of a multi-byte Japanese character, and - and this is the important part - the file you were reading wouldn't in any way tell you which interpretation you were supposed to use.
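To make that concrete, here's a quick Python sketch (not TeX, just to show the bytes; the values below are the standard cp1252 and Shift-JIS mappings):

    # The exact same two bytes, decoded under two different codepage assumptions.
    raw = bytes([0x82, 0xA0])

    print(raw.decode("cp1252"))     # a low quote mark plus a no-break space
    print(raw.decode("shift_jis"))  # a single hiragana character: 'あ'

    # And the chr(128) case mentioned above: under cp1252 that one byte is the euro sign.
    print(bytes([0x80]).decode("cp1252"))  # '€'

    # Both of the first two decodes succeed without any error; nothing in the
    # data itself says which interpretation was intended.

Both readings are perfectly valid as far as the bytes are concerned; only outside knowledge tells you which one the author meant.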

With more modern computing technology available, and a realisation that there are such things as other countries and writing systems, this became ridiculous, and we came up with a new way of doing things: Unicode, mirrored by the ISO/IEC 10646 standard. This convention specifies a huge character map with every distinct character in its own spot, and specifies various ways to efficiently encode the parts of that map you actually use, so that on average storing Unicode data takes up only a negligibly small amount more disk space than storing it using the crooked concept of codepages. More importantly, data stored in Unicode can TELL you that it's stored in Unicode. With that, the world finally realised how much better things are when the data actually indicates how to decode it, instead of having people go "okay, but... what the heck codepage is this text file actually in?".
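One concrete mechanism for that self-identification - just one of several, sketched here in Python, and the file name is only an example:

    import codecs

    # Write a file with a UTF-8 byte-order mark ("utf-8-sig" prepends the BOM).
    with open("note.txt", "w", encoding="utf-8-sig") as f:
        f.write("héllo")

    # The first three bytes now announce the encoding to any reader that looks.
    with open("note.txt", "rb") as f:
        data = f.read()
    print(data.startswith(codecs.BOM_UTF8))  # True
    print(codecs.BOM_UTF8)                   # b'\xef\xbb\xbf'

A codepage-encoded file has no equivalent of this; the best a reader can do is guess.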

The whole reason XeTeX exists is that there was no real Unicode awareness in the various flavours of TeX until Jonathan Kew started building one. The lack of an encoding declaration in XeTeX is not a flaw; it is a victory for people who actually want to write things properly. This is what we should have had in the first place, had there not been all those early technological restrictions. It finally lets us all write in every language imaginable without having to worry about whether the letter we want is in the codepage we're using and, if it isn't, how to construct what should be a perfectly simple character out of lots of TeX code combining letters and symbols in ways that only work for the one font we happen to be using. By going with Unicode, XeTeX made, and still makes, things intuitively easy. You write your text, XeTeX compiles what you wrote, and you are never bothered with figuring out whether the character you want is in the codepage you're using.

Of course, note that this is very different from needing to verify that the character you want is in the font you are using. Codepages tell you which characters even exist as far as the computer is concerned. Need a lambda symbol when you're writing something in cp1252? Tough, it doesn't exist. Not just "in the font you are using": it simply doesn't exist until you change the codepage for your entire data context to something else.
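The difference shows up immediately if you try it (again a small Python sketch, purely illustrative):

    # A lambda simply has no slot in cp1252, so encoding to it fails outright...
    try:
        "λ".encode("cp1252")
    except UnicodeEncodeError as e:
        print("no lambda in cp1252:", e)

    # ...while any Unicode encoding handles it without a second thought.
    print("λ".encode("utf-8"))  # b'\xce\xbb'

The character isn't "missing from the font"; as far as a cp1252 data context is concerned, it cannot be represented at all.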

Codepages are a thing from a dark past, when typesetting was severely impaired by fonts simply not being big enough to contain all the letters people might need, and by there not being a well-defined codepoint mapping between glyphs (what you see with your eyes) and characters (what the thing you're seeing actually represents).

At this point in time (finally, one might add) only old operating systems still really care about codepages - the rest have moved on to embrace a world where it doesn't matter what language you write in, because letters from one language are no longer mutually exclusive with letters from another. In the TeX world, too, great efforts are being made to ditch the antique concept of codepages, with XeTeX and LuaTeX constantly improving.

If you want to typeset things nicely, and you actually care about the language you're using - you're using French, so you really should care - don't use cp1252; ANSI is the *AMERICAN* standard for an 8-bit character set. Codepages were invented to overcome the problem of only having 256 spots for letters. Unicode solved that problem. Why make XeTeX use a solution for a problem that doesn't exist anymore?

- Mike


