diacritics (combining characters) are a real mess in Unicode. with so much space in the format why did they have to go this route, i wonder?
erik mentioned cyrillic. i did have an old church slavonic bible text i was attempting to display correctly on Plan 9 sometime in 2003-4. top is X11 with correctly (i presume) combined characters, below is the Plan 9 rendering: http://mirtchovski.com/screenshots/x-p9-diacritics.jpg

there's a pattern there, as you can see: the combining char always follows the char it's combined with, so as a first draft of implementing char combinations in Plan 9 you can try simply not advancing the pen position when you draw one. the combining characters don't all sit in one contiguous range, but UnicodeData.txt marks them with general category Mn, Mc or Me, so you'll have to pull those out into a table and check against it when you input. fun and slow :)

the real problem isn't in viewing them however, but comes when you start searching for them: it's easy to search for ë (e-umlaut) as a single code point, for example, but what if it's stored as e + "U+0308 COMBINING DIAERESIS"? the answer is the UTS#18 Unicode Regular Expressions technical standard, which probably contributes at least half of the slowness of gnu grep discussed in another thread. http://www.unicode.org/reports/tr18/
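
here's a rough sketch of both ideas in python, leaning on the standard unicodedata module instead of a hand-built table. the function names (advances, advance_count, contains_normalized) are made up for illustration, and this is nothing like what a real Plan 9 draw or grep implementation would do: it only shows "don't advance on a combining mark" and "normalize both sides before comparing". it assumes that treating categories Mn and Me as zero-width is good enough, and that NFC normalization is an acceptable stand-in for the fuller canonical-equivalence matching that UTS#18 describes.

```python
import unicodedata

def advances(ch):
    # assumption: nonspacing (Mn) and enclosing (Me) marks are drawn on top
    # of the previous character, so the pen position does not move.
    return unicodedata.category(ch) not in ("Mn", "Me")

def advance_count(text):
    # number of character cells the text would occupy if combining marks
    # are overlaid on the preceding base character.
    return sum(1 for ch in text if advances(ch))

def contains_normalized(haystack, needle):
    # compare after normalizing both sides to NFC, so a precomposed
    # character and its base+combining spelling look the same.
    nfc = lambda s: unicodedata.normalize("NFC", s)
    return nfc(needle) in nfc(haystack)

precomposed = "\u00eb"   # ë as one code point
decomposed = "e\u0308"   # e followed by U+0308 COMBINING DIAERESIS

print(advance_count(precomposed), advance_count(decomposed))
print(precomposed in "no" + decomposed + "l")
print(contains_normalized("no" + decomposed + "l", precomposed))
```

the plain substring test misses the decomposed spelling while the normalized one finds it. normalizing every line before matching is extra work per line, which is the kind of cost the search problem drags in.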