On 03/21/2013 04:48 PM, Richard Wordingham wrote:
> For linguistic analysis, you need the normalisation appropriate to the
> task. This is a case where Unicode normalisation generally throws away
> information (namely, how the author views the characters), whereas in
> analysing Burmese you may want to ignore the order of non-interacting
> medial signs even though they have canonical combining class 0. I have
> found it useful to use a fake UnicodeData.txt to perform a non-Unicode
> normalisation using what were intended to be routines for performing
> Unicode normalisation. Fake decompositions are routinely added to the
> standard ones when generating the default collation weights for the
> Unicode Collation Algorithm - but there the results still comply with
> the principle of canonical equivalence.
>
> However, distinguishing U+00B7 and U+0387 would fail spectacularly
> if the text had been converted to form NFC before you received it.
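That failure mode is easy to confirm with Python's standard unicodedata module (any conformant normalizer behaves the same way):

```python
import unicodedata

# U+0387 GREEK ANO TELEIA has a singleton canonical decomposition to
# U+00B7 MIDDLE DOT, and singleton decompositions are excluded from
# recomposition, so every normalization form erases the distinction.
assert unicodedata.normalize("NFC", "\u0387") == "\u00b7"
assert unicodedata.normalize("NFD", "\u0387") == "\u00b7"
```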
This is the first time I've heard someone suggest that one can "tailor"
normalizations. Handling Greek shouldn't require faking
UnicodeData.txt. And writing normalization code is complex and tricky,
so people use pre-written code libraries to do it. What you're
suggesting implies that one can't use such a library as-is, but would
instead have to write one's own. I suppose another option is to translate
all the characters you care about into noncharacters before calling the
normalization library, and then translate back afterwards, and hope that
the library doesn't use the same noncharacter(s) internally.
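A minimal sketch of that round-trip workaround in Python, assuming the noncharacter U+FDEF does not occur in the input and is not used internally by the library (the sentinel choice and the function name are mine, just for illustration):

```python
import unicodedata

# Assumption: this noncharacter never appears in the input text and is
# not used internally by the normalization library.
SENTINEL = "\uFDEF"

def nfc_keeping_ano_teleia(text: str) -> str:
    """NFC-normalize while preserving U+0387 GREEK ANO TELEIA,
    which plain NFC would canonically map to U+00B7 MIDDLE DOT."""
    shielded = text.replace("\u0387", SENTINEL)    # hide the character we care about
    normalized = unicodedata.normalize("NFC", shielded)
    return normalized.replace(SENTINEL, "\u0387")  # translate back afterwards

greek = "\u03bf\u0387"  # GREEK SMALL LETTER OMICRON + ANO TELEIA
assert unicodedata.normalize("NFC", greek) == "\u03bf\u00b7"  # distinction lost
assert nfc_keeping_ano_teleia(greek) == greek                 # distinction kept
```

Of course this only dodges the one mapping; anything else NFC does still happens, and the hope that no library touches U+FDEF remains exactly that, a hope.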
And the question I have is: under what circumstances would better results
be obtained by performing this normalization? I suspect that the answer is
only for backward compatibility with code written before Unicode came
into existence. If I'm right, then it would be better for most
normalization routines to ignore (that is, violate) the Standard and not
perform this normalization.