On Fri, 8 Jul 2011 15:50:07 -0500, Joshua and Amy <josh.ruth...@gmail.com> wrote:
> So, I guess I was foolish to hope that Google has figured out how to return
> results that have non-identical but equivalent strings?
I'm sure Google has figured this out, and some programs do an automatic conversion to composed or decomposed form. But I wouldn't be surprised if some programmers' editors, for example, don't do that (for some purposes, such as search-and-replace, the difference might be important), and maybe some other programs don't either.

> I hope it's not too off-topic for this list, but can you point me to any
> good resources on normalization (is there a straightforward automation for
> someone who doesn't do scripting? am I supposed to use decomposed
> characters?)?

You can use either composed or decomposed characters for most purposes, although as I say some programs do an automatic (and possibly invisible) conversion. There's a general article on this issue here:
http://en.wikipedia.org/wiki/Unicode_equivalence

I know of library functions in Python that do the conversion; I'm sure they exist in Perl too. But I'm not aware of a general program (like iconv) that does it. (I think there's a hack with iconv that allows it to create decomposed forms, but that is not a bidirectional conversion.) Maybe someone else on this list knows of tools that do that. (What OS are you working on?)

   Mike Maxwell

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex
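
For illustration, here is a minimal sketch of the kind of Python library conversion mentioned above. It uses the standard library's unicodedata.normalize with the NFC (composed) and NFD (decomposed) forms. The helper names to_composed, to_decomposed, and equivalent are made up for this example, and the sample strings are just one accented letter; treat it as a sketch rather than a recommended workflow.

```python
import unicodedata

# Two ways to write "é": one precomposed code point, or a base letter
# followed by a combining accent. They look the same but compare unequal.
composed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" + COMBINING ACUTE ACCENT


def to_composed(text):
    """Return text in composed form (NFC). Hypothetical helper name."""
    return unicodedata.normalize("NFC", text)


def to_decomposed(text):
    """Return text in decomposed form (NFD). Hypothetical helper name."""
    return unicodedata.normalize("NFD", text)


def equivalent(a, b):
    """Compare two strings after bringing both to the same form (NFC)."""
    return to_composed(a) == to_composed(b)


if __name__ == "__main__":
    print(composed == decomposed)            # raw comparison of code points
    print(equivalent(composed, decomposed))  # comparison after normalizing
    print(len(to_composed(decomposed)), len(to_decomposed(composed)))
```

Unlike the iconv hack, this goes in both directions: the same function converts to either form depending on the form name passed in. A whole file could presumably be handled the same way by reading it as UTF-8 text, normalizing, and writing it back out.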