DM Smith wrote:
> Doesn't ICU have locale sensitive decomposition (or transliteration)?
> If it does then why can't we use the language of the module to set
> the locale then decompose. This is what we are planning to do for
> JSword (it has been on the todo list for years).

I don't see anything like this in ICU. I couldn't find anything in the API docs, and there's nothing in the locale files themselves.

I think our best option may be to tag words on a per-module basis with alternative forms and then index those forms as alternates with Lucene, as your last post suggested. For non-Lucene searches we can normalize the text and the search strings via the strip filters, as Troy suggests.

Someone else would have to provide the code side of things, but in terms of markup, I think we just want something along the lines of:

    <w xlit="basic:coeur">cœur</w>

The strip filter (for non-Lucene searches) would then just replace that with "coeur".

--Chris
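
For illustration only, here is a minimal sketch of how a strip filter might handle the markup proposed above. It is written in Python rather than against the SWORD filter API, and the function names, the regular expression, and the reading of the xlit attribute as "scheme:form" are all assumptions, not existing code.

```python
import re

# Hypothetical pattern for the proposed markup:
#   <w xlit="scheme:form">surface</w>
# The "scheme:form" split is an assumption based on the single example
# in the message (xlit="basic:coeur").
W_XLIT = re.compile(
    r'<w\s+xlit="(?P<scheme>[^":]+):(?P<form>[^"]*)">(?P<surface>.*?)</w>'
)


def strip_xlit(text, scheme="basic"):
    """Replace each tagged word with its alternate form for `scheme`.

    Words tagged with a different scheme fall back to their surface text.
    """
    def repl(match):
        if match.group("scheme") == scheme:
            return match.group("form")
        return match.group("surface")

    return W_XLIT.sub(repl, text)


def alternate_forms(text):
    """Collect (surface, alternate) pairs that an indexer could store
    as alternates for the same position."""
    return [(m.group("surface"), m.group("form")) for m in W_XLIT.finditer(text)]


if __name__ == "__main__":
    sample = 'mon <w xlit="basic:coeur">cœur</w> bat'
    print(strip_xlit(sample))
    print(alternate_forms(sample))
```

The same pairs returned by alternate_forms could in principle feed the Lucene side, with the stripped text used for non-Lucene searches, so both search paths would rely on one set of per-module tags.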