Leandro Guimarães Faria Corcete DUTRA wrote:
> Chris Little <[EMAIL PROTECTED]> writes:
>
>> We could change oe to oe-ligature where appropriate in Louis Segond.
>> That would be simple enough since editions exist online that use
>> oe-ligature correctly.
>
> Also, it is not that many words using that… cœur, sœur, mœur…
>
> Is there anyone to do it already, or should I do it?

WikiSource already has a copy with oe-ligature that we could use. No need to repeat the work.

>> However, since we won't be doing language-specific search tweaks
>
> That is not what I meant; I mean a general fix, where ligatures at
> the search box would find expanded characters, and vice versa. Just like
> Google does it, with all kind of European ligatures.

There's a simplistic solution for the kind of searching you suggest: decompose ligatures into their components as part of the strip filter process. That will work fine for French, I suppose, and Latin, but it would return incorrect results in other languages. In Norwegian, ae-ligature is a letter in its own right, not related to a or e. In Swedish the same letter is written as a-umlaut. In Icelandic, oe-ligature shouldn't be decomposed to oe either.

Should umlauted letters be decomposed also? Then a-umlaut becomes ae, o-umlaut becomes oe, and u-umlaut becomes ue, which works fine for German, but I doubt it does for many other languages. And what about i-umlaut and e-umlaut? And what about letters with accents? Some languages would simply drop the accent, others would double the letter, and there may be other behaviors I don't know about.

The only ligatures that we could safely decompose without reference to language are typographic ligatures, and we would never encode those as ligatures in the first place.

I don't know how Google does what they do. They may do language identification and language-specific processing of documents. But they have a lot more data and horsepower at their disposal than we do.

> In the end it is an Unicode question, I guess?

It's not a Unicode question because Unicode doesn't deal with this issue. The decomposition of oe-ligature to oe would be a language-specific detail and is not encoded in any of Unicode's data sets.
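
To make the point above concrete, here is a minimal sketch in Python. It is not SWORD code and not how the strip filters are written. It assumes only the standard unicodedata module; the FOLDS table, the language codes and the function names are hypothetical, made up for illustration. It shows that a search fold has to be chosen per language, and that the character database behind unicodedata gives no decomposition mapping for oe-ligature or ae-ligature, while a typographic ligature such as fi does have one.

```python
import unicodedata

# Hypothetical per-language folding tables (illustration only, incomplete).
FOLDS = {
    "fr": {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"},
    "de": {"ä": "ae", "ö": "oe", "ü": "ue"},
    "nb": {},  # Norwegian: æ is a letter of its own, so nothing is folded
}

def fold_for_search(text, lang):
    """Replace characters using the table for lang; unknown languages are left untouched."""
    table = FOLDS.get(lang, {})
    return "".join(table.get(ch, ch) for ch in text)

def has_unicode_decomposition(ch):
    """True if the Unicode character database lists a decomposition mapping for ch."""
    return unicodedata.decomposition(ch) != ""

if __name__ == "__main__":
    print(fold_for_search("cœur", "fr"))       # coeur
    print(fold_for_search("Bær", "nb"))        # Bær
    print(has_unicode_decomposition("œ"))      # False
    print(has_unicode_decomposition("\ufb01")) # True (typographic fi ligature)
```

Applying the "fr" table to Norwegian text would turn æ into ae, which is exactly the incorrect result described above; the sketch avoids that only because the caller has to say which language the text is in.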

>> since oe-ligature basically can't be typed on French keyboards
>
> Yes, but regardless of keyboards us GNU/Linux users who love
> typography (admittedly a small subset) have it mapped and use it quite often.

I'm understandably more concerned with Windows users who would lose functionality.

--Chris

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page