DM Smith wrote:
> Chris Little wrote:
>> MorphGNT and an updated Tisch, both from morphgnt.org are up in the beta
>> area.
>>
> Both of these modules use composed UTF-8 characters.
>
> In April 2005 we had a discussion on whether Greek should be composed or
> decomposed. I don't remember coming to a resolution. Are we going with
> composed?

I don't know. The source texts came pre-composed, and I thought about
whether I should normalize them differently, but decided to just stick
with the easiest path (the do-nothing path) to completion.

> To summarize, some frontends (including different browsers viewing the
> Bible Tool) handled composed better than decomposed. Others did the
> opposite. Font choice had significant impact on the results.
>
> It was noted that we could have filters for composition or decomposition
> to transform as the frontend needed.

Yeah, we already have NFC & NFKD filters. Maybe we should add NFD? In any
case, they require ICU.

> If we allow for modules to vary with regard to this, could/should we
> have an entry in the conf indicating the normalization? Perhaps with the
> values from NFC, NFD, NFKD, NFKC, FCD?

If we allow variation, yes. But I would suggest we just pick a
normalization (NFD or NFC) and stick with it for all modules.

> Should osis2mod do normalization to an agreed-upon normalization?

That wouldn't be a bad idea, but it would require ICU.

> How should a Greek (or any other accented text) be indexed with Lucene?
> Should we index various representations: Fully (de)composed,
> un-accented, transliterated?
>
> It seems that the frontend needs to know how the index is represented so
> that it can appropriately normalize user input.
>
> Right now Lucene indexes what it is handed and the user is responsible
> for matching that.

That I can't answer, but I would probably index whatever we standardize
on plus the unaccented version of the same.

--Chris

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
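
To make the composed/decomposed distinction and the "standard form plus
unaccented" indexing idea concrete, here is a rough sketch. It is only an
illustration: it uses Python's standard unicodedata module as a stand-in
for the ICU-based filters mentioned above, and the function names
(to_nfc, to_nfd, strip_accents, index_keys) are made up for this example
and are not part of SWORD, osis2mod, or Lucene.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the composed (NFC) form of text."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Return the decomposed (NFD) form of text."""
    return unicodedata.normalize("NFD", text)


def strip_accents(text: str) -> str:
    """Drop combining marks (accents, breathings) from text.

    Decompose first so every accent becomes a separate combining
    character, remove those characters, then recompose what is left.
    """
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)


def index_keys(text: str) -> tuple[str, str]:
    """Hypothetical helper: the two forms one might hand to an indexer.

    Assumes NFC is the agreed-upon normalization; swap in to_nfd if the
    project settles on NFD instead.
    """
    return to_nfc(text), strip_accents(text)


if __name__ == "__main__":
    composed = "\u03bb\u03cc\u03b3\u03bf\u03c2"          # logos, one code point for the accented omicron
    decomposed = "\u03bb\u03bf\u0301\u03b3\u03bf\u03c2"  # same word, omicron + combining acute
    print(len(composed), len(decomposed))
    print(to_nfd(composed) == decomposed)
    print(index_keys(decomposed))
```

Whatever form the index is built from, user input for a search would need
to go through the same function before matching, which is the point DM
raised about the frontend needing to know how the index is represented.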