On 10/19/2010 1:54 PM, Matthew Talbert wrote:
On Tue, Oct 19, 2010 at 4:19 AM, David Haslam <d.has...@ukonline.co.uk> wrote:

Something to ponder for the future then, maybe?

See http://crosswire.org/wiki/Talk:Transliteration

Thanks, Chris, for useful comments there.

As Chris says there, it would require indexing both versions of the
module, something I don't believe is currently possible. What would be
cool (imo) is to have the transliterated text available in a different
field, much as lemma is done now. Then a search for trans:something
would access the transliterated data. Of course, it would be nice to
provide this transparently to the end user.
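
For concreteness only, the separate-field idea might look roughly like this in Java Lucene terms (the engine actually uses CLucene, whose API is a close mirror); the field names, the stored key field, and the pre-computed transliteration passed in are illustrative assumptions, not how our indexer is written today:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

class TransFieldSketch {
    // Index one record (verse or dictionary entry) with the original text
    // in the usual content field and the transliterated form in its own
    // "trans" field, so a query such as trans:dabar reaches it directly.
    static void addRecord(IndexWriter writer, String key,
                          String text, String translit) throws java.io.IOException {
        Document doc = new Document();
        doc.add(new Field("key", key, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("content", text, Field.Store.NO, Field.Index.ANALYZED));
        doc.add(new Field("trans", translit, Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
    }
}

A query of the form trans:something would then be resolved against the "trans" field, while ordinary searches keep hitting "content".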

I'm really about as ignorant of (C)Lucene as a person can be, so someone please correct me if I'm wrong. I believe our indexing works at the record level (verses or dictionary entries). So, when the index is created, you could simply concatenate the original text with the transliterated text and index the result. Unless you need to support exact string matches across record boundaries, the concatenation shouldn't affect results.
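
Again only a sketch, under the same Java Lucene assumptions: the concatenation option would build a single per-record content field out of the original text plus its transliteration, so one index serves both kinds of query (at the cost of not being able to tell a transliterated hit from an original-script one):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

class ConcatSketch {
    // Build one per-record document whose content field holds the original
    // text followed by its transliteration, with a separator so the last
    // original token and the first transliterated token are not run together.
    static Document recordDoc(String text, String translit) {
        Document doc = new Document();
        doc.add(new Field("content", text + " " + translit,
                          Field.Store.NO, Field.Index.ANALYZED));
        return doc;
    }
}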

Something I mention on the wiki, and which I think you're also advocating, is transliterating the text word by word and placing the result in an xlit attribute on each <w> element (i.e. <w xlit="...">), all via a filter. That partly depends on the sourcetype being OSIS (though we could do it to plaintext too and change its sourcetype at runtime). We could certainly run such a filter prior to indexing, which would mean the transliterated text could be searched even if transliteration is turned off in the current view.
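
As a sketch of that word-by-word step, here is roughly what it could look like using ICU4J's Transliterator (by analogy with the ICU the engine already links against); the Any-Latin transform ID and the naive whitespace tokenization are assumptions for illustration, and a real filter would of course work on the parsed OSIS rather than on raw strings:

import com.ibm.icu.text.Transliterator;

class XlitSketch {
    // Any-Latin is one of ICU's built-in transform IDs; a real filter would
    // choose the transform based on the source script or module config.
    private static final Transliterator TO_LATIN =
            Transliterator.getInstance("Any-Latin");

    // Wrap each whitespace-separated token the way an OSIS filter might:
    // <w xlit="transliteration">original</w>.
    static String tagWords(String verseText) {
        StringBuilder out = new StringBuilder();
        for (String word : verseText.split("\\s+")) {
            if (word.isEmpty()) continue;
            if (out.length() > 0) out.append(' ');
            out.append("<w xlit=\"")
               .append(TO_LATIN.transliterate(word))
               .append("\">")
               .append(word)
               .append("</w>");
        }
        return out.toString();
    }
}

Running the verse text through something like tagWords before indexing would then expose the xlit values to the indexer regardless of the current display setting.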

--Chris

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
