Consider using Luke to analyze the constructed Lucene index. See: 
https://code.google.com/archive/p/luke/ 
I think you’ll need a Luke release that matches Lucene 1.9.1. Maybe 1.4.x.
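
One pattern in the quoted list may be worth checking alongside Luke: the letters reported as broken (U+2C81 and up) all take three bytes in UTF-8, while the ones that display correctly (U+03E2 to U+03EF) take two. That is only an observation, not a confirmed cause. Below is a minimal Python sketch, with hypothetical helper names `utf8_len` and `count_replacements`, to verify the byte lengths and to count U+FFFD in any text copied out of the preview pane or read from the index:

```python
import unicodedata

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def utf8_len(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def count_replacements(text: str) -> int:
    """Count occurrences of U+FFFD in a piece of text."""
    return text.count(REPLACEMENT_CHAR)


def describe(ch: str) -> str:
    """Format one character as code point, name, and UTF-8 byte length."""
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X}  {name}  ({utf8_len(ch)} bytes in UTF-8)"


if __name__ == "__main__":
    # One letter from each group mentioned in the quoted message.
    for ch in ("\u03e3", "\u2c81"):
        print(describe(ch))

    # Hypothetical sample: paste text from the search preview pane here.
    sample = "\u03e3\ufffd\ufffd\ufffd"
    print("replacement characters:", count_replacements(sample))
```

If the text shown in the preview pane has many U+FFFD where the module has three-byte letters, that would at least narrow the problem down to how multi-byte sequences are handled somewhere between indexing and display.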

DM


> On Apr 26, 2017, at 3:48 PM, David Haslam <dfh...@googlemail.com> wrote:
> 
> If you examine the result preview pane in the Xiphos Advanced Search dialog,
> the problem becomes apparent.
> 
> Most Coptic Unicode characters are not displayed correctly.
> 
> 
> 
> The remainder seem to have been converted to U+FFFD REPLACEMENT CHARACTER.
> 
> i.e. All these Coptic letters are basically not handled aright by this part
> of the software:
> 
> U+2C81        ⲁ       COPTIC SMALL LETTER ALFA
> U+2C83        ⲃ       COPTIC SMALL LETTER VIDA
> U+2C85        ⲅ       COPTIC SMALL LETTER GAMMA
> U+2C87        ⲇ       COPTIC SMALL LETTER DALDA
> U+2C89        ⲉ       COPTIC SMALL LETTER EIE
> U+2C8B        ⲋ       COPTIC SMALL LETTER SOU
> U+2C8D        ⲍ       COPTIC SMALL LETTER ZATA
> U+2C8F        ⲏ       COPTIC SMALL LETTER HATE
> U+2C91        ⲑ       COPTIC SMALL LETTER THETHE
> U+2C93        ⲓ       COPTIC SMALL LETTER IAUDA
> U+2C95        ⲕ       COPTIC SMALL LETTER KAPA
> U+2C97        ⲗ       COPTIC SMALL LETTER LAULA
> U+2C99        ⲙ       COPTIC SMALL LETTER MI
> U+2C9B        ⲛ       COPTIC SMALL LETTER NI
> U+2C9D        ⲝ       COPTIC SMALL LETTER KSI
> U+2C9F        ⲟ       COPTIC SMALL LETTER O
> U+2CA1        ⲡ       COPTIC SMALL LETTER PI
> U+2CA3        ⲣ       COPTIC SMALL LETTER RO
> U+2CA5        ⲥ       COPTIC SMALL LETTER SIMA
> U+2CA7        ⲧ       COPTIC SMALL LETTER TAU
> U+2CA9        ⲩ       COPTIC SMALL LETTER UA
> U+2CAB        ⲫ       COPTIC SMALL LETTER FI
> U+2CAD        ⲭ       COPTIC SMALL LETTER KHI
> U+2CAF        ⲯ       COPTIC SMALL LETTER PSI
> U+2CB1        ⲱ       COPTIC SMALL LETTER OOU
> U+2CC1        ⳁ       COPTIC SMALL LETTER SAMPI
> U+2CE8        ⳨       COPTIC SYMBOL TAU RO
> 
> Only the few Coptic letters in the block U+03E2 to U+03EF are displayed
> aright.
> 
> It's no wonder that a search has so many spurious results if most of the
> search space has been squashed into Unicode replacement characters.
> 
> I'm a Windows user, as most of you know already.
> Does the same thing happen in Xiphos under Linux?
> 
> Is this an issue common to all SWORD based front-ends?
> The fact that we see similar results in PocketSword strongly suggests it is.
> 
> Best regards,
> 
> David
> 
> 
> 
> --
> View this message in context: 
> http://sword-dev.350566.n4.nabble.com/Lucene-search-index-and-Coptic-tp4657103p4657106.html
> Sent from the SWORD Dev mailing list archive at Nabble.com.
> 
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
