Re: [sword-devel] Does the CLucene indexing work for non-English texts?

2018-11-01 Thread Nic Carter
PocketSword uses the standard SWORD library search implementation, using CLucene. Last I looked, the C version is a _long_ way behind the Java version (Lucene). The C version seemed to stop being developed after it worked well enough for English text and didn’t seem to get any love for other lan

Re: [sword-devel] Search error in PocketSword for ancient Greek text

2018-11-01 Thread DM Smith
The SimpleAnalyzer does not “fold” these to the same value. If the text has a mix of upper- and lowercase Greek, it uses a Latin to_upper to convert, which won’t fold the Greek letters. For JSword, we pick an analyzer based upon the text’s script/language. In Him, DM
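To make the folding point concrete, here is a minimal sketch in Python. It is not SWORD, CLucene, or JSword code: `ascii_upper` is a hypothetical stand-in for a Latin-only to_upper, and `unicode_fold` uses Python's built-in `str.casefold` as one possible Unicode-aware alternative.

```python
import unicodedata


def ascii_upper(text: str) -> str:
    """Uppercase only a-z, as a Latin-only to_upper would (assumed behaviour)."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


def unicode_fold(text: str) -> str:
    """Normalize to NFC, then apply Unicode case folding."""
    return unicodedata.normalize("NFC", text).casefold()


lower = "λόγος"
title = "Λόγος"

# The Latin-only conversion leaves Greek untouched, so the two forms stay different.
print(ascii_upper(lower) == ascii_upper(title))    # False
# Unicode case folding maps both forms to the same value.
print(unicode_fold(lower) == unicode_fold(title))  # True
# English text is fine either way.
print(ascii_upper("word") == ascii_upper("Word"))  # True
```

Choosing the folding per script or language, as described for JSword, would replace the single `unicode_fold` here with a lookup keyed on the text's language.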

Re: [sword-devel] Does the CLucene indexing work for non-English texts?

2018-11-01 Thread DM Smith
From memory, SWORD uses SimpleAnalyzer. This analyzer works well for Western European languages. It won’t for non-latinate texts, though it may work in part. The basic rule of thumb is that the index has to be created with an analyzer and the search request has to be analyzed the same way. PocketSwor
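The rule of thumb (analyze the query the same way the index was built) can be sketched with a toy inverted index in Python. Everything here is hypothetical illustration: `ascii_analyzer`, `unicode_analyzer`, `build_index`, and `search` are made-up names, not the CLucene or SWORD API.

```python
from collections import defaultdict


def ascii_analyzer(text):
    """Split on whitespace and lowercase only A-Z (stand-in for a Latin-only analyzer)."""
    def lower_ascii(token):
        return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in token)
    return [lower_ascii(t) for t in text.split()]


def unicode_analyzer(text):
    """Split on whitespace and apply Unicode case folding."""
    return [t.casefold() for t in text.split()]


def build_index(docs, analyzer):
    """Map each analyzed token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyzer(text):
            index[token].add(doc_id)
    return index


def search(index, query, analyzer):
    """Return ids of documents containing every analyzed query token."""
    tokens = analyzer(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


docs = {"a": "Λόγος καὶ Θεός", "b": "the Word"}
index = build_index(docs, unicode_analyzer)

print(search(index, "ΛΌΓΟΣ", unicode_analyzer))  # {'a'}: same analyzer on both sides
print(search(index, "ΛΌΓΟΣ", ascii_analyzer))    # set(): query analyzed differently
print(search(index, "WORD", ascii_analyzer))     # {'b'}: the mismatch is invisible for English
```

The last line shows why a mismatch can go unnoticed: for plain English the two analyzers happen to agree, so only non-Latin queries fail.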

Re: [sword-devel] Search error in PocketSword for ancient Greek text

2018-11-01 Thread Tom Sullivan
On my keyboard, the two sigmas are two different keys. They also have different Unicode code points, and so different UTF-8 encodings. It seems that this could be a host program problem to fix.
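A short Python check of the two sigmas, as a sketch only. The helper name `same_when_folded` is hypothetical, and using `str.casefold` is one way a host program might treat the two letters as equivalent, not something the thread prescribes.

```python
medial = "\u03c3"  # σ GREEK SMALL LETTER SIGMA
final = "\u03c2"   # ς GREEK SMALL LETTER FINAL SIGMA

# Print each code point and its UTF-8 byte sequence in hex.
for ch in (medial, final):
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex())
# U+03C3 cf83
# U+03C2 cf82


def same_when_folded(a: str, b: str) -> bool:
    """Compare two strings after Unicode case folding."""
    return a.casefold() == b.casefold()


print(medial == final)                  # False: distinct characters
print(same_when_folded(medial, final))  # True: folding maps final sigma to sigma
```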

[sword-devel] Search error in PocketSword for ancient Greek text

2018-11-01 Thread TS
Hi, In the past, when I have tried searching Greek text in PocketSword, it has caused an error. I seem to recall that it went something like this: if I put the full Greek word in, it won't find the word, but if I enter only some of the Greek letters of a word, then it will find the word. It seem

[sword-devel] Does the CLucene indexing work for non-English texts?

2018-11-01 Thread TS
Does the CLucene indexing work for non-English texts? David's recent question about languages without spaces made me a bit curious about this matter. Briefly looking at the current Apache Lucene code, there appears to be extra code for non-English text. However, this is in comparison to