Re: [sword-devel] Sword phrase search not returning all expected results

Tobias Klein Mon, 03 Mar 2025 10:53:28 -0800

Hi Troy,

Thank you so much!

I am using a GitHub "fork" of SWORD now (based oncrosswire-sword-mirror), in order to maintain a bugfix branch withoutall the latest changes.Seehttps://github.com/ezra-bible-app/crosswire-sword-mirror/tree/sword-1.9.0-bugfix.In this fork I have now cherry-picked your bugfix commit from trunk andcan confirm that it works!


This bugfix will be part of the next Ezra Bible App release.

Best regards,
Tobias

On 3/3/25 15:01, Troy A. Griffitts wrote:

Hi Tobias,

I am sorry I haven't had a change to look into this sooner.

Having a quick look, I didn't see the same issue you see:

```

[tgriffitts@fedora sword]$ cd examples/cmdline/
[tgriffitts@fedora cmdline]$ make
[tgriffitts@fedora cmdline]$ ./search KJV "generation to generation"
[0=================================50===============================100]
 ======================================================================
Exod 17:16; Isa 13:20; Isa 34:10; Isa 34:17; Isa 51:8; Jer 50:39; Lam5:19; Dan 4:3; Dan 4:34; Joel 3:20; Luke 1:50
Exod 17:16
Isa 13:20
Isa 34:10
Isa 34:17
Isa 51:8
Jer 50:39
Lam 5:19
Dan 4:3
Dan 4:34
Joel 3:20
Luke 1:50
[tgriffitts@fedora cmdline]$

```

Looking into things a bit more:

```

[tgriffitts@fedora sword]$ ./usrinst.sh

...

Configuration:

 Settings:
     LIBDIR:               /usr/lib64
     DEBUG:                yes
     PROFILE:              no
     BUILD TESTS:          yes
     BUILD EXAMPLES:       no
     BUILD UTILITIES:      yes
     STRIP LOG DEBUG:      no
     STRIP LOG INFO:       no

 Dependencies for standard use:
     REGEX:                yes
     ZLIB:                 yes
     LIBICU:               yes
     LIBCURL:              yes
     CLUCENE-CORE:         yes 2.x

 Optional / Experimental:
     LIBCURL SFTP:         yes
     BZIP2:                no
     XZ:                   no
     ICUSWORD:             no
     ICU-REGEX:            yes
     CXX11-REGEX:          no
     CXX11-TIME:           yes
     XAPIAN-CORE:          no
     GAPI:                 no

```

If I edit usrinst.sh and remove the comment on the line:
# --without-icu

Such that my ./usrinst.sh configuration report at the end:

```
     LIBICU:               no
```

Then:
```
[tgriffitts@fedora sword]$ make clean
[tgriffitts@fedora sword]$ make -j
[tgriffitts@fedora sword]$ cd examples/cmdline
[tgriffitts@fedora sword]$ make clean
[tgriffitts@fedora sword]$ make
[tgriffitts@fedora cmdline]$ ./search KJV "generation to generation"
[0=================================50===============================100]
 ======================================================================
Isa 13:20; Isa 34:10; Isa 34:17; Isa 51:8; Jer 50:39; Dan 4:3; Dan4:34; Joel 3:20; Luke 1:50
Isa 13:20
Isa 34:10
Isa 34:17
Isa 51:8
Jer 50:39
Dan 4:3
Dan 4:34
Joel 3:20
Luke 1:50
```
Bottom line, there was a bug in non-icu toupper when no maxlen waspassed. Instead of allowing the entire toupper length to be copied tothe buffer, it copied no characters from the uppercased string to thedestination and set the terminating null at the start of thedestination string.
The reason we do the toupper is because the OSISPlain filter doesdouble duty. It acts as the stripfilter to prepare the string forsearching, and it also acts as the render filter when asked to renderthe verse as plaintext. So, the toupper is to change the divineNameLord entries to LORD for plaintext output.
Thanks for finding this bug and spending time to pinpoint the placewere the problem is occurring. Great help. I appreciate you.
Troy

Patch:

Modified: trunk/src/mgr/stringmgr.cpp
===================================================================
--- trunk/src/mgr/stringmgr.cpp 2025-03-03 13:48:49 UTC (rev 3897)
+++ trunk/src/mgr/stringmgr.cpp 2025-03-03 13:49:36 UTC (rev 3898)
@@ -238,7 +238,7 @@
                it = toUpperData.find(ch);
                getUTF8FromUniChar(it == toUpperData.end() ? ch : it->second, 
&text);
        }
-       long len = maxlen ? (text.size() < maxlen ? text.size() : (maxlen - 1)) 
: 0;
+       long len = maxlen ? (text.size() < maxlen ? text.size() : (maxlen - 1)) 
: text.size();
        if (len) memcpy(t, text.c_str(), len);
        t[len] = 0;
  #endif



On 3/1/25 8:09 AM, Tobias Klein wrote:
Hi Troy,

can this be fixed in SWORD?
This bug impacts the search function quite significantly. I noticedwhen my standard test scenario for search started to fail after myadjustments.The reason was that the search results for my test scenariosignificantly increased and I had to adjust the expected results.The test scenario searches for "faith" in KJV. Previously (before thebugfix) I expected 324 search results.After the bugfix/change mentioned below there are now 338 searchresults. So you see that quite some verses are missed by the searchfunction because of this bug.
Best regards,
Tobias

On 2/23/25 18:38, David Haslam wrote:
Excellent sleuthing, Tobias !

Best regards,

David

Sent with Proton Mail <https://proton.me/mail/home> secure email.
On Sunday, February 23rd, 2025 at 5:17 PM, Tobias Klein<cont...@tklein.info> wrote:
Hi Troy,

I have discovered the root cause of this bug.

There is the following code in osisplain.cpp.
I suppose the uppercasing action here has negative impact on theoverall parsing when the stripText() is running?
elseif(!strncmp(token, "/divineName", 11)) {
// Get the end portion of the string, and upper case it
char*end=buf.getRawData();
end+=buf.size() -u->lastTextNode.size();
toupperstr(end);
}
When I comment this portion out, the search bug _does not occuranymore_ and I get a correct result, see below.
textBuf: For he said, Because the Lord hath sworn that the Lordwill have war with Amalek from generation to generation.
term: generation to generation
Got 11 results!
Exod 17:16
Isa 13:20
Isa 34:10
Isa 34:17
Isa 51:8
Jer 50:39
Lam 5:19
Dan 4:3
Dan 4:34
Joel 3:20
Luke 1:50
So, what the code stumbles over in the specific case of Exodus17:16 is the <divineName> tag and the parsing / actions related to it.Why is the uppercasing necessary at all in the code above?Shouldn't this be left to the application software in terms offormatting the respective element/tag in uppercase?
Best regards,
Tobias

On 2/22/25 20:32, Tobias Klein wrote:
Hi Troy,

so I did a little debugging on this.
The respective portion of code in swmodule.cpp is this code below.I added some conditional print outs for Exodus 17:16 to see whathappens there.
caseSEARCHTYPE_PHRASE: {
textBuf=stripText();
if((flags&REG_ICASE) ==REG_ICASE) textBuf.toUpper();
SWKey*currentKey=getKey();
std::stringreferenceKey="Exod 17:16";
if(currentKey->getShortText() ==referenceKey) {
std::cout<<"textBuf: "<<textBuf.c_str() <<std::endl;
std::cout<<"term: "<<term.c_str() <<std::endl;
}
// TKL: This is where the actual search per verse happens
sres=strstr(textBuf.c_str(), term.c_str());

I get the following output based on my modification above:

textBuf: For he said, Because the
term: generation to generation

The full verse content of Exodus 17:16 in KJV is this:
For he said, Because the Lord hath sworn /that/ the Lord /willhave/ war with Amalek from generation to generation.
So ... it seems that the stripText() call strips too much of thecontent (textBuf) of the verse away.Based on that there is no way for the strstr call to succeeddetecting the term "generation to generation", because at thatpoint it is not part of the search string (textBuf) anymore.
Could you do some investigation regarding the behavior ofstripText here?
Best regards,
Tobias

On 2/22/25 15:45, Tobias Klein wrote:
Hi Troy,
an Ezra Bible App user reported that the phrase search is notworking as expected.
Here is an example where the results are not as expected.

Module: KJV

Search term: "generation to generation"

I get the following results from the SWORD engine:
Isa 13:20
Isa 34:10
Isa 34:17
Isa 51:8
Jer 50:39
Dan 4:3
Dan 4:34
Joel 3:20
Luke 1:50
However, the verse Exodus 17:16 also contains this phrase, but isnot in the list of search results.
Could it be related to the way how the markup is structured?
In Exodus 17:16 [KJV], the markup of the respective phrase lookslike this:
<w class="strong:H01755">from generation</w> <wclass="strong:H01755">to generation</w>
This is how I call the search function of the SWORD engine:
listKey = module->search(searchTerm.c_str(), int(searchType),flags, scope, 0, internalModuleSearchProgressCB);seehttps://github.com/ezra-bible-app/node-sword-interface/blob/master/src/sword_backend/module_search.cpp#L178
Have a nice weekend!

Best regards,
Tobias

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
_______________________________________________
sword-devel mailing list:sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
_______________________________________________
sword-devel mailing list:sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
_______________________________________________
sword-devel mailing list:sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
_______________________________________________
sword-devel mailing list:sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

Re: [sword-devel] Sword phrase search not returning all expected results

Reply via email to