Bård wrote:
I had a closer look at the WebHelp output and found that the search system is based on the tag-cloud generator “Snowball”. It seems to be very simple and uses “stemming” to find inflected forms of words.

It would be very nice to have a more advanced search engine for the WebHelp output from XMLmind.

The search engine of the WebHelp we generate uses Snowball's stemming engine as a software component, but Snowball itself (https://snowballstem.org/) is not a search engine.

Currently, the search engine of the WebHelp we generate (see XMLmind Web Help Compiler, https://www.xmlmind.com/ditac/whc.shtml) is a *word* search engine (hence its use of stemming), not a *string* search engine, and it implements AND semantics without requiring any special syntax ("&&" is not needed).
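To make "word search engine with AND semantics" concrete, here is a minimal Python sketch. It is not the Web Help Compiler's actual code: `toy_stem` is a hypothetical, deliberately crude suffix stripper standing in for Snowball, and the page names and texts are made up. Each page's words are reduced to stems in an inverted index, and a query returns the intersection of the page sets for its stems.

```python
import re

# Hypothetical, deliberately crude suffix stripper standing in for Snowball.
# It is NOT the Snowball English algorithm; it only mimics the idea of
# mapping inflected forms ("scaling", "scaled", "scale") to one stem.
SUFFIXES = ("ing", "ed", "es", "e", "s")


def toy_stem(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping a stem of 3+ letters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    # Word search: split into lower-cased words, ignoring punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: stem -> set of pages containing a word with that stem.
    index: dict[str, set[str]] = {}
    for page, text in pages.items():
        for word in tokenize(text):
            index.setdefault(toy_stem(word), set()).add(page)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # AND semantics: a page matches only if it contains every query stem.
    stems = [toy_stem(word) for word in tokenize(query)]
    if not stems:
        return set()
    hits = set(index.get(stems[0], set()))
    for stem in stems[1:]:
        hits &= index.get(stem, set())
    return hits


pages = {
    "intro.html": "Open the Web Help in any web browser.",
    "offline.html": "Browsing the Web Help offline.",
    "images.html": "Scaling images: each image is scaled to fit.",
}
index = build_index(pages)
print(sorted(search(index, "web browser")))   # ['intro.html']
print(sorted(search(index, "web browsing")))  # ['offline.html']
print(sorted(search(index, "scale")))         # ['images.html']
print(sorted(search(index, "web spider")))    # []
```

Because each extra word intersects the result set, adding words can only keep or reduce the number of hits. In this toy stemmer, "browser" and "browsing" also end up with different stems, which produces the same kind of mismatch as the "web browsing" example.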

If you search the "XMLmind Web Help Compiler Manual", https://www.xmlmind.com/ditac/_whc/doc/manual/index.html:

- Searching for "web" gives you 11 hits.

- Searching for "web browser" (that is "web" AND any word having the same stem as "browser" contained in the same page) gives you 6 hits.

- Searching for "web browsing" gives you 1 hit. (I expected the same 6 results as above, but Snowball probably does not consider "browser" and "browsing" to share the same stem; this looks very much like a bug.)

- Searching for "web spider" gives you 0 hits.



I tried to create an index, but that is a huge task, and searching the index works very badly because you must type exactly what is in the index, starting with the first level. I see that you don’t use levels at all in some of the XMLmind documentation that has a WebHelp index, for example

The reason is simply our laziness. Structured index entries are supported; to specify them, use the corresponding elements in your source XML document.

DocBook: indexterm, primary, secondary, tertiary. See https://tdg.docbook.org/tdg/5.2/indexterm.singular

DITA: You can nest <indexterm> elements to create multi-level indexes. See http://docs.oasis-open.org/dita/dita/v1.3/errata02/os/complete/part2-tech-content/langRef/base/indexterm.html#indexterm

XMLmind XML Editor even has a dialog box for that. Notice that "Term" has 3 fields corresponding to primary, secondary and tertiary.

DocBook: https://www.xmlmind.com/xmleditor/_distrib/doc/docbook/docbook_menu.html#docbook_indexterm_editor

DITA: https://www.xmlmind.com/xmleditor/_distrib/doc/dita/topic_reference.html#dita_indexterm_editor
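Both markup conventions encode the same kind of multi-level entry; for example, a feature such as “Project scaling” could get its own second-level entry under “project”. The following Python sketch uses only the standard library's xml.etree.ElementTree to read a simplified snippet of each form and print the levels. The snippets (with the DocBook 5 namespace omitted) and the helper names `docbook_levels` and `dita_levels` are illustrative assumptions, not part of XMLmind's tools.

```python
import xml.etree.ElementTree as ET

# Simplified DocBook-style entry (the DocBook 5 namespace is omitted here).
DOCBOOK = """<para>Resize the whole project.<indexterm>
  <primary>project</primary><secondary>scaling</secondary>
</indexterm></para>"""

# DITA-style entry: the second level is a nested <indexterm>.
DITA = """<p>Resize the whole project.<indexterm>project<indexterm>scaling</indexterm></indexterm></p>"""


def docbook_levels(root: ET.Element) -> list[tuple[str, ...]]:
    entries = []
    for term in root.iter("indexterm"):
        # findtext() returns None for a missing level, e.g. no <tertiary>.
        levels = (term.findtext(tag) for tag in ("primary", "secondary", "tertiary"))
        entries.append(tuple(text.strip() for text in levels if text))
    return entries


def dita_levels(elem: ET.Element, path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    entries = []
    for child in elem:
        if child.tag == "indexterm":
            # Each nesting level appends the term's own text to the path.
            child_path = path + ((child.text or "").strip(),)
            nested = dita_levels(child, child_path)
            # A leaf <indexterm> yields a full multi-level entry.
            entries.extend(nested or [child_path])
        else:
            entries.extend(dita_levels(child, path))
    return entries


print(docbook_levels(ET.fromstring(DOCBOOK)))  # [('project', 'scaling')]
print(dita_levels(ET.fromstring(DITA)))        # [('project', 'scaling')]
```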




https://www.xmlmind.com/xmleditor/_distrib/doc/configure/wh/customize_xslt.html


This chapter will not help you solve your problem.

The relevant software component here is: "XMLmind Web Help Compiler", https://www.xmlmind.com/ditac/whc.shtml





Bård wrote:
The documentation we produce with XMLmind is now quite big, and users have started to report that it is hard to find the correct information. The built-in search engine seems to be really simple: it just searches for all the words that are entered and returns all matches. The more words you search for, the more answers you get.

Sorry, but I cannot reproduce this behavior. See the examples above.

If you observe this behavior, please report it as a bug and send us what we need to reproduce it here at XMLmind Software.



This is contrary to intuition; the search should narrow down when more input is given. However, if you search for a term such as “scaling”, it also returns answers for “scale” and “scaled”, which is quite advanced.

This is what you expect from the simplest *word* search engine.



I cannot find any information about the WebHelp search engine.

See "XMLmind Web Help Compiler Manual", https://www.xmlmind.com/ditac/_whc/doc/manual/index.html

However, this manual does not contain much information about the search engine. Sorry about that.



Is there any support for combinations of terms or regex patterns, for example:

word1 && word2,

AND semantics are supported. Simply type "word1 word2".


word1 || word2,

OR semantics are not supported.



“word1 word2” = “project scaling”

"word1" NEAR "word2" is not supported.

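For comparison, the following Python sketch shows what each of these operators would compute over the words of a page. Only the AND case corresponds to what the WebHelp search does today; OR and NEAR are shown purely to illustrate the request. Stemming is left out, and the function names and sample pages are hypothetical.

```python
import re


def positions(text: str) -> dict[str, list[int]]:
    # Map each lower-cased word to the positions where it occurs.
    pos: dict[str, list[int]] = {}
    for i, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        pos.setdefault(word, []).append(i)
    return pos


def and_match(text: str, *words: str) -> bool:
    # AND: every word occurs somewhere in the page.
    pos = positions(text)
    return all(word in pos for word in words)


def or_match(text: str, *words: str) -> bool:
    # OR: at least one word occurs in the page.
    pos = positions(text)
    return any(word in pos for word in words)


def near(text: str, word1: str, word2: str, max_distance: int = 1) -> bool:
    # NEAR: the two words occur within max_distance words of each other
    # (max_distance=1 means adjacent, in either order).
    pos = positions(text)
    return any(abs(i - j) <= max_distance
               for i in pos.get(word1, []) for j in pos.get(word2, []))


page_a = "Project scaling lets you resize a project."
page_b = "Each project has a scale; scaling is done per sheet."
print(and_match(page_a, "project", "scaling"), and_match(page_b, "project", "scaling"))  # True True
print(near(page_a, "project", "scaling"), near(page_b, "project", "scaling"))            # True False
```

AND matches both pages because each contains "project" and "scaling" somewhere, while a proximity check only keeps the page where the two words are adjacent.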



In our software, “Project scaling” is a feature, but it is impossible to find because the word “project” is used 1000 times… And “scaling” also returns answers for “scale” and “scaled”, which are used elsewhere.

Is there any way to tune the search engine or replace it with something better?

Maybe there is a way to plug in an external, general-purpose search engine such as DuckDuckGo (https://duckduckgo.com/), provided your WebHelp is published online, but so far we have never considered doing this.
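As a rough idea of the simplest form such a plug-in could take, a search box or link in the published pages could point to a DuckDuckGo query restricted to your site with the "site:" operator. This is a sketch under assumptions, not something the Web Help Compiler provides: `site_search_url` is a hypothetical helper, "docs.example.com" is a placeholder host, and it only works once the public engine has indexed the published pages.

```python
from urllib.parse import urlencode


def site_search_url(site_host: str, terms: str) -> str:
    # Hypothetical helper: restrict a DuckDuckGo query to one site with the
    # "site:" operator; urlencode() percent-encodes ":" and turns spaces into "+".
    return "https://duckduckgo.com/?" + urlencode({"q": f"site:{site_host} {terms}"})


# "docs.example.com" is a placeholder for wherever the WebHelp is published.
print(site_search_url("docs.example.com", "project scaling"))
# https://duckduckgo.com/?q=site%3Adocs.example.com+project+scaling
```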


--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
