Bård wrote:
I had a closer look at the webhelp output and found that the search
system is based on the tag-cloud generator “Snowball”. This seems to be
very simple and uses “stemming” to find words that are conjugated.
It would be very nice to have a more advanced search engine for WebHelp
output from XMLmind.
The search engine of the WebHelp we generate uses Snowball's stemming
engine as a software component, but Snowball itself,
https://snowballstem.org/, is not a search engine.
Currently the search engine of the WebHelp we generate (see XMLmind Web
Help Compiler, https://www.xmlmind.com/ditac/whc.shtml) is a *word*
search engine (hence its use of stemming), not a *string* search engine,
and it implements AND semantics without any special syntax to do that
("&&" not needed).
For example, if you search "XMLmind Web Help Compiler Manual",
https://www.xmlmind.com/ditac/_whc/doc/manual/index.html:
- Searching for "web" gives you 11 hits.
- Searching for "web browser" (that is "web" AND any word having the
same stem as "browser" contained in the same page) gives you 6 hits.
- Searching for "web browsing" gives you 1 hit. (I expected the same 6
results as above but Snowball probably does not consider "browser" and
"browsing" to share the same stem; looks very much like a bug.)
- Searching for "web spider" gives you 0 hits.
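
To make the "word search with stemming and AND semantics" behavior described above more concrete, here is a minimal Python sketch. It is not the code of XMLmind Web Help Compiler: the page names and texts are invented, and toy_stem() is a crude, hypothetical stand-in for a real Snowball stemmer. It only illustrates the general technique: index each page by word stem, then intersect the page sets of all query words.

```python
import re
from collections import defaultdict

def toy_stem(word):
    """Crude suffix stripper; an assumed stand-in for a real Snowball stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s", "e"):
        # Keep at least 3 characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(pages):
    """Map each stem to the set of pages containing a word with that stem."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[A-Za-z0-9]+", text):
            index[toy_stem(word)].add(page_id)
    return index

def search(index, query):
    """AND semantics: a page must contain every stem found in the query."""
    stems = [toy_stem(w) for w in re.findall(r"[A-Za-z0-9]+", query)]
    if not stems:
        return set()
    hits = set(index.get(stems[0], set()))
    for stem in stems[1:]:
        hits &= index.get(stem, set())
    return hits

# Invented sample pages, for illustration only.
PAGES = {
    "intro.html": "Open the Web Help in any web browser.",
    "scaling.html": "Project scaling: scale the project, then check the scaled result.",
    "setup.html": "Create a project and publish it on the web.",
}
INDEX = build_index(PAGES)

print(sorted(search(INDEX, "web")))          # pages containing "web"
print(sorted(search(INDEX, "web browser")))  # fewer pages: both words required
print(sorted(search(INDEX, "web spider")))   # no page contains both words
```

With such a design, adding a word to the query can only keep or reduce the number of hits, never increase it. Note also that this toy stemmer, like any stemmer, only matches words it reduces to the same stem: here "scale", "scaled" and "scaling" match each other, but "browser" and "browsing" do not.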
I tried to create an index, but that is a huge task and the way the
search in the index works is very bad since you must write exactly what
is in the index starting with the first level. I see that you don’t use
levels at all in some of the XMLmind documentation that has webhelp
index, for example
The reason is simply our laziness. Structured index entries are
supported. To specify them, you must use the corresponding elements in
your source XML document.
DocBook: indexterm, primary, secondary, tertiary. See
https://tdg.docbook.org/tdg/5.2/indexterm.singular
DITA: You can nest <indexterm> elements to create multi-level indexes.
See
http://docs.oasis-open.org/dita/dita/v1.3/errata02/os/complete/part2-tech-content/langRef/base/indexterm.html#indexterm
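
As an illustration of the two markup styles mentioned above, here is a small Python sketch which embeds one three-level index entry in each vocabulary and extracts the levels. The terms "search", "stemming" and "Snowball" are invented sample values, and the DocBook 5 namespace declaration is omitted for brevity; this is just a sketch to show the shape of the markup, not a description of how any XMLmind tool processes it.

```python
import xml.etree.ElementTree as ET

# DocBook: one <indexterm> with primary/secondary/tertiary children.
# (Namespace declaration omitted for brevity; sample terms are invented.)
DOCBOOK_ENTRY = """
<indexterm>
  <primary>search</primary>
  <secondary>stemming</secondary>
  <tertiary>Snowball</tertiary>
</indexterm>
"""

# DITA: nested <indexterm> elements, one per level.
DITA_ENTRY = """
<indexterm>search
  <indexterm>stemming
    <indexterm>Snowball</indexterm>
  </indexterm>
</indexterm>
"""

def docbook_levels(xml_text):
    """Return the levels of a DocBook index entry, primary first."""
    root = ET.fromstring(xml_text.strip())
    return [
        root.findtext(tag).strip()
        for tag in ("primary", "secondary", "tertiary")
        if root.find(tag) is not None
    ]

def dita_levels(xml_text):
    """Return the levels of a nested DITA index entry, outermost first."""
    levels = []
    node = ET.fromstring(xml_text.strip())
    while node is not None:
        levels.append((node.text or "").strip())
        node = node.find("indexterm")
    return levels

print(docbook_levels(DOCBOOK_ENTRY))
print(dita_levels(DITA_ENTRY))
```

Both entries describe the same index path: "search", then "stemming", then "Snowball".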
XMLmind XML Editor even has a dialog box for that. Notice that "Term"
has 3 fields corresponding to primary, secondary and tertiary.
DocBook:
https://www.xmlmind.com/xmleditor/_distrib/doc/docbook/docbook_menu.html#docbook_indexterm_editor
DITA:
https://www.xmlmind.com/xmleditor/_distrib/doc/dita/topic_reference.html#dita_indexterm_editor
https://www.xmlmind.com/xmleditor/_distrib/doc/configure/wh/customize_xslt.html
This chapter will not help you solve your problem.
The relevant software component here is: "XMLmind Web Help Compiler",
https://www.xmlmind.com/ditac/whc.shtml
*From:* Bård
The documentation we produce with XMLmind is now quite big and the users
have started to complain that it is hard to find the correct information.
The built-in search engine seems to be really simple: it just searches
for all words that are entered and returns all matches. The more words
you search for, the more answers you get.
Sorry, but I cannot reproduce this behavior. See the examples above.
If you do observe this behavior, please report it as a bug and send
us what's needed to reproduce it here at XMLmind Software.
This is contrary to intuition; the
search should narrow down when more input is given. However, if you
search for a term such as “scaling”, it also returns answers for “scale”
and “scaled”, which is quite advanced.
This is what you expect from the simplest *word* search engine.
I cannot find any information about the WebHelp search engine.
See "XMLmind Web Help Compiler Manual",
https://www.xmlmind.com/ditac/_whc/doc/manual/index.html
However this manual does not contain much information about the search
engine. Sorry for that.
Is there any support for combination of terms or REGEX patterns, for
example:
word1 && word2,
AND semantics are supported. Simply type "word1 word2".
word1 || word2,
OR semantics are not supported.
“word1 word2” = “project scaling”
"word1" NEAR "word2" is not supported.
In our software, the term “Project scaling” is a feature, but it is
impossible to find since the word “project” must be used 1000 times… And
“scaling” returns answers for “scale” and “scaled”, which are used
elsewhere.
Is there any way to tune the search engine or replace it with something better?
Maybe there is a way to plug in an external, general-purpose search
engine like DuckDuckGo, https://duckduckgo.com/ (provided your WebHelp
will be published online), but so far we have never considered doing this.
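
To give a rough idea of what such a hookup could look like, here is a Python sketch which builds a DuckDuckGo query URL restricted to one site with the "site:" operator. This is untested with WebHelp and is not something XMLmind Web Help Compiler provides; the host name docs.example.com and the function name site_search_url() are purely hypothetical, and it assumes the published pages have been indexed by the external engine.

```python
from urllib.parse import urlencode

def site_search_url(site, query):
    """Build a DuckDuckGo search URL limited to `site` (hypothetical helper)."""
    return "https://duckduckgo.com/?" + urlencode({"q": f"site:{site} {query}"})

# docs.example.com is a placeholder for wherever the WebHelp is published.
print(site_search_url("docs.example.com", "project scaling"))
```

A search form added to the published pages could then redirect the user to the URL built this way instead of using the built-in search.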
--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support