Hi
I had always thought stemming was done by Zebra, and only in English!
In fact, the algorithm for the French language is here:
http://snowball.tartarus.org/algorithms/french/stemmer.html
(Lingua::Stem::Snowball is a Perl interface to the C version of the
Snowball stemmers)
Mathieu Saby
On 27/08/2014 10:22, David Cook wrote:
Hi Francois:
I wrote an email earlier on my tablet, but I'm not 100% sure it got
sent. In any case, I'm writing again now!
You'll want to look at C4::Search::_build_stemmed_operand().
Zebra doesn't actually do any stemming itself. If you read through the
Zebra docs (if you're masochistic), you'll notice that they say
explicitly that Zebra doesn't do any stemming, but that you can do
stemming (using a stemmer like Snowball) while building a query.
That's exactly what we do in Koha.
The Perl module that does the stemming is Lingua::Stem::Snowball.
However, you might notice that your query's operands aren't always
stemmed properly. I haven't looked in a while, but I think it's
because we don't build our queries very well at all (when not using
QueryParser).
If you want to understand why you're getting "skills" and "fishxsdfe"
in your results, I would suggest running some tests (using
"Data::Dumper" and "warn") so that you can see your query as it is built.
I have a lot of work I want to do on C4::Search::buildQuery, but just
don't have the time :/.
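As a rough illustration of the kind of trace suggested above, here is a minimal Python sketch of query-time stemming. Everything in it is hypothetical: toy_stem is a crude suffix stripper standing in for a real Snowball stemmer, and the "?" right-truncation marker is an assumption about how a stemmed operand might be sent to an index that is not itself stemmed. It is not Koha's code, and a real trace of the built query is still the only way to confirm what actually happens.

```python
# Toy model of query-time stemming. Every name here is hypothetical;
# this is neither Koha code nor the Snowball algorithm.

SUFFIXES = ("ing", "ed", "er", "s")  # crude stand-in for a real stemmer


def toy_stem(word):
    """Strip one known suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_operand(operand, truncate=True, trace=None):
    """Stem each word of the operand; optionally append '?' as a
    right-truncation marker (ASSUMPTION, to be checked in a real trace)."""
    terms = []
    for word in operand.split():
        stem = toy_stem(word)
        term = stem + "?" if truncate else stem
        if trace is not None:
            # Record every stage so the built query can be inspected.
            trace.append({"word": word, "stem": stem, "term": term})
        terms.append(term)
    return " ".join(terms)


def term_matches(term, indexed_word):
    """'fish?' matches any indexed word starting with 'fish'; a term
    without '?' must match the indexed word exactly."""
    indexed_word = indexed_word.lower()
    if term.endswith("?"):
        return indexed_word.startswith(term[:-1])
    return indexed_word == term


if __name__ == "__main__":
    indexed = ["fish", "fished", "fishing", "fisher", "fishxsdfe"]
    trace = []
    term = build_operand("fishing", truncate=True, trace=trace)
    print(trace)  # inspect each stage, much like warn Dumper(...) in Perl
    print([w for w in indexed if term_matches(term, w)])
```

In this toy model, a stemmed term without truncation would only match the bare word "fish", because the index itself holds unstemmed words. With truncation, "fish?" matches every word starting with "fish", including "fishxsdfe". If a real trace shows something similar, that would explain the results in the original question.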
Unfortunately, there is currently no stemming when using
QueryParser. On the plus side, using Lingua::Stem::Snowball with
QueryParser would be really easy. I think I've written a
note on how to do that somewhere on Bugzilla, or maybe on Trello...
I hope that helps! Feel free to send me an email or shout at me on IRC
if you want any clarification. I know I probably didn't make it any
clearer but hopefully this might help you on your path to understanding.
David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007
From: koha-devel-boun...@lists.koha-community.org
[mailto:koha-devel-boun...@lists.koha-community.org] On Behalf Of
Francois Charbonnier
Sent: Wednesday, 27 August 2014 2:09 AM
To: koha-devel@lists.koha-community.org
Subject: [Koha-devel] Stemming and zebra
Hello,
I have tested the QueryStemming system preference on Koha 3.14 (my
local installation) and I'm wondering: does Zebra simply right-truncate
the words, or is there an algorithm to find the stems?
I use ICU and I have enabled "QueryWeightFields". I don't have
automatic truncation or fuzzy search on. I use these words for my tests:
* ski, skiing, skills
* fish, fished, fishing, fisher, fishxsdfe
Each time, with QueryStemming on, skills and fishxsdfe come out in the
search results. Is that what I should expect? "Skills", maybe, but
"fishxsdfe"?
Do you know how it works, or have a good example that would help me
understand?
Thanks!
--
François Charbonnier,
Professional librarian / Product manager
Tel.: (888) 604-2627
francois.charbonn...@inlibro.com
inLibro | pour esprit libre | www.inLibro.com
_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/