Hi,
Thanks Brooke for the support. I did read the RFC, by the way, but it's
for QueryParser, which I'm not using at the moment. I hope I'll find
time to test it!
Thanks David. I haven't read the whole Zebra documentation (ha ha, sounds
crazy) but I did some research. I had the impression Zebra didn't handle
stemming, but it wasn't clear. So thank you for your answer. It will
definitely help me understand!
Thanks Mathieu for the link. I'll have a look, for sure!
Have a good day! :^)
François
François Charbonnier,
Professional librarian / Product manager
Tel.: (888) 604-2627
francois.charbonn...@inlibro.com
inLibro | pour esprit libre | www.inLibro.com
On 2014-08-27 05:48, David Cook wrote:
Hi Mathieu:
I think many of us think certain things happen in Zebra when they
actually happen in Koha before the query ever reaches Zebra ;).
As for stemming, theoretically the language obtained via
“C4::Templates::getlanguage($cgi, 'intranet');” should filter down
into the Snowball stemming. If it isn’t working in French, it might be
because the right locale isn’t being passed to Snowball correctly.
That’s very possible as I think we’re using arbitrary language codes
rather than standard locales in some cases. It looks like there is a
fallback to English in C4::Templates::getlanguage() as well. If it’s
not working for French, it probably just needs a tweak!
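To illustrate the locale point above, here is a minimal sketch (in
Python rather than Koha's Perl, and not Koha's actual code) of reducing
an interface language tag to a two-letter stemmer language with an
English fallback. The function name stemmer_language and the SUPPORTED
set are hypothetical and only illustrative; check the stemmer's own
documentation for the real list of languages.

```python
# Illustrative subset of two-letter codes a Snowball binding might accept.
# This set is an assumption made for the example, not an authoritative list.
SUPPORTED = {"da", "de", "en", "es", "fi", "fr", "it", "nl", "no", "pt", "ru", "sv"}


def stemmer_language(interface_lang: str, default: str = "en") -> str:
    """Reduce a tag such as 'fr-FR' or 'fr_CA' to 'fr'; fall back to default."""
    # Accept both '-' and '_' separators and keep only the primary subtag.
    primary = interface_lang.replace("_", "-").split("-")[0].strip().lower()
    return primary if primary in SUPPORTED else default


if __name__ == "__main__":
    for tag in ("fr-FR", "fr_CA", "en", "zh-Hans-CN", ""):
        print(repr(tag), "->", stemmer_language(tag))
```

If the tag handed to the stemmer is not reduced in some way like this,
a French interface could silently end up with the English fallback.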
Yeah, I first heard about Snowball when reading through Zebra docs,
and I was pleasantly surprised when I saw that Lingua::Stem::Snowball
existed as a Perl interface for the C program.
David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007
From: koha-devel-boun...@lists.koha-community.org
[mailto:koha-devel-boun...@lists.koha-community.org] On Behalf Of
Mathieu Saby
Sent: Wednesday, 27 August 2014 7:30 PM
To: koha-devel@lists.koha-community.org
Subject: Re: [Koha-devel] Stemming and zebra
Hi
I had always thought stemming was done by Zebra, and only in English!
In fact the algorithm for the French language is here:
http://snowball.tartarus.org/algorithms/french/stemmer.html
(Lingua::Stem::Snowball is a Perl interface to the C version of the
Snowball stemmers)
Mathieu Saby
On 27/08/2014 10:22, David Cook wrote:
Hi Francois:
I wrote an email earlier on my tablet, but not 100% sure if it got
sent. In any case, I’m writing again now!
You’ll want to look at C4::Search::_build_stemmed_operand().
Zebra doesn’t actually do any stemming itself. If you read through
the Zebra docs (if you’re masochistic), you’ll notice that they
say explicitly that Zebra doesn’t do any stemming, but that you
can do stemming (using a stemmer like Snowball) while building a
query. That’s exactly what we do in Koha.
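As a rough illustration of stemming at query-build time, here is a
minimal sketch in Python (not Koha's Perl code). The toy_stem function
is a hypothetical stand-in that is far cruder than a real Snowball
stemmer, and the idea that each stem is then right-truncated with "?"
is an assumption made for the example, not something confirmed in this
thread. Under that assumption, a stem such as "fish" followed by
truncation would match any word starting with "fish".

```python
# Toy suffix stripper standing in for a real Snowball stemmer.
# Hypothetical and deliberately crude: for illustration only.
SUFFIXES = ("ing", "ed", "er", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stemmed_operand(operand: str) -> str:
    """Stem each word and append '?' as a right-truncation marker (assumed)."""
    return " ".join(toy_stem(w) + "?" for w in operand.split())


def matches(term: str, pattern: str) -> bool:
    """Treat a trailing '?' in pattern as 'any characters may follow'."""
    return term.lower().startswith(pattern.rstrip("?"))


if __name__ == "__main__":
    query = build_stemmed_operand("skiing")
    for term in ("ski", "skiing", "skills", "skate"):
        print(query, term, matches(term, query))
```

With this sketch, a search for "skiing" becomes "ski?", which also
matches "skills"; likewise "fishing" becomes "fish?", which would match
a nonsense word such as "fishxsdfe".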
The Perl module that does the stemming is Lingua::Stem::Snowball.
However, you might notice that your query’s operands aren’t always
stemmed properly. I haven’t looked in a while, but I think it’s
because we don’t build our queries very well at all (when not
using QueryParser).
If you want to understand why you’re getting “skills” and
“fishxsdfe” in your results, I would suggest running some tests
(using “Data::Dumper” and “warn”) so that you can see your query
as it is built.
I have a lot of work I want to do on C4::Search::buildQuery, but
just don’t have the time :/.
Unfortunately, at the moment, there is no stemming when using the
QueryParser. However, fortunately, using Lingua::Stem::Snowball
with QueryParser would be really really easy. I think that I’ve
written a note on how to do that somewhere on Bugzilla or maybe on
Trello…
I hope that helps! Feel free to send me an email or shout at me on
IRC if you want any clarification. I know I probably didn’t make
it any clearer but hopefully this might help you on your path to
understanding.
David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007
From: koha-devel-boun...@lists.koha-community.org
[mailto:koha-devel-boun...@lists.koha-community.org] On Behalf Of
Francois Charbonnier
Sent: Wednesday, 27 August 2014 2:09 AM
To: koha-devel@lists.koha-community.org
Subject: [Koha-devel] Stemming and zebra
Hello,
I have tested the QueryStemming system preference on Koha 3.14 (my
local installation) and I'm wondering: does Zebra simply
right-truncate the words, or is there an algorithm to find the stems?
I use ICU and I have enabled "QueryWeightFields". I don't have
automatic truncation or fuzzy search on. I use these words for my
tests:
ski, skiing, skills
fish, fished, fishing, fisher, fishxsdfe
Each time, with QueryStemming on, skills and fishxsdfe come out in
the search results. Is that what I should expect? "Skills", maybe,
but "fishxsdfe"?
Do you know how it works, or do you have a good example that would
help me understand?
Thanks!
--
François Charbonnier,
Professional librarian / Product manager
Tel.: (888) 604-2627
francois.charbonn...@inlibro.com
inLibro | pour esprit libre | www.inLibro.com
_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/