Hi,
Thanks Brooke for the support. I did read the RFC, by the way, but it covers QueryParser, which I'm not using right now. I hope I'll find time to test it! Thanks David, I haven't read the whole Zebra documentation (ha ha, sounds crazy) but I did some research. I had the impression Zebra didn't handle stemming, but it wasn't clear. So thank you for your answer. It will definitely help me understand!
Thanks Mathieu for the link. I'll have a look. For sure!
Have a good day! :^)
François

François Charbonnier,
Professional librarian / Product manager

Tel.: (888) 604-2627
francois.charbonn...@inlibro.com

inLibro | pour esprit libre | www.inLibro.com
On 2014-08-27 05:48, David Cook wrote:

Hi Mathieu:

I think many of us think certain things happen in Zebra when they actually happen in Koha before the query ever reaches Zebra ;).

As for stemming, theoretically the language obtained via “C4::Templates::getlanguage($cgi, 'intranet');” should filter down into the Snowball stemming. If it isn’t working in French, it might be because the right locale isn’t being passed to Snowball correctly. That’s very possible as I think we’re using arbitrary language codes rather than standard locales in some cases. It looks like there is a fallback to English in C4::Templates::getlanguage() as well. If it’s not working for French, it probably just needs a tweak!

Yeah, I first heard about Snowball when reading through Zebra docs, and I was pleasantly surprised when I saw that Lingua::Stem::Snowball existed as a Perl interface for the C program.

David Cook

Systems Librarian

Prosentient Systems

72/330 Wattle St, Ultimo, NSW 2007

From: koha-devel-boun...@lists.koha-community.org On Behalf Of Mathieu Saby
Sent: Wednesday, 27 August 2014 7:30 PM
To: koha-devel@lists.koha-community.org
Subject: Re: [Koha-devel] Stemming and zebra

Hi

I had always thought stemming was done by Zebra, and only in English!

In fact, the algorithm for the French language is here:
http://snowball.tartarus.org/algorithms/french/stemmer.html

(Lingua::Stem::Snowball is a Perl interface to the C version of the Snowball stemmers)


Mathieu Saby


On 27/08/2014 10:22, David Cook wrote:

    Hi Francois:

    I wrote an email earlier on my tablet, but not 100% sure if it got
    sent. In any case, I’m writing again now!

    You’ll want to look at C4::Search::_build_stemmed_operand().

    Zebra doesn’t actually do any stemming itself. If you read through
    the Zebra docs (if you’re masochistic), you’ll notice that they
    say explicitly that Zebra doesn’t do any stemming, but that you
    can do stemming (using a stemmer like Snowball) while building a
    query. That’s exactly what we do in Koha.

    The Perl module that does the stemming is Lingua::Stem::Snowball.
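
As a rough illustration of stemming at query-build time, here is a minimal sketch that runs the test words from the original question through Lingua::Stem::Snowball. It assumes the module is installed from CPAN and that English ('en') is the language in use. It is not the code in C4::Search::_build_stemmed_operand(), only a way to see what the stemmer does to each word on its own.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

# Assumption: English stemmer. Koha would choose the language itself.
my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );
die $@ if $@;

# The test words from the original question.
my @words = qw(ski skiing skills fish fished fishing fisher fishxsdfe);

# stem_in_place() modifies the array it is given, so work on a copy.
my @stems = @words;
$stemmer->stem_in_place( \@stems );

# Print each word next to its stem for comparison.
printf "%-10s => %s\n", $words[$_], $stems[$_] for 0 .. $#words;
```

Comparing the two columns shows which words collapse to the same stem. Anything beyond that, such as truncation added to the stem afterwards, would have to come from how the query itself is built.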

    However, you might notice that your query’s operands aren’t always
    stemmed properly. I haven’t looked in a while, but I think it’s
    because we don’t build our queries very well at all (when not
    using QueryParser).

    If you want to understand why you’re getting “skills” and
    “fishxsdfe” in your results, I would suggest running some tests
    (using “Data::Dumper” and “warn”) so that you can see your query
    as it is built.
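
A minimal sketch of that kind of test, using only Data::Dumper and warn. The helper name dump_query and the sample %query_parts hash are hypothetical stand-ins. In practice you would pass in whatever the real code path holds at that point, for example the operand before and after stemming.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: format a labelled dump of any data structure.
sub dump_query {
    my ( $label, $data ) = @_;
    local $Data::Dumper::Sortkeys = 1;    # stable key order between dumps
    local $Data::Dumper::Indent   = 1;    # compact, readable indentation
    return "$label: " . Dumper($data);
}

# Hypothetical stand-in for the pieces of a query while it is being built.
my %query_parts = (
    index   => 'kw',
    operand => 'fishing',
);

# warn writes to STDERR, which typically lands in the web server error log.
warn dump_query( 'query before stemming', \%query_parts );
```

Calling the helper at several points, before and after the stemming step, makes it easier to see where the query changes.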

    I have a lot of work I want to do on C4::Search::buildQuery, but
    just don’t have the time :/.

    Unfortunately, at the moment, there is no stemming when using the
    QueryParser. However, fortunately, using Lingua::Stem::Snowball
    with QueryParser would be really really easy. I think that I’ve
    written a note on how to do that somewhere on Bugzilla or maybe on
    Trello…

    I hope that helps! Feel free to send me an email or shout at me on
    IRC if you want any clarification. I know I probably didn’t make
    it any clearer but hopefully this might help you on your path to
    understanding.

    David Cook

    Systems Librarian

    Prosentient Systems

    72/330 Wattle St, Ultimo, NSW 2007

    From: koha-devel-boun...@lists.koha-community.org On Behalf Of
    Francois Charbonnier
    Sent: Wednesday, 27 August 2014 2:09 AM
    To: koha-devel@lists.koha-community.org
    Subject: [Koha-devel] Stemming and zebra

    Hello,

    I have tested the QueryStemming system preference on Koha 3.14 (my
    local installation) and I'm wondering: does Zebra simply
    right-truncate the words, or is there an algorithm to find the stems?

    I use ICU and I have enabled "QueryWeightFields". I don't have
    automatic truncation or fuzzy search on. I use these words for my
    tests:

    - ski, skiing, skills

    - fish, fished, fishing, fisher, fishxsdfe

    Each time, with QueryStemming on, skills and fishxsdfe come out in
    the search results. Is that what I should expect? "Skills", maybe,
    but "fishxsdfe"?

    Do you know how it works? Or do you have a good example that would
    help me understand?

    Thanks!

--
    François Charbonnier,
    Professional librarian / Product manager

    Tel.: (888) 604-2627
    francois.charbonn...@inlibro.com

    inLibro | pour esprit libre | www.inLibro.com




    _______________________________________________

    Koha-devel mailing list

    Koha-devel@lists.koha-community.org

    http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel

    website :http://www.koha-community.org/

    git :http://git.koha-community.org/

    bugs :http://bugs.koha-community.org/


