RE: A new Snowball stemmer

2017-10-01 Thread Uwe Schindler
A new Snowball stemmer > > Dear All, > > I'd like to integrate a new Snowball stemmer [1] to Lucene for my > experiments, but I can see some incompatibilities between original Snowball > stemmers (produced via Snowball compiler) and actual Lucene's Snowball >

A new Snowball stemmer

2017-10-01 Thread Jan Tosovsky
Dear All, I'd like to integrate a new Snowball stemmer [1] into Lucene for my experiments, but I can see some incompatibilities between the original Snowball stemmers (produced via the Snowball compiler) and Lucene's current Snowball stemmers [2]. In particular: * different constructor of Among

Re: Arabic Stemmer problem

2014-09-09 Thread atawfik
means to load or charge. Since when we use a truck we actually load things, the word "شاحنه" means truck. Regards Ameer -- View this message in context: http://lucene.472066.n3.nabble.com/Arabic-Stemmer-problem-tp4157658p4157690.html Sent fro

Arabic Stemmer problem

2014-09-09 Thread Suleman Mubarik
Hi, I am working with the Arabic stemmer https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/ar/ArabicStemmer.html. Among its suffixes is the character TEH_MARBUTA (\u0629). When the stemmer applies stemSuffix, it removes TEH_MARBUTA (ة), which changes some words, for example
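The effect described in this message can be reproduced with a small self-contained sketch. It is not Lucene's ArabicStemmer: the class name `TehMarbutaSketch`, the `minStemLength` parameter, and the protected-word set are assumptions made purely for illustration of how stripping a trailing teh marbuta alters a word, and how an exception list could leave selected words untouched.

```java
import java.util.Set;

/** Toy illustration only; not Lucene's ArabicStemmer implementation. */
final class TehMarbutaSketch {
    static final char TEH_MARBUTA = '\u0629'; // ة

    /**
     * Strips a trailing teh marbuta unless the word is protected or the
     * remaining stem would be shorter than minStemLength.
     * Both protectedWords and minStemLength are hypothetical knobs.
     */
    static String stripSuffix(String word, Set<String> protectedWords, int minStemLength) {
        if (protectedWords.contains(word)) {
            return word;
        }
        int last = word.length() - 1;
        if (last >= minStemLength && word.charAt(last) == TEH_MARBUTA) {
            return word.substring(0, last);
        }
        return word;
    }

    public static void main(String[] args) {
        String truck = "\u0634\u0627\u062D\u0646\u0629"; // شاحنة
        // Without protection the final character is dropped.
        System.out.println(stripSuffix(truck, Set.of(), 2));
        // With the word listed as protected it is returned unchanged.
        System.out.println(stripSuffix(truck, Set.of(truck), 2));
    }
}
```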

Re: Snowball filter - Error instantiating stemmer for a language

2014-09-05 Thread Chris Hostetter
To see about improving the error messages when users make mistakes like this... https://issues.apache.org/jira/browse/LUCENE-5926 -Hoss http://www.lucidworks.com/ - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache

Re: Snowball filter - Error instantiating stemmer for a language

2014-09-05 Thread Robert Muir
On Thu, Sep 4, 2014 at 7:32 PM, atawfik wrote: > As you can see, the key here is calling the *inform* method of the > SnowballPorterFilterFactory to add the respective language stemmer's class. > This is actually the task of the *inform* method. The standard constructor > of this class does not ca

Re: Snowball filter - Error instantiating stemmer for a language

2014-09-04 Thread atawfik
(!args.isEmpty()) { throw new IllegalArgumentException("Unknown parameters: " + args); } } @Override public void inform(ResourceLoader loader) throws IOException { String className = "org.tartarus.snowball.ext." + language + "Stemmer"; stemClass = load
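The quoted inform() builds the stemmer class name by plain string concatenation, so the language argument has to match the class name exactly. A minimal sketch of that lookup, assuming nothing beyond the concatenation shown in the snippet, follows; `StemmerClassLookup`, `classNameFor`, and `tryLoad` are hypothetical names, and the code uses plain `Class.forName` rather than Lucene's ResourceLoader.

```java
import java.util.Optional;

/** Sketch of the name-based lookup quoted above; not the Lucene factory itself. */
final class StemmerClassLookup {
    static final String PACKAGE_PREFIX = "org.tartarus.snowball.ext.";

    /** Builds the fully qualified class name the same way the quoted inform() does. */
    static String classNameFor(String language) {
        return PACKAGE_PREFIX + language + "Stemmer";
    }

    /** Tries to load the class; returns empty when it cannot be found on the classpath. */
    static Optional<Class<?>> tryLoad(String language) {
        try {
            Class<?> stemClass = Class.forName(classNameFor(language));
            return Optional.of(stemClass);
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers wrong-case names on case-insensitive file systems.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (String language : new String[] {"Catalan", "catalan"}) {
            System.out.println(classNameFor(language)
                    + " found=" + tryLoad(language).isPresent());
        }
    }
}
```

Whether either name resolves depends on the Snowball classes being on the classpath, which this sketch does not assume.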

Re: Snowball filter - Error instantiating stemmer for a language

2014-09-04 Thread Chris Hostetter
? : Date: Thu, 4 Sep 2014 03:25:21 -0700 (PDT) : From: atawfik : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: Snowball filter - Error instantiating stemmer for a language : : I am trying to use some filters from the snowball package. However, when I : run the

Snowball filter - Error instantiating stemmer for a language

2014-09-04 Thread atawfik
"); args.put("language", "Catalan"); SnowballPorterFilterFactory factory = new SnowballPorterFilterFactory(args); TokenFilter filter = factory.create(tokenStream); I got the following error: Exception in thread "main" java.lang.RuntimeException: Error in

Re: Hunspell stemmer generates multiple tokens

2013-06-07 Thread oren bochman
> token generated by the stemmer) instead of just a clause with the stem that > we expect to find in the index (if we indexed using stemming of course). > > I would like to know if you think this is the correct behaviour and if this > is something you are aware of. If I look at snowbal

Hunspell stemmer generates multiple tokens

2013-06-07 Thread Luca Cavanna
generated, containing multiple clauses (one for each token generated by the stemmer) instead of just a clause with the stem that we expect to find in the index (if we indexed using stemming of course). I would like to know if you think this is the correct behaviour and if this is something you are

Re: Russiam stemmer?

2012-12-17 Thread dokondr
s in the analyzers-common module: >> >> 1. org.apache.lucene.analysis.ru.RussianLightStemmer, used by >> RussianLightStemFilter; and >> 2. The Russian Snowball stemmer: >> org.tartarus.snowball.ext.RussianStemmer. See the code for RussianAnalyzer >> for an example of using the Sn

Re: Russiam stemmer?

2012-12-17 Thread dokondr
ere are two separate Russian stemmers in the analyzers-common module: > > 1. org.apache.lucene.analysis.ru.RussianLightStemmer, used by > RussianLightStemFilter; and > 2. The Russian Snowball stemmer: org.tartarus.snowball.ext.RussianStemmer. > See the code for RussianAnalyzer for an

Re: Russiam stemmer?

2012-12-17 Thread Steve Rowe
owball stemmer: org.tartarus.snowball.ext.RussianStemmer. See the code for RussianAnalyzer for an example of using the Snowball stemmer as part of a Lucene analysis chain: <http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_0_0/lucene/analysis/common/src/java/org/apache/lucene/an

Re: Which stemmer?

2012-11-26 Thread Dmitri Mamrukov
Sent from my iPhone On Nov 16, 2012, at 7:18 PM, "Igal @ getRailo.org" wrote: > This message cannot be displayed because of the way it is formatted. Ask the > sender to send it again using a different format or email program.

Re: Which stemmer?

2012-11-21 Thread Jack Krupansky
ctions between these stemmers quite well, without highlighting the actual indexed term, which can be quite ugly. -- Jack Krupansky -Original Message- From: Elmer van Chastelet Sent: Wednesday, November 21, 2012 8:49 AM To: java-user@lucene.apache.org Subject: Re: Which stemmer? I've

Re: Which stemmer?

2012-11-21 Thread Elmer van Chastelet
I've just created a small web application which you might find useful. You can see which words are matched by a query word when using different analyzers (phonetic and stemming analyzers). These include snowball, kstem and minimal stem (the ones on the right). http://dutieq.st.ewi.tudelft.nl/w

Re: Which stemmer?

2012-11-16 Thread Lance Norskog
Lance - Original Message - | From: "Igal @ getRailo.org" | To: java-user@lucene.apache.org | Sent: Friday, November 16, 2012 4:18:20 PM | Subject: Re: Which stemmer? | | but if "dogs" are feet (and I guess I fall into the not-perfect group | here)... and "feet"

Re: Which stemmer?

2012-11-16 Thread Igal @ getRailo.org
but if "dogs" are feet (and I guess I fall into the not-perfect group here)... and "feet" is the plural form of "foot", then shouldn't "dogs" be stemmed to "dog" as a base, singular form? On 11/16/2012 2:32 PM, Tom Burton-West wrote: Hi Mike, Honestly I've never heard of anyone using "dog

Re: Which stemmer?

2012-11-16 Thread Tom Burton-West
Hi Mike, >>Honestly I've never heard of anyone using "dogs" to mean feet either, but hey nobody's perfect. This is really off topic but I couldn't resist. This usage of "dogs" to mean feet occurs in old blues lyrics such as Blind Lemon Jefferson's "Hot Dogs" http://www.youtube.com/watch?v=v670qV

Re: Which stemmer?

2012-11-15 Thread Michael Sokolov
On 11/15/2012 1:06 PM, Tom Burton-West wrote: This paper on the Kstem stemmer lists cases where the Porter stemmer understems or overstems and explains the logic of Kstem: "Viewing Morphology as an Inference Process" (*Krovetz*, R., Proceedings of the Sixteenth Annual International

Re: Which stemmer?

2012-11-15 Thread Jack Krupansky
ome use cases. -- Jack Krupansky -Original Message- From: Scott Smith Sent: Thursday, November 15, 2012 11:57 AM To: java-user@lucene.apache.org Subject: RE: Which stemmer? Thanks for the suggestions I think Erick is correct as well. I'll let the customer decide. Here'

RE: Which stemmer?

2012-11-15 Thread Scott Smith
Thanks for the suggestions I think Erick is correct as well. I'll let the customer decide. Here's an updated list. Fyi--the minStem was the English Minimal Stemmer--I changed the label. Interesting to see where the minimal stemmer and porter agree (and KStemmer doesn't). Yo

Re: Which stemmer?

2012-11-15 Thread Tom Burton-West
” would not retrieve documents containing the word “dog”. Generally there is a precision/recall tradeoff where reducing understemming increases overstemming. The problem with aggressive stemmers like the Porter stemmer, is that they overstem. The original Porter stemmer for example would stem

Re: Which stemmer?

2012-11-15 Thread Erick Erickson
: Wednesday, November 14, 2012 5:17 PM > To: java-user@lucene.apache.org > Subject: RE: Which stemmer? > > > Unfortunately, my "use case" is a customer who wants stemming, but has > very little knowledge of what that means except they think they wa

Re: Which stemmer?

2012-11-14 Thread Jack Krupansky
@lucene.apache.org Subject: RE: Which stemmer? Unfortunately, my "use case" is a customer who wants stemming, but has very little knowledge of what that means except they think they want it. I agree with your last comment. So, here's my contribution: Original porter ks

RE: Which stemmer?

2012-11-14 Thread Scott Smith
uresuresuresure surely sure surely surely fred's fred' fred's fred' roses rose rose rose Still not sure which one to pick. Porter is more aggressive. Min stemmer is pretty minimal.

Re: Which stemmer?

2012-11-14 Thread Michael Sokolov
Does anyone have any experience with the stemmers? I know that Porter is what "everyone" uses. Am I better off with KStemFilter (better performance) or ?? Does anyone understand the differences between the various stemmers and how to choose one over another? We started off using Porter, t

Re: Which stemmer?

2012-11-14 Thread Jack Krupansky
What is your use case? If you don't have a specific use case in mind, try each of them with some common words that you expect will or won't be stemmed. If you have Solr, you can experiment interactively using the Solr Admin Analysis web page. It would be nice if the javadoc for ea

Which stemmer?

2012-11-14 Thread Scott Smith
Does anyone have any experience with the stemmers? I know that Porter is what "everyone" uses. Am I better off with KStemFilter (better performance) or ?? Does anyone understand the differences between the various stemmers and how to choose one over another?

RE: Implementing Analyzer with Pling Stemmer

2010-05-23 Thread Uwe Schindler
che.org > Subject: Implementing Analyzer with Pling Stemmer > > > Hi guys! > > for the purpose of my project professor has advised me I should use the > PlingStemmer to index the terms obtained from Lucene. > http://www.mpi-inf.mpg.de/yago- > naga/javatools/doc/javatools/pa

Implementing Analyzer with Pling Stemmer

2010-05-23 Thread Xaida
had experience with PlingStemmer, I would be very grateful, all help is very very welcome! Thanx! -- View this message in context: http://lucene.472066.n3.nabble.com/Implementing-Analyzer-with-Pling-Stemmer-tp838028p838028.html Sent from the Lucene - Java Users mailing list archive at

Re: Snowball Stemmer Question

2009-12-03 Thread Otis Gospodnetic
://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From: Christopher Condit > To: "java-user@lucene.apache.org" > Sent: Thu, December 3, 2009 3:04:03 PM > Subject: Snowball Stemmer Question > > The Snowball Analyzer works well for certain constructs but

Snowball Stemmer Question

2009-12-03 Thread Christopher Condit
The Snowball Analyzer works well for certain constructs but not others. In particular I'm having a problem with things like "colossal" vs "colossus" and "hippocampus" vs "hippocampal". Is there a way to customize the analyzer to include these rules? Thanks, -Chris ---

AW: Reverse stemmer?

2009-10-09 Thread Uwe Goetzke
ing of the search results with regard to the phrase entered by the user. Regards Uwe Goetzke Healy Hudson -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Thursday, October 8, 2009 21:20 To: java-user@lucene.apache.org Subject: Re: Reverse ste

Re: Reverse stemmer?

2009-10-08 Thread Karl Wettin
I don't want to judge whether someone needs it or not. E.g., in case you have multilingual documents in your index, it is straightforward to determine the language of the documents in order to choose the right stemmer. At least this is true for documents with homogeneous langua

Re: Reverse stemmer?

2009-10-08 Thread Jason Rutherglen
to determine the language of the documents in order to choose the > right > stemmer. At least this is true for documents with homogeneous language. > > Although this is true at indexing time, the language classification for the > user query is not so trivial - and you have to do

Re: Reverse stemmer?

2009-10-08 Thread Nuno Seco
have multilingual documents in your index, it is straightforward to determine the language of the documents in order to choose the right stemmer. At least this is true for documents with homogeneous language. Although this is true at indexing time, the language classification for the user query i

Re: Reverse stemmer?

2009-10-08 Thread Christian Reuschling
er to choose the right stemmer. At least this is true for documents with homogeneous language. Although this is true at indexing time, the language classification for the user query is not so trivial - and you have to do this in order to stem the query terms for searching. One possibility would

Re: Reverse stemmer?

2009-10-08 Thread Dawid Weiss
ambiguity resolution. So, stemming should be perceived as a "one-way" transformation from inflected forms to some form of a unique identifier for a common lemma (a set of word forms with identical meaning). I don't know if you can call it a "reverse stemmer", but there are tool

Re: Reverse stemmer?

2009-10-06 Thread Erick Erickson
ties only now. > > I am making use of the snowball analyzer for stemming, and it works very > well. > > > Question: is there any such thing as a "reverse stemmer"? In other words, > given the stem of a word, is there any algorithm to find the original word? > Or is

Reverse stemmer?

2009-10-06 Thread David Leangen
a "reverse stemmer"? In other words, given the stem of a word, is there any algorithm to find the original word? Or is this just fantasy? ;-) Now, I understand that there is a 1:n mapping of stems:words. I can deal with tha
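Since stemming is a 1:n mapping of stems to words, one way to go from a stem back to candidate words is to record, at indexing time, every surface form seen for each stem. The sketch below illustrates that idea only; `ReverseStemIndex` and its methods are hypothetical names, and `toyStem` is a deliberately crude stand-in for a real stemmer such as Snowball.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

/** Records which surface forms were reduced to each stem (illustrative sketch). */
final class ReverseStemIndex {
    private final Map<String, SortedSet<String>> formsByStem = new TreeMap<>();
    private final UnaryOperator<String> stemmer;

    ReverseStemIndex(UnaryOperator<String> stemmer) {
        this.stemmer = stemmer;
    }

    /** Stems the word and remembers the original form under that stem. */
    void add(String word) {
        formsByStem.computeIfAbsent(stemmer.apply(word), k -> new TreeSet<>()).add(word);
    }

    /** Returns every form seen for the stem, or an empty set if none was indexed. */
    SortedSet<String> formsFor(String stem) {
        return formsByStem.getOrDefault(stem, Collections.emptySortedSet());
    }

    /** Toy stemmer (assumption): strips one trailing "s" from words longer than three letters. */
    static String toyStem(String word) {
        return word.length() > 3 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }

    public static void main(String[] args) {
        ReverseStemIndex index = new ReverseStemIndex(ReverseStemIndex::toyStem);
        for (String word : new String[] {"cares", "care", "cars"}) {
            index.add(word);
        }
        System.out.println(index.formsFor("care")); // forms recorded for the stem "care"
    }
}
```

The lookup only returns forms that actually occurred in the indexed text, so it cannot invent inflections the corpus never contained.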

Re: Open source Arabic stemmer

2008-01-19 Thread Otis Gospodnetic
ll <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, January 16, 2008 8:59:55 PM Subject: Re: Open source Arabic stemmer Try searching this list for Arabic Stemmer. I seem to recall one under a GPL license. Also try Googling "arabic Lucene analyzer" -Grant O

Re: Open source Arabic stemmer

2008-01-16 Thread Grant Ingersoll
Try searching this list for Arabic Stemmer. I seem to recall one under a GPL license. Also try Googling "arabic Lucene analyzer" -Grant On Jan 16, 2008, at 1:21 PM, Liaqat Ali wrote: Hi Kindly tell me about some open source Arabic Stemmer which can be used with Lucene

Open source Arabic stemmer

2008-01-16 Thread Liaqat Ali
Hi Kindly tell me about some open source Arabic Stemmer which can be used with Lucene. Regards, Liaqat Ali

Stemmer and Synonym analyzer

2007-10-24 Thread java_user_
I am planning on building an analyzer that has stemming, stopwords and synonyms. I am planning on using the Snowball Porter stemmer and the WordNet synonym engine. Does it make sense to stem the synonym index? I do not want to stem the term “history” and then try to find the synonym. The

Re: French stemmer problem

2006-12-22 Thread Patrick Turcotte
ords do not stem to real English words with the English stemmer either. Renaud Paquay wrote: > Hello, > > Does anyone know about a modified version of the French Stemmer ? > This one has too many bad results. > For example, if I use the word : "ours" (bear) > The stemme

Re: French stemmer problem

2006-12-22 Thread Mark Miller
None of the stemmers always stems to a valid word. That is not important, as you should be stemming the query as well. The only thing that is important is that each word always stems to the same base. Many English words do not stem to real English words with the English stemmer either. Renaud

RE: French stemmer problem

2006-12-22 Thread Samir Abdou
.org Subject: French stemmer problem Hello, Does anyone know about a modified version of the French stemmer? This one gives too many bad results. For example, if I use the word "ours" (bear), the stemmer stems it into "our", which doesn't exist in French. If I have some w

French stemmer problem

2006-12-22 Thread Renaud Paquay
Hello, Does anyone know about a modified version of the French stemmer? This one gives too many bad results. For example, if I use the word "ours" (bear), the stemmer stems it into "our", which doesn't exist in French. If I have some words like "L'insep

Re: stemmer

2006-11-18 Thread Erick Erickson
at there *is* a built-in stemmer, but whether it does what you want when indexing multiple languages depends upon what results you expect to get...and there's no clear answer that I remember Erick On 11/18/06, Thomas Klein <[EMAIL PROTECTED]> wrote: Hi there, I'm fairly new

stemmer

2006-11-18 Thread Thomas Klein
Hi there, I'm fairly new to Lucene. I just developed a multi-threaded indexing TCP server using Lucene to, hmmm, let me remember, index stuff :) I have to index not only English, but French and German, and, I don't know, perhaps other languages in the future. Did lucene use a defau

Re: Looking for a stemmer that can return all inflected forms

2006-10-16 Thread Steven Rowe
Hi Jong, Jong Kim wrote: > I'm looking for a stemmer that is capable of returning all morphological > variants of a query term (to be used for high-recall search). For example, > given a query term of 'cares', I would like to be able to generate 'cares',

RE: Looking for a stemmer that can return all inflected forms

2006-10-15 Thread Jong Kim
on is pretty straightforward. /Jong -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Sunday, October 15, 2006 12:38 AM To: java-user@lucene.apache.org Subject: Re: Looking for a stemmer that can return all inflected forms Bill: Lucene already comes

Re: Looking for a stemmer that can return all inflected forms

2006-10-14 Thread Otis Gospodnetic
age From: Bill Taylor <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Cc: Jong Kim <[EMAIL PROTECTED]> Sent: Saturday, October 14, 2006 11:43:10 PM Subject: Re: Looking for a stemmer that can return all inflected forms On Oct 14, 2006, at 3:57 PM, Jong Kim wrote: > Hi, > > I'm

Re: Looking for a stemmer that can return all inflected forms

2006-10-14 Thread Bill Taylor
On Oct 14, 2006, at 3:57 PM, Jong Kim wrote: Hi, I'm looking for a stemmer that is capable of returning all morphological variants of a query term (to be used for high-recall search). For example, given a query term of 'cares', I would like to be able to generate 'car

Re: Looking for a stemmer that can return all inflected forms

2006-10-14 Thread Mike Klaas
On 10/14/06, Jong Kim <[EMAIL PROTECTED]> wrote: Hi, I'm looking for a stemmer that is capable of returning all morphological variants of a query term (to be used for high-recall search). For example, given a query term of 'cares', I would like to be able to generate

Looking for a stemmer that can return all inflected forms

2006-10-14 Thread Jong Kim
Hi, I'm looking for a stemmer that is capable of returning all morphological variants of a query term (to be used for high-recall search). For example, given a query term of 'cares', I would like to be able to generate 'cares', 'care', 'cared', a

Re: Stemmer Implementation Strategy - feedback?

2006-08-08 Thread eks dev
I would suggest you have a look at the Egothor stemmer (http://www.egothor.org/book/bk01ch01s06.html); it can be trained rather easily (if your only use of "roots" is for searching). I have only heard good things about it, never tried it. On Aug 4, 2006, at 1:29 PM, Marios Skoun

Re: Stemmer Implementation Strategy - feedback?

2006-08-07 Thread Marvin Humphrey
On Aug 7, 2006, at 11:23 PM, Marios Skounakis wrote: I directed the question to the lucene list in order to find out what people think about the general case Martin Porter touches on some of the pros and cons of a dictionary- based approach to stemming at

Re: Stemmer Implementation Strategy - feedback?

2006-08-07 Thread Marios Skounakis
the question to the lucene list in order to find out what people think about the general case - is the stemmer class allowed to use, say, 1 MB of memory? Is your lexicon approach going to be complete? I don't know Greek, so I don't know if you have a fixed set of roots. Also, I don'

Re: Stemmer Implementation Strategy - feedback?

2006-08-07 Thread Grant Ingersoll
your alternative :-) (i.e. doing it by hand.) Writing the stemmer seems pretty easy, so I would go for it and then test it to see if it meets your needs and, then, if you can, share it with others here. -Grant On Aug 4, 2006, at 1:29 PM, Marios Skounakis wrote: Hi all, The contrib

Stemmer Implementation Strategy - feedback?

2006-08-04 Thread Marios Skounakis
Hi all, The contrib section of Lucene contains a Greek Analyzer, which however only does some letter normalization (capitals to lowercase, accent removal) and basic stop word removal. I am interested in creating a Stemmer for the Greek Language to use with Lucene (i.e. implement it as an

Re: Stemmer algorithms

2006-02-13 Thread jason
Hi, I have tested some stemmer algorithms in my application. However, I think we'd better write a weaker algorithm. I mean, Porter and some other algorithms are too strong. Maybe an algorithm that converts plural nouns to singular is enough. On 2/14/06, Yilmazel, Sibel <[EMAIL P

Re: Stemmer algorithms

2006-02-13 Thread Otis Gospodnetic
e for download, so you should be able to try both K-stem and Porter and compare. Otis - Original Message From: "Yilmazel, Sibel" <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Mon 13 Feb 2006 01:41:52 PM EST Subject: Stemmer algorithms Hello all, We have

Stemmer algorithms

2006-02-13 Thread Yilmazel, Sibel
tem seemed to be a weak stemming algorithm as it strips off only the inflectional suffixes (-s, -es, -ed). In IR, it is usually recommended using a "weak" stemmer, as the "weak" stemmer seldom hurts performance, but it usually provides significant improvement with precision.
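To make the notion of a "weak" stemmer concrete, here is a naive sketch that strips only the three inflectional suffixes named above (-s, -es, -ed). It is not K-stem, Porter, or any Lucene class; `WeakSuffixStemmer`, the rule order, and the length thresholds are assumptions chosen for illustration.

```java
import java.util.Locale;

/** Naive suffix stripper for -es, -ed and -s only (illustrative, not K-stem). */
final class WeakSuffixStemmer {

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // Longer suffixes are checked first so "boxes" loses "es", not just "s".
        if (w.length() > 4 && w.endsWith("es")) {
            return w.substring(0, w.length() - 2);
        }
        if (w.length() > 4 && w.endsWith("ed")) {
            return w.substring(0, w.length() - 2);
        }
        // Leave words ending in "ss" (e.g. "glass") alone.
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"dogs", "boxes", "walked", "glass"}) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

Even this small rule set overstrips in places: because "-es" is tried before "-s", a word like "roses" loses two letters instead of one. That is the kind of case where a dictionary-aware stemmer behaves differently.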

Re: Public access to the stemmer (germanstemmer in my case)

2005-08-13 Thread Otis Gospodnetic
ge with XML-RPC. What I did now is I copied the GermanStemmer > > from lucene into my package and called it from there. > > But I'm not sure if that's a clever idea and maybe I just overlooked > a > public interface to the stemmer output? Or I'm approaching the wh

Public access to the stemmer (germanstemmer in my case)

2005-08-13 Thread Markus Fischer
I'm not aware of the stemmed word. My frontend application is not Java, I'm only accessing Lucene through my package with XML-RPC. What I did now is I copied the GermanStemmer from lucene into my package and called it from there. But I'm not sure if that's a clev

Re: URL Stemmer

2005-07-27 Thread Otis Gospodnetic
Hm, not sure why you're emailing [EMAIL PROTECTED] [EMAIL PROTECTED] may be better. Here are 2 ancient classes from 2003 that I once used to normalize URLs, to help me identify URL duplicates. This may get stripped on its way to the list. Otis --- Chris Fraschetti <[EMAIL PROTECTED]> wrote:

URL Stemmer

2005-07-27 Thread Chris Fraschetti
Writing simple code to trim down a URL is trivial, but to actually trim it down to its most meaningful state is very hard. In some cases the URL parameters actually define the page; in others they are useless babble. I'd like to use the hash of a page's URL as well as a hash of the content data to h
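The "trivial" part of this, a purely syntactic normalization followed by a hash, might look like the sketch below. It makes no attempt at the hard part (deciding which parameters are meaningful). `UrlFingerprint`, the choice to sort query parameters and drop the fragment, and the use of SHA-256 are all assumptions for illustration, not what the poster or the later reply actually used.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;

/** Syntactic URL normalization plus a hash for duplicate detection (sketch). */
final class UrlFingerprint {

    /** Lowercases scheme and host, resolves "." and "..", sorts parameters, drops the fragment. */
    static String normalize(String url) {
        URI u = URI.create(url).normalize();
        String scheme = u.getScheme() == null ? "" : u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost() == null ? "" : u.getHost().toLowerCase(Locale.ROOT);
        String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (u.getPort() != -1) {
            sb.append(':').append(u.getPort());
        }
        sb.append(path);
        String query = u.getRawQuery();
        if (query != null && !query.isEmpty()) {
            String[] params = query.split("&");
            Arrays.sort(params); // parameter order should not create a "new" URL
            sb.append('?').append(String.join("&", params));
        }
        return sb.toString();
    }

    /** Hex-encoded SHA-256 of the given text. */
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        String a = normalize("HTTP://Example.COM/a/../b?y=2&x=1#frag");
        String b = normalize("http://example.com/b?x=1&y=2");
        System.out.println(a.equals(b) + " " + sha256Hex(a));
    }
}
```

The same `sha256Hex` helper could be applied to page content to obtain the second hash mentioned in the message.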

Re: snowball analyzer uismo issue in spanish stemmer

2005-03-16 Thread Erik Hatcher
The Snowball stemmers are generated from the definitions pulled automatically from the Snowball projects CVS server. I just tried regenerating, which automatically pulls from CVS, and got this error: compile-compiler: [apply] /Users/erik/dev/lucene/java/contrib/snowball/snowball/website

snowball analyzer uismo issue in spanish stemmer

2005-03-16 Thread Ernesto De Santis
Hi, I found a problem with the SpanishStemmer in SnowballAnalyzer. Words ending in "ismo" are stripped fine, but words ending in "guismo" are not. In Spanish: "america" and "americanismo" are fine, "argentina" and "argentinismo" are fine, "amigo" is fine, but "amigismo" is not fine; the right word