Hello,
First, apologies for the weird subject line, and apologies for cross-posting,
but last week it got no replies on the Solr user mailing list.
We index many languages and search over all those languages at once, but boost
the language of the user's preference. To differentiate between ste…
On Tue, Oct 24, 2017 at 11:04 AM, julien Blaize wrote:
Hello,
I am looking for a way to efficiently do the reverse of stemming.
Example: if I give the program the verb "drug", it will give me
"drugged", "drugging", "drugs", "drugstore", etc.
I have used the wordforms program from hunspell to generate…
Hello,
I am trying to collect stemming changes in my search index at indexing
time, so I can collect a list of stemmed word -> [original word variants]
(e.g. plot -> [plots, plotting, plotted]) for later use.
I am using the k-stem filter + KeywordRepeatFilter
+ RemoveDuplicatesTokenFilter…
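For later readers: the stem -> variants mapping described above can be collected with an ordinary map keyed by the stemmed form. A minimal plain-Java sketch; the toy suffix-stripper here is only a stand-in for the real k-stem output, and all class and method names are invented for illustration:

```java
import java.util.*;

// Sketch: collect stemmed-word -> [original word variants] while indexing.
public class StemVariants {
    // Toy stemmer stripping a few common suffixes; stand-in for k-stem.
    static String toyStem(String w) {
        if (w.endsWith("ting")) return w.substring(0, w.length() - 4); // plotting -> plot
        if (w.endsWith("ted"))  return w.substring(0, w.length() - 3); // plotted  -> plot
        if (w.endsWith("s"))    return w.substring(0, w.length() - 1); // plots    -> plot
        return w;
    }

    // Build the map from a stream of tokens seen at indexing time.
    public static Map<String, Set<String>> collect(List<String> tokens) {
        Map<String, Set<String>> variants = new HashMap<>();
        for (String tok : tokens) {
            String stem = toyStem(tok);
            if (!stem.equals(tok)) {
                // Only record tokens the stemmer actually changed.
                variants.computeIfAbsent(stem, k -> new TreeSet<>()).add(tok);
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of("plots", "plotting", "plotted", "plot")));
    }
}
```

With KeywordRepeatFilter in the chain, the original and stemmed forms arrive as adjacent tokens, so the same map can be filled directly from the token stream.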
Hello,
I want to set LMJelinekMercerSimilarity (with lambda set to, say, 0.6) for
the Luke similarity calculation. Luke by default uses DefaultSimilarity.
Can anyone help with this? I use Lucene 4.10.4 and the Luke release for that
version of the Lucene index.
Dwaipayan
Stemming is an inherently limited process. It doesn't know about the
word 'news', it just has a rule about 's'.
Some of us sell commercial products that do more complex linguistic
processing that knows about which words are which.
There may be open source implementations…
Hi Dwaipayan,
Another way is to use KeywordMarkerFilter. Stemmer implementations respect this
attribute.
If you want to supply your own mappings, StemmerOverrideTokenFilter could be
used as well.
ahmet
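The effect of KeywordMarkerFilter can be pictured in plain Java: tokens in a protected set bypass the stemmer entirely (the real filter sets a KeywordAttribute that Lucene stemmers check). A hedged sketch with invented names and a deliberately naive 's'-stripping rule, which shows exactly why unprotected "news" collapses to "new":

```java
import java.util.*;

// Sketch of what KeywordMarkerFilter achieves: words in a protected set
// skip the stemmer, so "news" is not reduced to "new".
public class ProtectedStemmer {
    private final Set<String> protectedWords;

    public ProtectedStemmer(Set<String> protectedWords) {
        this.protectedWords = protectedWords;
    }

    // Naive Porter-like plural rule, for illustration only.
    private static String naiveStem(String w) {
        return w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
    }

    public String stem(String word) {
        // Protected words pass through unchanged, like KeywordAttribute tokens.
        if (protectedWords.contains(word)) return word;
        return naiveStem(word);
    }

    public static void main(String[] args) {
        ProtectedStemmer s = new ProtectedStemmer(Set.of("news"));
        System.out.println(s.stem("cats")); // cat
        System.out.println(s.stem("news")); // news (protected)
    }
}
```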
On Monday, March 14, 2016 4:31 PM, Dwaipayan Roy wrote:
> Subject: Problem with porter stemming
I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses
the porter stemmer (snowball) to stem the words. But using the
EnglishAnalyzer, I am getting erroneous result for 'news'. 'news' is
getting stemmed into 'new'.
Any help would be appreciated.
…FilterFactory,
or use a copyField that doesn't stem and, when exact matches are
required, search on that field.
Best,
Erick
On Tue, Aug 25, 2015 at 5:05 AM, Modassar Ather wrote:
Can
anyone tell me why this option is not provided for stemming?
I am not sure about it, but the original token can be preserved by using
too.
To avoid any duplicate tokens in the document, can be used at the end of the
analysis chain.
Hope this helps.
Regards,
Modassar
On Tue, Aug 25, 2015 at 2:12…
Can anyone tell me why this option is not provided for stemming? For e.g. if I
want to store both *Methods* and *Method* in my index, then I think there is
no option available in Lucene to do this. I also noticed that if we
place EnglishMinimalStemFilterFactory after WordDelimiterFilterFactory with…
Ah, yes, that does it. Thank you both.
Rob
On Jul 29, 2014, at 10:30 AM, Alexandre Patry
wrote:
On 29/07/2014 10:28, Rob Nikander wrote:
You probably want to look at
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscel…
Mmm. I don’t see a way to construct one, except passing an FST, which isn’t
exactly a map. I look at the FST javadoc; it’s a rabbit hole.
Rob
On Jul 29, 2014, at 10:14 AM, Robert Muir wrote:
You can put this thing before your stemmer, with a custom map of exceptions:
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.html
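The idea behind StemmerOverrideFilter, sketched in plain Java: a custom exception map is consulted before the stemmer, so irregular forms like "geese" can map to whatever stem you choose. The class and method names below are invented, and the toy plural rule only stands in for PorterStemFilter:

```java
import java.util.*;

// Sketch of the StemmerOverrideFilter idea: an exception map wins over
// the stemmer, handling irregular forms the algorithm cannot.
public class OverrideStemmer {
    private final Map<String, String> overrides;

    public OverrideStemmer(Map<String, String> overrides) {
        this.overrides = overrides;
    }

    // Toy plural rule standing in for PorterStemFilter; illustration only.
    private static String naiveStem(String w) {
        return w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
    }

    public String stem(String word) {
        String override = overrides.get(word);
        if (override != null) return override; // exception wins
        return naiveStem(word);                // fall through to the stemmer
    }

    public static void main(String[] args) {
        OverrideStemmer s = new OverrideStemmer(Map.of("geese", "goose"));
        System.out.println(s.stem("zebras")); // zebra
        System.out.println(s.stem("geese"));  // goose, via the override map
    }
}
```

Because the override is applied at both index and query time, "goose" and "geese" end up on the same term.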
On Tue, Jul 29, 2014 at 10:03 AM, Robert Nikander wrote:
Hi,
I created an Analyzer with a PorterStemFilter, and I’m searching some test
documents. Normal plurals work; searching for “zebra” finds text with
“zebras”. But searching for “goose” doesn’t find “geese”. Is that expected?
Does it give up on irregular English? Is there a way to make that…
Done. https://issues.apache.org/jira/browse/LUCENE-5749
On 2014/06/10, 1:18 AM, Jack Krupansky wrote:
Please do file a Jira. I'm sure the discussion will be interesting.
-- Jack Krupansky
-Original Message-
From: Jamie
Sent: Monday, June 9, 2014 9:33 AM
To: java-user@lucene.apache.org
Subject: Re: searching with stemming
Jack
Thanks. I figured as much.
I'm modifying each…
…to file a Jira suggesting your suggested improvement.
-- Jack Krupansky
-Original Message- From: Jamie
Sent: Monday, June 9, 2014 6:56 AM
To: java-user@lucene.apache.org
Subject: Re: searching with stemming
To me, it seems strange that these default analyzers don't provide
constructors that enable one to override stemming, etc.
Benson. Thanks. I was just hoping to avoid a whole bunch of boilerplate.
On 2014/06/09, 1:07 PM, Benson Margulies wrote:
Analyzer classes are optional; an analyzer is just a factory for a set of
token stream components. You can usually do just fine with an anonymous
class. Or in your case, the o…
Are you using Solr? If so you are on the wrong mailing list. If not, why do
you need a non-anonymous analyzer at all?
On Jun 9, 2014 6:55 AM, "Jamie" wrote:
On Mon, Jun 9, 2014 at 7:57 PM, Jamie wrote:
You should construct an analysis chain that does what you need. Read the
source of the relevant analyzer and pick the tokenizer and filter(s) that
you need, and don't include stemming.
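The advice above — copy the analyzer's chain but leave the stemming stage out — can be pictured as composing an ordered list of token transforms. A plain-Java sketch (in Lucene these stages would be a Tokenizer and TokenFilters; the names here are invented for illustration):

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Sketch: an "analyzer" as an ordered list of per-token transforms.
// Disabling stemming is just building the chain without that stage.
public class Chain {
    private final List<UnaryOperator<String>> stages;

    @SafeVarargs
    public Chain(UnaryOperator<String>... stages) {
        this.stages = List.of(stages);
    }

    public String analyze(String token) {
        for (UnaryOperator<String> stage : stages) token = stage.apply(token);
        return token;
    }

    static final UnaryOperator<String> LOWERCASE = w -> w.toLowerCase(Locale.ROOT);
    // Toy stemmer stage; illustration only.
    static final UnaryOperator<String> STEM =
        w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;

    public static void main(String[] args) {
        Chain withStemming = new Chain(LOWERCASE, STEM);
        Chain noStemming   = new Chain(LOWERCASE); // same chain, stem stage omitted
        System.out.println(withStemming.analyze("Methods")); // method
        System.out.println(noStemming.analyze("Methods"));   // methods
    }
}
```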
On Mon, Jun 9, 2014 at 5:57 AM, Jamie wrote:
Greetings
Our app currently uses language specific analysers (e.g.
EnglishAnalyzer, GermanAnalyzer, etc.). We need an option to disable
stemming. What's the recommended way to do this? These analyzers do not
include an option to disable stemming, only a parameter to specify a
list of word…
> A simple flag which allows or suppresses the stemming would solve everyone's
> problem.
Stemming isn't done by default, so... problem already solved then?
TX
I've encountered the same problem and tried to use your workaround. But
overwriting the parser hasn't done the job.
I do not understand why the stemming is done anyway.
Uwe wrote:
> Hello,
> I am looking for an example of using Tokenizer + Analyzer (in particular
> org.apache.lucene.analysis.ru.RussianAnalyzer) for standalone stemming.
> Can't find such an example here:
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-summary.html?is-external=true#package_description
>
> Thanks!
A possible workaround could be to modify search terms with wildcard tokens by
stemming them manually and creating a new search string.
Searches for hersen* would be modified to hers* and return what you expect.
Con is of course that you search for more than you specified.
Lars-Erik
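The workaround above — stem the literal part of the wildcard term yourself and re-append the '*' — can be sketched as a small query rewrite. The toy stemmer below only stands in for whatever stemmer the index actually uses, and the class name is invented:

```java
// Sketch of the workaround above: query parsers don't analyze wildcard
// terms, so stem the literal prefix manually and re-append the '*'.
public class WildcardRewrite {
    // Stand-in for the index's stemmer; illustration only.
    static String toyStem(String w) {
        if (w.endsWith("en")) return w.substring(0, w.length() - 2);
        if (w.endsWith("s"))  return w.substring(0, w.length() - 1);
        return w;
    }

    public static String rewrite(String queryTerm) {
        if (!queryTerm.endsWith("*")) return queryTerm; // not a prefix query
        String prefix = queryTerm.substring(0, queryTerm.length() - 1);
        // Note: the stemmed prefix matches more than the user typed.
        return toyStem(prefix) + "*";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("hersen*")); // hers*
        System.out.println(rewrite("plain"));   // plain (no wildcard, untouched)
    }
}
```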
This is a well-known problem: Wildcards cannot be analyzed by the query parser,
because the analysis would destroy the wildcard characters; also stemming of
parts of terms will never work. For Solr there is a workaround (the MultiTermAware
component), but it is also very limited and only works when…
Hello there,
my colleague and I ran into an example which didn't return the result size
which we were expecting. We discovered that there is a mismatch in handling
terms while indexing and searching. This issue has already been discussed
several times on the internet, as we found out later on, but in o…
Thanks for the reply.
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Tuesday, June 12, 2012 1:14 PM
> To: java-user@lucene.apache.org
> Subject: Re: Stemming - limited index expansion
>
> I don't completely follow precisely…
Jack Krupansky
-Original Message-
From: Paul Hill
Sent: Tuesday, June 12, 2012 3:07 PM
To: java-user@lucene.apache.org
Subject: Stemming - limited index expansion
As others have previously proposed on this list, I am interested in inserting
a second token at some positions in my index. I'll call this Limited Index
Expansion.
I want to retain the original token, so that I can score an original word that
matches in a text better than just any synonym/stem…
> …getFieldQuery(field, queryText, quoted);
> }
>
> if you are using PerFieldAnalyzerWrapper so that these "_exact" fields
> don't use stemming, then it should just work.
> you'll need branch_3.x or trunk for this, but it looks like 3.1 is close
Using the 3.1 jars fr…
I have Lucene indexes build using a shingled, stemmed custom analyzer.
I have a new requirement that exact searches match correctly.
ie: bar AND "nachos"
will only fetch results with plural nachos. Right now, with the
stemming, singular nacho results are returned as well. I realize that…
Another approach to stemming at index time but still providing exact matches
when requested is to index the stemmed version AND the original version at
the same position (think synonyms). But here's the trick, index the original
token with a special character. For instance, indexing "…
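The trick described above — the stemmed form plus the original marked with a special character, both at the same position — amounts to emitting two tokens per changed word. A plain-Java sketch; the '$' marker, helper names, and toy stemmer are this sketch's choices, not a Lucene API:

```java
import java.util.*;

// Sketch: index both the stem and the original token (prefixed with '$')
// at the same position, so exact queries can target the marked form.
public class MarkedIndexing {
    // Toy plural stemmer standing in for the real analysis chain.
    static String toyStem(String w) {
        return w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
    }

    // Tokens to index at one position for a single input word.
    public static List<String> tokensAtPosition(String word) {
        String stem = toyStem(word);
        if (stem.equals(word)) return List.of(word); // nothing to mark
        return List.of(stem, "$" + word);            // stem + marked original
    }

    public static void main(String[] args) {
        System.out.println(tokensAtPosition("nachos")); // [nacho, $nachos]
        // An exact query for "nachos" is rewritten to $nachos at search time,
        // so only plural documents match; a normal query searches nacho.
    }
}
```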
Thanks, everyone!
--- On Thu, 5/20/10, Herbert Roitblat wrote:
At a general level, we have found that stemming during indexing is not
advisable. Sometimes users want the exact form and if you have removed the
exact form during indexing, obviously, you cannot provide that. Rather, we
have found that stemming during search is more useful, or maybe it…
org.apache.lucene.queryParser.analyzing.Analyzing…
Is there a good way to combine the wildcard queries and stemming?
As is, the field which is stemmed at index time, won't work with some wildcard
queries.
We were thinking to create two separate index fields - one stemmed, one
non-stemmed, but we are having issues with our SpanNear queries…
> HTH
> Erick
> On Tue, May 18, 2010 at 2:05 PM, Larry Hendrix wrote:
Hi Larry-
Have you tried t…
Hi,
Right now I'm using Lucene with a basic Whitespace Anayzer but I'm having
problems with stemming. Does anyone have a recommendation for other text
analyzers that handle stemming and also keep capitalization, stop words, and
punctuation?
Thanks,
Larry
Larry A. Hendrix
Hi There
We are having problems with some of the Lucene analyzers in the
contributions package. For instance, it appears that the Russian
analyzer supports stemming, although, when we test it it does not. Is
there a specific switch that we must enable to turn on the stemming of
words? When we…
Thank you very much Yonik. I downloaded the latest Solr build, pulled the
WordDelimiterFilter and used it with the same option as used by Solr default
and it worked like a charm. Thanks to Robert also.
Thanks,
KK
On Tue, Jun 9, 2009 at 7:01 PM, Yonik Seeley wrote:
Note: I request Solr users to go through this mail and let me know their ideas.
Thanks Yonik, you rightly pointed it out. That clearly says that the way I'm
trying to mimic the default behaviour of Solr indexing/searching in Lucene
is wrong, right?
I downloaded the latest version of the Solr nightly on my…
I just cut'n'pasted your word into Solr... it worked fine (it didn't
split the word).
Make sure you're using the latest from the trunk version of Solr...
this was fixed since 1.3
http://localhost:8983/solr/select?q=साल&debugQuery=true
[...]
साल
साल
text:साल
text:साल
-Yonik
> > …and say that there also it's specified that we have to mention the
> > parameters, and both are different for indexing and querying.
> > I'm kind of stuc…
> > WordDelimiterFilterFactory, right? I don't know what's the use of the
> > other one. Anyway, can you guide me in getting rid of the above error?
> > And yes, I'll change the order of applying the filters as you said.
> > On Fri, Jun 5, 2009 at 5:48 PM, Robert Muir wrote:
> > > KK, you got the right idea.
> > > Though I think you might want to change the order: move the stop filter
> > > before the porter stem filter... otherwise it might not work correctly.
On Fri, Jun 5, 2009 at 8:05 AM, KK wrote:
> Thanks Robert. This is exactly what I did and it's working, but the delimiter
> is missing; I'm going to add that from solr-nightly.jar
/**
 * Analyzer for Indian language.
 */
public class IndicAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream ts = new WhitespaceTokenizer(reader);
    ts = new PorterStemFilter(ts);
    ts = new LowerCaseFilter(ts);
    ts = new StopFilter(ts, StopAnalyzer.ENGLISH_STOP_WORDS);
    return ts;
  }
}
It's able to do stemming/case-folding and supports search for both English
and Indic texts. Let me tr…
> to me Solr:Lucene is similar to Window$:Linux, it's my view only, though].
> Coming back to the point, as Uwe mentioned, we can do the same thing in
> Lucene as well with what is available in Solr; Solr is based on Lucene only,
> right? I request Uwe to give me some more ideas on using the analyzers from
> Solr.
> …first via WhitespaceAnalyzer, then via an analyzer for lowercasing, then an
> analyzer for stemming [do we have such analyzers, or are they filters? I
> think they are filters, so I'll have to go for a custom analyzer]. As you
> said, the whitespace one will act on the full content, lowercasing will
> apply only on t…
> Muir, can you give me a bit more detailed description of how to use the
> WordDelimiterFilter to do my job?
> On a side note, I was thinking of writing a simple analyzer that will do
> the following:
> #. If the webpage fragment is non-English [for me it's some Indian
> language], then index it as such, with no stemming/stop-word removal to
> begin with. As I know, it's in UCN unicode, something like
> \u0021\u0012\u34ae\u0031 [just a sample].
> # If the fragment is English, then apply the standard analyzing process for
> English content. I've not thought of querying i…
> As you see, the Solr analyzers are just standard Lucene analyzers. So you
> can drop the solr core jar into your project and just use them :-)
> Currently I am not sure which one is the analyzer Robert means, that can do
> English stemming and detect non-English parts, but it is there to look for.
> Uwe
> -Original Message-
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Thursday, June 04, 2009 1:18 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to support stemming and case folding for english content
> mixed with non-english content?
>
> KK, ok, so you only really want to stem the english. This is good.
> …want to use the basic whitespace analyzer, as we don't have stemmers for
> this as I mentioned earlier, and wherever English appears I want it to be
> stemmed, tokenized, etc. [the standard process used for English content].
> As of now I'm using the whitespace analyzer for the full content, which
> doesn't support case folding, stemming, etc. for the content. So if there
> is an English word, say "Detection", indexed as such, then searching for
> detection or detect is not giving any results, which is the expecte…
> …content intermingled with
> non-English content. I must mention that we don't have stemming and case
> folding for this non-English content. I'm stuck with this. Someone do let
> me know how to proceed to fix this issue.
>
> Thanks,
> KK.
--
Robert Muir
rcm...@gmail.com
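The requirement in this thread — stem the English tokens and pass non-English (here Indic) tokens through untouched — comes down to a per-token script check. A hedged plain-Java sketch; the Latin-script test and the toy stemmer are illustrative stand-ins for the real filter chain, and the names are invented:

```java
import java.util.*;

// Sketch: in mixed English/Indic text, stem and lowercase only tokens
// written in Latin script; leave tokens in other scripts untouched.
public class MixedScriptStemmer {
    static boolean isLatin(String token) {
        return token.codePoints()
                .allMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.LATIN);
    }

    // Toy English stemmer; illustration only.
    static String toyStem(String w) {
        if (w.endsWith("ion")) return w.substring(0, w.length() - 3); // detection -> detect
        if (w.endsWith("s"))   return w.substring(0, w.length() - 1);
        return w;
    }

    public static String process(String token) {
        if (!isLatin(token)) return token; // non-English: index as-is
        return toyStem(token.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(process("Detection")); // detect
        System.out.println(process("साल"));        // साल (Devanagari, untouched)
    }
}
```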
Thank you Ian.
Kamal
Yep, I reckon so.
btw a Google search for something like lucene stemming gets hits,
including a couple of articles about stemming. Might be worth a look.
--
Ian.
On Mon, May 11, 2009 at 2:08 PM, Kamal Najib wrote:
Will the analyzer now do stemming if I do the following:
analyzer = new StandardAnalyzer();
analyzer = AnalyzerUtil.getPorterStemmerAnalyzer(analyzer);
thanks.
Kamal.
On Fri, May 08, 2009 at 08:57:59AM -0400, Matthew Hall wrote:
> process your
> words into a more base form before they go into the stemmed…
Malaga (http://home.arcor.de/bjoern-beutel/malaga/) can be used to
make a program that converts words to a base form.
--
Ganesh wrote:
My opinion is Stemming process is to get the base word. Here it is not
doing so.
Unfortunately this is where your problem lies: stemming doesn't do this;
it breaks words that are almost lexically equivalent down into a similar
root word, thus cat = cats.
From the…
Hello all,
I am using Lucene 2.4.1 and Snowball Analyzer for my indexing.
I am facing some issues with stemming.
Raining stemmed to Rain
cats stemmed to cat
but
Harder is not stemmed to hard
Stronger is not stemmed to Strong.
Even the Keyword and Standard analyzers do the same. My opinion is…
: Subject: Words that need protection from stemming, i.e., protwords.txt
Porter is a little outdated I've found KStem much better
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
You'll still need a good protected word list, but KStem is just a little
nicer
On Fri, Jan 16, 2009 at 6:20 PM, David Woodward wrote:
Hi.
Any good protwords.txt out there?
In a fairly standard solr analyzer chain, we use the English Porter analyzer
like so:
For most purposes the Porter stemmer does just fine, but occasionally words
come along that really don't work out too well, e.g.,
"maine" is stemmed to "main" - clearly goofing…
> From: Chris Hostetter
> To: java-user@lucene.apache.org
> Sent: Saturday, December 20, 2008 2:12:21 PM
> Subject: Re: Inquiry on Lucene Stemming
: Well some client inquires if it's possible to expand such simple words
: and does Lucene have an API for this logic? Because all I read was that the
: stemming logic for Lucene was the other way around, which is, for example,
: "flashing" will be trimmed to the root word "…