Multiple languages, boosting and, stemming and KeywordRepeat

2018-05-14 Thread Markus Jelsma
Hello, First, apologies for the weird subject line, and apologies for cross-posting, but last week it got no replies on the Solr user mailing list. We index many languages and search over all those languages at once, but boost the language of the user's preference. To differentiate between ste

Re: How to use Hunspell dictionary to do the reverse of stemming ?

2017-10-24 Thread Robert Muir
On Tue, Oct 24, 2017 at 11:04 AM, julien Blaize wrote: > Hello, > > i am lookingfor a way to efficiently do the reverse of stemming. > Example : if i give to the program the verb "drug" it will give me > "drugged', "drugging", "drugs", "

How to use Hunspell dictionary to do the reverse of stemming ?

2017-10-24 Thread julien Blaize
Hello, i am lookingfor a way to efficiently do the reverse of stemming. Example : if i give to the program the verb "drug" it will give me "drugged', "drugging", "drugs", "drugstore" etc... I have used the program wordforms from hunspell to gen

Re: Collecting all stemming token

2017-02-03 Thread Xiaolong Zheng
, --Xiaolong On Fri, Feb 3, 2017 at 1:16 PM, Xiaolong Zheng wrote: > Hello, > > I am trying collect stemming changes in my search index during the > indexing time. So I could collect a list of stemmed word -> [variety > original word] (e.g: plot -> [plots, plotting, plotted])

Collecting all stemming token

2017-02-03 Thread Xiaolong Zheng
Hello, I am trying collect stemming changes in my search index during the indexing time. So I could collect a list of stemmed word -> [variety original word] (e.g: plot -> [plots, plotting, plotted]) for a later use. I am using k-stem filter + KeywordRepeatFilter + RemoveDuplicatesTokenFil
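
The goal described above (a map from each stemmed form to the original words that produced it) can be pictured with a small stdlib-only sketch. This is not the KStem + KeywordRepeatFilter chain from the message: `StemGroups`, `toyStem` and `group` are hypothetical names, and the toy stemmer is a crude stand-in chosen only so the `plot` example works.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

final class StemGroups {

    // Toy stemmer (an assumption, NOT KStem): strips "ing", "ed" or "s",
    // then collapses a trailing doubled consonant ("plott" -> "plot").
    public static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("ing") && w.length() > 5) {
            w = w.substring(0, w.length() - 3);
        } else if (w.endsWith("ed") && w.length() > 4) {
            w = w.substring(0, w.length() - 2);
        } else if (w.endsWith("s") && w.length() > 3) {
            w = w.substring(0, w.length() - 1);
        }
        int n = w.length();
        if (n >= 2 && w.charAt(n - 1) == w.charAt(n - 2)
                && "aeiou".indexOf(w.charAt(n - 1)) < 0) {
            w = w.substring(0, n - 1);
        }
        return w;
    }

    // Groups every word whose stem differs from the word itself under that stem.
    public static Map<String, Set<String>> group(List<String> words,
                                                 Function<String, String> stemmer) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String word : words) {
            String stem = stemmer.apply(word);
            if (!stem.equals(word)) {
                groups.computeIfAbsent(stem, k -> new TreeSet<>()).add(word);
            }
        }
        return groups;
    }
}
```

With the words `plots`, `plotting`, `plotted` and `plot`, `group` returns a single entry keyed by `plot`. In a real analysis chain the stemmer function would be replaced by whatever the index actually uses.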

Re: Problem with porter stemming

2016-07-19 Thread Dwaipayan Roy
Hello. I want to set LMJelinekMercer Similarity (with lambda set to, say, 0.6) for the Luke similarity calculation. Luke by default use the DefaultSimilarity. Can anyone help with this? I use Lucene 4.10.4 and Luke for that version of Lucene index. Dwaipayan

Re: Problem with porter stemming

2016-03-14 Thread Benson Margulies
Stemming is an inherently limited process. It doesn't know about the word 'news', it just has a rule about 's'. Some of us sell commercial products that do more complex linguistic processing that knows about which words are which. There may be open source implementati

Re: Problem with porter stemming

2016-03-14 Thread Ahmet Arslan
Hi Dwaipayan, Another way is to use KeywordMarkerFilter. Stemmer implementations respect this attribute. If you want to supply your own mappings, StemmerOverrideTokenFilter could be used as well. ahmet On Monday, March 14, 2016 4:31 PM, Dwaipayan Roy wrote: I am using EnglishAnalyzer wi

RE: Problem with porter stemming

2016-03-14 Thread Markus Jelsma
.org > Subject: Problem with porter stemming > > I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses > the porter stemmer (snowball) to stem the words. But using the > EnglishAnalyzer, I am getting erroneous result for 'news'. 'news' is >

Problem with porter stemming

2016-03-14 Thread Dwaipayan Roy
I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses the porter stemmer (snowball) to stem the words. But using the EnglishAnalyzer, I am getting erroneous result for 'news'. 'news' is getting stemmed into 'new'. Any help would be appreciated.

RE: Preserve Original Option In Stemming (EnglishMinimalStemFilterFactory).

2015-08-25 Thread Uwe Schindler
AM, Modassar Ather > wrote: > > Can > > anyone tell me why this option is not provided for Stemming. > > > > I am not sure about it but the original token can be preserved by > > using too. > > To avoid any duplicate token in the document > class="

Re: Preserve Original Option In Stemming (EnglishMinimalStemFilterFactory).

2015-08-25 Thread Erick Erickson
FilterFactory, or use a copyField that doesn't stem and when exact matches are required, search on that field. Best, Erick On Tue, Aug 25, 2015 at 5:05 AM, Modassar Ather wrote: > Can > anyone tell me why this option is not provided for Stemming. > > I am not sure about it but

Re: Preserve Original Option In Stemming (EnglishMinimalStemFilterFactory).

2015-08-25 Thread Modassar Ather
Can anyone tell me why this option is not provided for Stemming. I am not sure about it but the original token can be preserved by using too. To avoid any duplicate token in the document can be used at the end of analysis chain. Hope this helps. Regards, Modassar On Tue, Aug 25, 2015 at 2:12

Preserve Original Option In Stemming (EnglishMinimalStemFilterFactory).

2015-08-25 Thread Vishnu Mishra
anyone tell me why this option is not provided for Stemming. For e.g. if I want to store both *Methods* and *Method* in my index then I think there is no option is available in Lucene to do this. I also noticed that if we place EnglishMinimalStemFilterFactory after WordDelimiterFilterFactory with

Re: stemming irregular plurals?

2014-07-29 Thread Rob Nikander
Ah, yes, that does it. Thank you both. Rob On Jul 29, 2014, at 10:30 AM, Alexandre Patry wrote: > > On 29/07/2014 10:28, Rob Nikander wrote: >> Mmm. I don’t see a way to construct one, except passing an FST, which isn’t >> exactly a map. I look at the FST javadoc; it’s a rabbit hole. > You

Re: stemming irregular plurals?

2014-07-29 Thread Alexandre Patry
On 29/07/2014 10:28, Rob Nikander wrote: Mmm. I don’t see a way to construct one, except passing an FST, which isn’t exactly a map. I look at the FST javadoc; it’s a rabbit hole. You probably want to look at http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscel

Re: stemming irregular plurals?

2014-07-29 Thread Rob Nikander
Mmm. I don’t see a way to construct one, except passing an FST, which isn’t exactly a map. I look at the FST javadoc; it’s a rabbit hole. Rob On Jul 29, 2014, at 10:14 AM, Robert Muir wrote: > You can put this thing before your stemmer, with a custom map of exceptions: > > http://lucene.apach

Re: stemming irregular plurals?

2014-07-29 Thread Robert Muir
You can put this thing before your stemmer, with a custom map of exceptions: http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/StemmerOverrideFilter.html On Tue, Jul 29, 2014 at 10:03 AM, Robert Nikander wrote: > Hi, > > I created an Analyzer with a Po
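
The idea in this reply (consult a custom map of exceptions before the stemmer runs) can be pictured with a toy, stdlib-only sketch. It deliberately does not call Lucene's StemmerOverrideFilter; `OverrideThenStem`, `naiveStem` and the override entries are hypothetical, and the "strip a trailing s" rule is only a stand-in for a real stemmer.

```java
import java.util.Map;

final class OverrideThenStem {

    // Toy rule standing in for a real stemmer: drop one trailing "s".
    public static String naiveStem(String word) {
        return (word.length() > 3 && word.endsWith("s"))
                ? word.substring(0, word.length() - 1)
                : word;
    }

    // The override map wins; only words without an entry reach the stemmer.
    public static String stem(String token, Map<String, String> overrides) {
        String override = overrides.get(token);
        return override != null ? override : naiveStem(token);
    }
}
```

With overrides `geese -> goose` and `news -> news`, `stem("geese", ...)` yields `goose`, `stem("news", ...)` stays `news` instead of becoming `new`, and `stem("zebras", ...)` still falls through to the rule and yields `zebra`.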

stemming irregular plurals?

2014-07-29 Thread Robert Nikander
Hi, I created an Analyzer with a PorterStemFilter, and I’m searching some test documents. Normal plurals work; searching for “zebra” finds text with “zebras”. But searching for “goose” doesn’t find “geese”. Is that expected? Does it give up on irregular English? Is there a way to make that

Re: searching with stemming

2014-06-10 Thread Jamie
Done. https://issues.apache.org/jira/browse/LUCENE-5749 On 2014/06/10, 1:18 AM, Jack Krupansky wrote: Please do file a Jira. I'm sure the discussion will be interesting. -- Jack Krupansky - To unsubscribe, e-mail: java-user-

Re: searching with stemming

2014-06-09 Thread Jack Krupansky
Please do file a Jira. I'm sure the discussion will be interesting. -- Jack Krupansky -Original Message- From: Jamie Sent: Monday, June 9, 2014 9:33 AM To: java-user@lucene.apache.org Subject: Re: searching with stemming Jack Thanks. I figured as much. I'm modifying eac

Re: searching with stemming

2014-06-09 Thread Jamie
to file a Jira suggesting your suggested improvement. -- Jack Krupansky -Original Message- From: Jamie Sent: Monday, June 9, 2014 6:56 AM To: java-user@lucene.apache.org Subject: Re: searching with stemming To me, it seems strange that these default analyzers, don't provide construc

Re: searching with stemming

2014-06-09 Thread Jack Krupansky
mprovement. -- Jack Krupansky -Original Message- From: Jamie Sent: Monday, June 9, 2014 6:56 AM To: java-user@lucene.apache.org Subject: Re: searching with stemming To me, it seems strange that these default analyzers, don't provide constructors that enable one to override stemming, e

Re: searching with stemming

2014-06-09 Thread Jamie
Benson. Thanks. I was just hoping to avoid a whole bunch of boilerplate. On 2014/06/09, 1:07 PM, Benson Margulies wrote: Analyzer classes are optional; an analyzer is just a factory for a set of token stream components. you can usually do just fine with an anonymous class. Or in your case, the o

Re: searching with stemming

2014-06-09 Thread Benson Margulies
>> -anonymous analyzer at all. >> On Jun 9, 2014 6:55 AM, "Jamie" wrote: >> >> To me, it seems strange that these default analyzers, don't provide >>> constructors that enable one to override stemming, etc? >>> >>> On 2014/0

Re: searching with stemming

2014-06-09 Thread Jamie
ms strange that these default analyzers, don't provide constructors that enable one to override stemming, etc? On 2014/06/09, 12:39 PM, Trejkaz wrote: On Mon, Jun 9, 2014 at 7:57 PM, Jamie wrote: Greetings Our app currently uses language specific analysers (e.g. EnglishAnalyzer, GermanAnal

Re: searching with stemming

2014-06-09 Thread Benson Margulies
Are you using Solr? If so you are on the wrong mailing list. If not, why do you need a non- -anonymous analyzer at all. On Jun 9, 2014 6:55 AM, "Jamie" wrote: > To me, it seems strange that these default analyzers, don't provide > constructors that enable one to override

Re: searching with stemming

2014-06-09 Thread Jamie
To me, it seems strange that these default analyzers, don't provide constructors that enable one to override stemming, etc? On 2014/06/09, 12:39 PM, Trejkaz wrote: On Mon, Jun 9, 2014 at 7:57 PM, Jamie wrote: Greetings Our app currently uses language specific analysers (e.g. EnglishAna

Re: searching with stemming

2014-06-09 Thread Trejkaz
On Mon, Jun 9, 2014 at 7:57 PM, Jamie wrote: > Greetings > > Our app currently uses language specific analysers (e.g. EnglishAnalyzer, > GermanAnalyzer, etc.). We need an option to disable stemming. What's the > recommended way to do this? These analyzers do not include an

Re: searching with stemming

2014-06-09 Thread Jamie
tokenizer and filter(s) that you need, and don't include stemming.

Re: searching with stemming

2014-06-09 Thread Benson Margulies
You should construct an analysis chain that does what you need. Read the source of the relevant analyzer and pick the tokenizer and filter(s) that you need, and don't include stemming. On Mon, Jun 9, 2014 at 5:57 AM, Jamie wrote: > Greetings > > Our app currently uses lan

searching with stemming

2014-06-09 Thread Jamie
Greetings Our app currently uses language specific analysers (e.g. EnglishAnalyzer, GermanAnalyzer, etc.). We need an option to disable stemming. What's the recommended way to do this? These analyzers do not include an option to disable stemming, only a parameter to specify a list word

Re: RE: Stemming and Wildcard - or fire and water

2013-01-04 Thread Trejkaz
> A simple flag which allows or suppresses the stemming would solve everyones > problem. Stemming isn't done by default, so... problem already solved then? TX

AW: RE: Stemming and Wildcard - or fire and water

2013-01-04 Thread Klaus Nesbigall
I've encountered the same problem and tried to use your workaround. But overwriting the parser hasn't done the job. I do not understand why the stemming is done anyway. Uwe wrote > This is a well-known problem: Wildcards cannot be analyzed by the query > parser, because th

Re: Looking for example code: Tokenizer + Analyzer for Russian stemming

2012-12-19 Thread Steve Rowe
r (in particular > org.apache.lucene.analysis.ru.RussianAnalyzer) for standalone stemming. > Can't find such an example here: > http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-summary.html?is-external=true#package_description > > Thanks!

Looking for example code: Tokenizer + Analyzer for Russian stemming

2012-12-18 Thread dokondr
> Hello, > I am looking for an example of using Tokenizer + Analyzer (in particular > org.apache.lucene.analysis.ru.RussianAnalyzer) for standalone stemming. > Can't find such an example here: > > http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-s

RE: Stemming and Wildcard - or fire and water

2012-12-11 Thread Lars-Erik Aabech
A possible workaround could be to modify search terms with wildcard tokens by stemming them manually and creating a new search string. Searches for hersen* would be modified to hers* and return what you expect. Con is of course that you search for more than you specified. Lars-Erik

RE: Stemming and Wildcard - or fire and water

2012-12-11 Thread Uwe Schindler
This is a well-known problem: Wildcards cannot be analyzed by the query parser, because the analysis would destroy the wildcard characters; also stemming of parts of terms will never work. For Solr there is a workaround (MultiTermAware component), but it is also very limited and only works when

Stemming and Wildcard - or fire and water

2012-12-11 Thread Bayer Dennis
Hello there, my colleague and I ran into an example which didn't return the result size which we were expecting. We discovered that there is a mismatch in handling terms while indexing and searching. This issue is already discussed several times in the internet as we found out later on, but in o

Re: Stemming - limited index expansion

2012-06-12 Thread Jack Krupansky
Krupansky -Original Message- From: Paul Hill Sent: Tuesday, June 12, 2012 7:43 PM To: java-user@lucene.apache.org Subject: RE: Stemming - limited index expansion Thanks for the reply. -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, June 1

RE: Stemming - limited index expansion

2012-06-12 Thread Paul Hill
Thanks for the reply. > -Original Message- > From: Jack Krupansky [mailto:j...@basetechnology.com] > Sent: Tuesday, June 12, 2012 1:14 PM > To: java-user@lucene.apache.org > Subject: Re: Stemming - limited index expansion > > I don't completely follow precisel

Re: Stemming - limited index expansion

2012-06-12 Thread Jack Krupansky
Jack Krupansky -Original Message- From: Paul Hill Sent: Tuesday, June 12, 2012 3:07 PM To: java-user@lucene.apache.org Subject: Stemming - limited index expansion As others have previously proposed on this list, I am interesting in inserting a second token at some positions in my ind

Stemming - limited index expansion

2012-06-12 Thread Paul Hill
As others have previously proposed on this list, I am interesting in inserting a second token at some positions in my index. I'll call this Limited Index Expansion. I want to retain the original token, so that I can score an original word that matches in a text better than just any synonym/stem

Re: Best practice for stemming and exact matching

2011-04-01 Thread Christopher Condit
r.getFieldQuery(field, queryText, quoted); > } > > if you are using perfieldanalyzerwrapper so that these "_exact" fields > don't use stemming, then it should just work. > you'll need branch_3.x or trunk for this, but looks like 3.1 is close Using the 3.1 jars fr

Re: Best practice for stemming and exact matching

2011-03-29 Thread Robert Muir
queryText, quoted); } if you are using perfieldanalyzerwrapper so that these "_exact" fields don't use stemming, then it should just work. you'll need branch_3.x or trunk for this, but looks like 3.1 is close

Best practice for stemming and exact matching

2011-03-29 Thread Christopher Condit
I have Lucene indexes build using a shingled, stemmed custom analyzer. I have a new requirement that exact searches match correctly. ie: bar AND "nachos" will only fetch results with plural nachos. Right now, with the stemming, singular nacho results are returned as well. I realize that

Re: Stemming and Wildcard Queries

2010-05-21 Thread Erick Erickson
Another approach to stemming at index time but still providing exact matches when requested is to index the stemmed version AND the original version at the same position (think synonyms). But here's the trick, index the original token with a special character. For instance, indexing "
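
The trick described here (stemmed and original forms at the same position, with the original tagged by a special character) can be sketched without Lucene. Everything named below is hypothetical, including the `$` marker used in the usage note: the message's own example is cut off in this listing, so the marker is only a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class DualTokenDemo {

    // For each input position, emit the stemmed term plus the original term
    // tagged with the marker. Inner lists hold all terms sharing one position.
    public static List<List<String>> indexTerms(List<String> words,
                                                Function<String, String> stemmer,
                                                char marker) {
        List<List<String>> positions = new ArrayList<>();
        for (String word : words) {
            List<String> terms = new ArrayList<>();
            terms.add(stemmer.apply(word));
            terms.add(word + marker);
            positions.add(terms);
        }
        return positions;
    }

    // Exact queries look up the marked original; normal queries use the stem.
    public static String queryTerm(String word, boolean exact,
                                   Function<String, String> stemmer,
                                   char marker) {
        return exact ? word + marker : stemmer.apply(word);
    }
}
```

Because both terms share a position, phrase and proximity queries keep working for either form. With `$` as the marker, an exact search for `nachos` looks up `nachos$` and so cannot match a document that only contained `nacho`.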

Re: Stemming and Wildcard Queries

2010-05-21 Thread Ivan Provalov
Thanks, everyone! --- On Thu, 5/20/10, Herbert Roitblat wrote: > From: Herbert Roitblat > Subject: Re: Stemming and Wildcard Queries > To: java-user@lucene.apache.org > Date: Thursday, May 20, 2010, 4:48 PM > At a general level, we have found > that stemming during indexin

Re: Stemming and Wildcard Queries

2010-05-20 Thread Herbert Roitblat
At a general level, we have found that stemming during indexing is not advisable. Sometimes users want the exact form and if you have removed the exact form during indexing, obviously, you cannot provide that. Rather, we have found that stemming during search is more useful, or maybe it

Re: Stemming and Wildcard Queries

2010-05-20 Thread Ahmet Arslan
> Is there a good way to combine the > wildcard queries and stemming?  > > As is, the field which is stemmed at index time, won't work > with some wildcard queries. org.apache.lucene.queryParser.analyzing.Analyzing

Stemming and Wildcard Queries

2010-05-20 Thread Ivan Provalov
Is there a good way to combine the wildcard queries and stemming? As is, the field which is stemmed at index time, won't work with some wildcard queries. We were thinking to create two separate index fields - one stemmed, one non-stemmed, but we are having issues with our SpanNear qu

Re: Stemming Problem

2010-05-19 Thread Larry Hendrix
> HTH > Erick > > On Tue, May 18, 2010 at 2:05 PM, Larry Hendrix wrote: > >> Hi, >> >> Right now I'm using Lucene with a basic Whitespace Anayzer but I'm having >> problems with stemming. Does anyone have a recommendation for other text

Re: Stemming Problem

2010-05-18 Thread Erick Erickson
now I'm using Lucene with a basic Whitespace Anayzer but I'm having > problems with stemming. Does anyone have a recommendation for other text > analyzers that handle stemming and also keep capitalization, stop words, and > punctuation? > > Thanks, > Larry > >

RE: Stemming Problem

2010-05-18 Thread Christopher Condit
Hi Larry- > Right now I'm using Lucene with a basic Whitespace Anayzer but I'm having > problems with stemming. Does anyone have a recommendation for other > text analyzers that handle stemming and also keep capitalization, stop words, > and punctuation? Have you tried t

Stemming Problem

2010-05-18 Thread Larry Hendrix
Hi, Right now I'm using Lucene with a basic Whitespace Anayzer but I'm having problems with stemming. Does anyone have a recommendation for other text analyzers that handle stemming and also keep capitalization, stop words, and punctuation? Thanks, Larry Larry A. Hendrix, Gradua

Re: Contrib Lucene Analyzers & Stemming

2010-02-10 Thread Robert Muir
ere > > We are having problems with some of the Lucene analyzers in the > contributions package. For instance, it appears that the Russian analyzer > supports stemming, although, when we test it it does not. Is there a > specific switch that we must enable to enable the stemming of word

Contrib Lucene Analyzers & Stemming

2010-02-10 Thread Jamie
Hi There We are having problems with some of the Lucene analyzers in the contributions package. For instance, it appears that the Russian analyzer supports stemming, although, when we test it it does not. Is there a specific switch that we must enable to enable the stemming of words? When we

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-11 Thread KK
Thank you very much Yonik. I downloaded the latest Solr build, pulled the WordDelimiterFilter and used it with the same option as used by Solr default and it worked like a charm. Thanks to Robert also. Thanks, KK On Tue, Jun 9, 2009 at 7:01 PM, Yonik Seeley wrote: > I just cut'n'pasted your wor

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-11 Thread KK
Note: I request Solr users to go through this mail and let me thier ideas. Thanks Yonik, you rightly pointed it out. That clearly says that the way I'm trying to mimic the default behaviour of Solr indexing/searching in Lucene is wrong, right?. I downloaded the latest version of solr nightly on m

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-09 Thread Yonik Seeley
I just cut'n'pasted your word into Solr... it worked fine (it didn't split the word). Make sure you're using the latest from the trunk version of Solr... this was fixed since 1.3 http://localhost:8983/solr/select?q=साल&debugQuery=true [...] साल साल text:साल text:साल -Yonik On Tue, Jun

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-09 Thread KK
7890cd73b193cefed83c283339089 > >> >> >> and say that there also its specified that we've to mention the > >> >> parameters > >> >> >> and both are different for indexing and querying. > >> >> >> I'm kind of stuc

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-08 Thread Robert Muir
>> >> >> worddelimiterfilterfactory, right? I don't know whats the use of the >> >> other >> >> >> one. Anyway can you guide me getting rid of the above error. And yes >> >> I'll >> >> >> change the order of applying

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-07 Thread KK
erfactory, right? I don't know whats the use of the > >> other > >> >> one. Anyway can you guide me getting rid of the above error. And yes > >> I'll > >> >> change the order of applying the filters as you said. > >> >> > >

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-06 Thread Robert Muir
>> >> >> >> >> >> >> On Fri, Jun 5, 2009 at 5:48 PM, Robert Muir wrote: >> >> >> >> > KK, you got the right idea. >> >> > >> >> > though I think you might want to change the order, m

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-06 Thread KK
ght not work > correctly. > >> > > >> > On Fri, Jun 5, 2009 at 8:05 AM, KK > wrote: > >> > > >> > > Thanks Robert. This is exactly what I did and its working but > delimiter > >> > is > >> > > missing I'm going to

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread KK
nge the order, move the stopfilter > >> > before the porter stem filter... otherwise it might not work > correctly. > >> > > >> > On Fri, Jun 5, 2009 at 8:05 AM, KK > wrote: > >> > > >> > > Thanks Robert. This is exactly what I d

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread Robert Muir
009 at 8:05 AM, KK wrote: >> > >> > > Thanks Robert. This is exactly what I did and  its working but delimiter >> > is >> > > missing I'm going to add that from solr-nightly.jar >> > > >> > > /** >> > >  * Analyzer for

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread Robert Muir
y.jar > > > > > > /** > > >  * Analyzer for Indian language. > > >  */ > > > public class IndicAnalyzer extends Analyzer { > > >  public TokenStream tokenStream(String fieldName, Reader reader) { > > >     TokenStream ts = new Whitespac

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread KK
izer(reader); > >ts = new PorterStemFilter(ts); > >ts = new LowerCaseFilter(ts); > >ts = new StopFilter(ts, StopAnalyzer.ENGLISH_STOP_WORDS); > >return ts; > > } > > } > > > > Its able to do stemming/case-folding and supports search for bot

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread Robert Muir
Tokenizer(reader); >ts = new PorterStemFilter(ts); >ts = new LowerCaseFilter(ts); >ts = new StopFilter(ts, StopAnalyzer.ENGLISH_STOP_WORDS); >return ts; > } > } > > Its able to do stemming/case-folding and supports search for both english > and indic texts. let me try o

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread KK
eader) { TokenStream ts = new WhitespaceTokenizer(reader); ts = new PorterStemFilter(ts); ts = new LowerCaseFilter(ts); ts = new StopFilter(ts, StopAnalyzer.ENGLISH_STOP_WORDS); return ts; } } Its able to do stemming/case-folding and supports search for both english and indic texts. let me tr

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread Robert Muir
to me Solr:Lucene is similar to > > > > > Window$:Linux, its my view only, though]. Coming back to the point > as > > > Uwe > > > > > mentioned that we can do the same thing in lucene as well, what is > > > > > available > > > > >

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread KK
> Uwe > > > > mentioned that we can do the same thing in lucene as well, what is > > > > available > > > > in Solr, Solr is based on Lucene only, right? > > > > I request Uwe to give me some more ideas on using the analyzers from > > solr > &

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread KK
first via whitespaceanalyzer, then via analyzer for lowercasing, then analyzer for stemming[do we have such analyzers or are they filters?I think they are filters, then I've to go for custom analyzer]. As you said whitespace one will act on the full content, lowercasing will aplly only on t

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread Robert Muir
ng a mix of both english and > > non-english > > > content. > > > Muir, can you give me a bit detail description of how to use the > > > WordDelimiteFilter to do my job. > > > On a side note, I was thingking of writing a simple analyzer that will > d

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread KK
t. > > Muir, can you give me a bit detail description of how to use the > > WordDelimiteFilter to do my job. > > On a side note, I was thingking of writing a simple analyzer that will do > > the following, > > #. If the webpage fragment is non-english[for me its some i

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread Robert Muir
andard Lucene analyzers. So you > can drop the solr core jar into your project and just use them :-) > > Currently I am not sure which one is the analyzer Robert means, that can do > english stemming and detecting non-english parts, but there is to look for > it. > > Uwe

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread Robert Muir
h and non-english > content. > Muir, can you give me a bit detail description of how to use the > WordDelimiteFilter to do my job. > On a side note, I was thingking of writing a simple analyzer that will do > the following, > #. If the webpage fragment is non-english[for me its some india

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread KK
che.org/solr/api/org/apache/solr/analysis/package-summary.html> > > As you see, the Solr analyzers are just standard Lucene analyzers. So you > can drop the solr core jar into your project and just use them :-) > > Currently I am not sure which one is the analyzer Robert mean

RE: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread Uwe Schindler
yzers are just standard Lucene analyzers. So you can drop the solr core jar into your project and just use them :-) Currently I am not sure which one is the analyzer Robert means, that can do english stemming and detecting non-english parts, but there is to look for

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread KK
its some indian language] then index them as such, no stemming/ stop word removal to begin with. As I know its in UCN unicode something like \u0021\u0012\u34ae\u0031[just a sample] # If the fragment is english then apply standard anlyzing process for english content. I've not thought of quering i

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread Robert Muir
hetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Robert Muir [mailto:rcm...@gmail.com] > > Sent: Thursday, June 04, 2009 1:18 PM > > To: java-user@lucene.apache.org > > Subject: Re: How to support stemming and case folding for en

RE: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread Uwe Schindler
to:rcm...@gmail.com] > Sent: Thursday, June 04, 2009 1:18 PM > To: java-user@lucene.apache.org > Subject: Re: How to support stemming and case folding for english content > mixed with non-english content? > > KK, ok, so you only really want to stem the english. This is good. >

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread Robert Muir
nt to use the basic white space analyzer as we dont have stemmers > for this as I mentioned earlier and whereever english appears I want them > to > be stemmed tokenized etc[the standard process used for english content]. As > of now I'm using whitespace analyzer for the full content

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-03 Thread KK
content]. As of now I'm using whitespace analyzer for the full content which doesnot support case folding, stemming etc for teh content. So if there is an english word say "Detection" indexed as such then searching for detection or detect is not giving any results, which is the expecte

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-03 Thread Robert Muir
content. I must metion that we dont have stemming and case > folding for these non-english content. I'm stuck with this. Some one do let > me know how to proceed for fixing this issue. > > Thanks, > KK. > -- Robert Muir rcm...@gmail.com

How to support stemming and case folding for english content mixed with non-english content?

2009-06-03 Thread KK
nt intermingled with non-english content. I must metion that we dont have stemming and case folding for these non-english content. I'm stuck with this. Some one do let me know how to proceed for fixing this issue. Thanks, KK.

Re: Re: how to do stemming?

2009-05-11 Thread Kamal Najib
Thank you Ian. Kamal Original Message: Yep, I reckon so. btw a Google search for something like lucene stemming gets hits, including a couple of articles about stemming. Might be worth a look. -- Ian. On Mon, May 11, 2009 at 2:08 PM, Kamal Najib wrote: > will the anlyzer now do stemming,

Re: how to do stemming?

2009-05-11 Thread Ian Lea
Yep, I reckon so. btw a Google search for something like lucene stemming gets hits, including a couple of articles about stemming. Might be worth a look. -- Ian. On Mon, May 11, 2009 at 2:08 PM, Kamal Najib wrote: > will the anlyzer now do stemming, if i do the folow: > analyzer

how to do stemming?

2009-05-11 Thread Kamal Najib
will the anlyzer now do stemming, if i do the folow: analyzer = new StandardAnalyzer(); analyzer=AnalyzerUtil.getPorterStemmerAnalyzer(analyzer); thanks. Kamal.

Re: Stemming

2009-05-11 Thread Hannu Väisänen
On Fri, May 08, 2009 at 08:57:59AM -0400, Matthew Hall wrote: > process your > words into a more base form before they go into the stemmed Malaga (http://home.arcor.de/bjoern-beutel/malaga/) can be used to make a program that converts words to a base form. --

Re: Stemming

2009-05-08 Thread Matthew Hall
Ganesh wrote: My opinion is Stemming process is to get the base word. Here it is not doing so. Unfortunately this is where your problem lies, stemming doesn't do this, it breaks words that are almost lexically equivalent down into a similar root word. thus cat = cats. From the

Stemming

2009-05-08 Thread Ganesh
Hello all, I am using Lucene 2.4.1 and Snowball Analyzer for my indexing. I am facing some issues with stemming. Raining stemmed to Rain cats stemmed to cat but Harder is not stemmed to hard Stronger is not stemmed to Strong. Even Keyword and Standard analyzer does the same. My opinion is

Re: Words that need protection from stemming, i.e., protwords.txt

2009-01-21 Thread Chris Hostetter
: Subject: Words that need protection from stemming, i.e., protwords.txt : References: <49710068.1090...@gmail.com> : <3994e409-bff0-4348-9d84-4c762b150...@gmail.com> : <497111f8.7020...@stimulussoft.com> : In-Reply-To: <497111f8.7020...@stimulussoft.com> http://pe

Re: Words that need protection from stemming, i.e., protwords.txt

2009-01-16 Thread patrick o'leary
Porter is a little outdated I've found KStem much better http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem You'll still need a good protected word list, but KStem is just a little nicer On Fri, Jan 16, 2009 at 6:20 PM, David Woodward wrote: > Hi. > > Any good protwords.txt out t

Words that need protection from stemming, i.e., protwords.txt

2009-01-16 Thread David Woodward
Hi. Any good protwords.txt out there? In a fairly standard solr analyzer chain, we use the English Porter analyzer like so: For most purposes the porter does just fine, but occasionally words come along that really don't work out to well, e.g., "maine" is stemmed to "main" - clearly goofing

Re: Re: Re: Inquiry on Lucene Stemming

2008-12-21 Thread tom
AUTOMATIC REPLY LUX is closed until 5th January 2009

Re: Re: Inquiry on Lucene Stemming

2008-12-21 Thread tom
AUTOMATIC REPLY LUX is closed until 5th January 2009

Re: Inquiry on Lucene Stemming

2008-12-21 Thread Otis Gospodnetic
> From: Chris Hostetter > To: java-user@lucene.apache.org > Sent: Saturday, December 20, 2008 2:12:21 PM > Subject: Re: Inquiry on Lucene Stemming > > > : Well some client inquiries if it's possible to expand such simple words > : and does Lucene have an API for this

Re: Inquiry on Lucene Stemming

2008-12-20 Thread Chris Hostetter
: Well some client inquiries if it's possible to expand such simple words : and does Lucene have an API for this logic? Because all I read was the : stemming logic for Lucene was the other way around which is, example : "flashing" it will be trimmed to the root word "
