Re: searching with stemming

2014-06-09 Jack Krupansky
Please do file a Jira. I'm sure the discussion will be interesting. -- Jack Krupansky -----Original Message----- From: Jamie Sent: Monday, June 9, 2014 9:33 AM To: java-user@lucene.apache.org Subject: Re: searching with stemming Jack, thanks. I figured as much. I'm modifying each analyzer wi

Re: SpanQuery not working as expected

2014-06-09 Darin McBeath
Hi Tim. Thanks for your help. I had a friend provide me with some code (some snippets below) that could dump the supposedly matching spans, which provided some more insight. Perhaps some of my findings could help someone fix the bug. So, I added my 2 documents public static String []

RE: SpanQuery not working as expected

2014-06-09 Allison, Timothy B.
Darin, I confirmed the behavior you reported. This is probably the same bug that was reported in LUCENE-5331. The trigger there seems to be multiple examples of the same token (which you have plenty of). I tested with just this: [[darin fulford]~100 sauthor]!~0,0 darin fulford (non-directio

RE: Reading a v2 index in v4

2014-06-09 Uwe Schindler
Hi, there is a way to make this work (which is the "official way" to do it): Your application software is already on Lucene 3.6, so why not simply use the IndexUpgrader class, which is shipped with Lucene 3.6? This class will upgrade the existing indexes (back to version 1.0) of your users to t
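Uwe's suggestion can be sketched roughly as follows. This assumes the Lucene 3.6 core jar is on the classpath; the class name UpgradeUserIndex and the idea of passing one index path per run are hypothetical. IndexUpgrader rewrites the segments of an older index in the format of the Lucene version that runs it (3.6 here), after which Lucene 4 should be able to open the index through its read-only Lucene3x codec.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/** Sketch only: assumes Lucene 3.6 (lucene-core) on the classpath. */
public final class UpgradeUserIndex {

    private UpgradeUserIndex() {}

    /** Rewrites the older-format segments of the index in dir using the 3.6 format. */
    public static void upgrade(Directory dir) throws IOException {
        // Uses the 3.6 IndexUpgrader(Directory, Version) constructor;
        // prior commit points are NOT deleted with this constructor.
        new IndexUpgrader(dir, Version.LUCENE_36).upgrade();
    }

    /** args[0] is the path of one user's index (hypothetical invocation). */
    public static void main(String[] args) throws IOException {
        Directory dir = FSDirectory.open(new File(args[0]));
        try {
            upgrade(dir);
        } finally {
            dir.close();
        }
    }
}
```

The upgrade is one-way: once rewritten, older Lucene versions may no longer be able to open the index, so taking a backup first is prudent. IndexUpgrader also has its own command-line entry point (per its Javadoc: java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] indexDir), which may be simpler for a one-off migration.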

Re: searching with stemming

2014-06-09 Jamie
Jack, thanks. I figured as much. I'm modifying each analyzer with constructors that take a Stem argument: public enum Stem { AGGRESSIVE, LIGHT, NONE }; This is obviously not ideal: 20 or more Lucene classes must be updated, and I now need to maintain each analyzer. Regards Jamie On 2014/0
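For English, Jamie's Stem-argument approach might look roughly like the sketch below, assuming Lucene 4.8 with analyzers-common on the classpath. The class name StemmableEnglishAnalyzer is hypothetical, and so is the mapping of LIGHT to EnglishMinimalStemFilter and AGGRESSIVE to PorterStemFilter; Jamie's actual mapping isn't shown. Apart from the final stage, the chain mirrors EnglishAnalyzer's, without its stem-exclusion set.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.en.EnglishMinimalStemFilter;
import org.apache.lucene.analysis.en.EnglishPossessiveFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

/** Hypothetical English analyzer whose stemming strength is chosen at construction time. */
public final class StemmableEnglishAnalyzer extends Analyzer {

    public enum Stem { AGGRESSIVE, LIGHT, NONE }

    private final Version matchVersion;
    private final Stem stem;

    public StemmableEnglishAnalyzer(Version matchVersion, Stem stem) {
        this.matchVersion = matchVersion;
        this.stem = stem;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Same front of the chain as EnglishAnalyzer (4.8)...
        Tokenizer source = new StandardTokenizer(matchVersion, reader);
        TokenStream result = new StandardFilter(matchVersion, source);
        result = new EnglishPossessiveFilter(matchVersion, result);
        result = new LowerCaseFilter(matchVersion, result);
        result = new StopFilter(matchVersion, result, EnglishAnalyzer.getDefaultStopSet());
        // ...only the last stage depends on the Stem setting (assumed mapping).
        switch (stem) {
            case AGGRESSIVE: result = new PorterStemFilter(result); break;
            case LIGHT:      result = new EnglishMinimalStemFilter(result); break;
            case NONE:       break; // no stemmer at all
        }
        return new TokenStreamComponents(source, result);
    }

    /** Runs an analyzer over text and returns the emitted terms. */
    public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }
}
```

Whichever setting is chosen, a field has to be analyzed the same way at index time and at query time; otherwise stemmed index terms and unstemmed query terms (or vice versa) will not match.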

Re: searching with stemming

2014-06-09 Jack Krupansky
I find the weak Javadoc even more troubling: http://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/en/EnglishAnalyzer.html The real bottom line is that you are expected to "roll your own" in this area. Feel free to file a Jira with your suggested improvement. -

Re: Reading a v2 index in v4

2014-06-09 Trejkaz
On Mon, Jun 9, 2014 at 10:17 PM, Adrien Grand wrote: > Hi, > > It is not possible to read 2.x indices from Lucene 4, even with a > custom codec. For instance, Lucene 4 needs to hook into > SegmentInfos.read to detect old 3.x indices and force the use of the > Lucene3x codec since these indices don

Re: Reading a v2 index in v4

2014-06-09 Adrien Grand
Hi, It is not possible to read 2.x indices from Lucene 4, even with a custom codec. For instance, Lucene 4 needs to hook into SegmentInfos.read to detect old 3.x indices and force the use of the Lucene3x codec since these indices don't expose what codec has been used to write them. On Mon, Jun 9

Re: searching with stemming

2014-06-09 Jamie
Benson. Thanks. I was just hoping to avoid a whole bunch of boilerplate. On 2014/06/09, 1:07 PM, Benson Margulies wrote: Analyzer classes are optional; an analyzer is just a factory for a set of token stream components. You can usually do just fine with an anonymous class. Or, in your case, the o

Re: searching with stemming

2014-06-09 Benson Margulies
Analyzer classes are optional; an analyzer is just a factory for a set of token stream components. You can usually do just fine with an anonymous class. Or, in your case, the only thing that differs between languages will be the stop words, so you can have one analyzer class with a language parameter.
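Benson's single-class idea could be sketched like this, assuming Lucene 4.8. NoStemAnalyzer is a hypothetical name, and the "language parameter" is modeled as the stop-word set. The sketch takes Benson's premise at face value: the stock GermanAnalyzer, for instance, also applies a GermanNormalizationFilter, which is not reproduced here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

/** Hypothetical: one non-stemming analyzer; the "language" is just its stop-word set. */
public final class NoStemAnalyzer extends Analyzer {

    private final Version matchVersion;
    private final CharArraySet stopWords;

    public NoStemAnalyzer(Version matchVersion, CharArraySet stopWords) {
        this.matchVersion = matchVersion;
        this.stopWords = stopWords;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new StandardTokenizer(matchVersion, reader);
        TokenStream result = new StandardFilter(matchVersion, source);
        result = new LowerCaseFilter(matchVersion, result);
        // The only language-dependent stage; deliberately no stemmer after it.
        result = new StopFilter(matchVersion, result, stopWords);
        return new TokenStreamComponents(source, result);
    }

    /** Runs an analyzer over text and returns the emitted terms. */
    public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }
}
```

Usage would be, for example, new NoStemAnalyzer(Version.LUCENE_48, EnglishAnalyzer.getDefaultStopSet()) for English and new NoStemAnalyzer(Version.LUCENE_48, GermanAnalyzer.getDefaultStopSet()) for German.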

Re: searching with stemming

2014-06-09 Jamie
I am not using Solr. I am using the default analyzers... On 2014/06/09, 12:59 PM, Benson Margulies wrote: Are you using Solr? If so, you are on the wrong mailing list. If not, why do you need a non-anonymous analyzer at all? On Jun 9, 2014 6:55 AM, "Jamie" wrote: To me, it seems strange that

Re: searching with stemming

2014-06-09 Benson Margulies
Are you using Solr? If so, you are on the wrong mailing list. If not, why do you need a non-anonymous analyzer at all? On Jun 9, 2014 6:55 AM, "Jamie" wrote: > To me, it seems strange that these default analyzers don't provide > constructors that enable one to override stemming, etc.? > > On 201

Re: searching with stemming

2014-06-09 Jamie
To me, it seems strange that these default analyzers don't provide constructors that enable one to override stemming, etc.? On 2014/06/09, 12:39 PM, Trejkaz wrote: On Mon, Jun 9, 2014 at 7:57 PM, Jamie wrote: Greetings Our app currently uses language-specific analysers (e.g. EnglishAnalyzer,

Reading a v2 index in v4

2014-06-09 Trejkaz
Hi all. The inability to read people's existing indexes is essentially the only thing stopping us upgrading to v4, so we're stuck indefinitely on v3.6 until we find a way around this issue. As I understand it, Lucene 4 added the notion of codecs which can precisely choose how to read and write th

Re: searching with stemming

2014-06-09 Trejkaz
On Mon, Jun 9, 2014 at 7:57 PM, Jamie wrote: > Greetings > > Our app currently uses language-specific analysers (e.g. EnglishAnalyzer, > GermanAnalyzer, etc.). We need an option to disable stemming. What's the > recommended way to do this? These analyzers do not include an option to > disable stem

Re: searching with stemming

2014-06-09 Jamie
Benson, yes, I can of course do this, but as far as I can see I would have to override each analyzer. This is a pain. Regards Jamie On 2014/06/09, 12:29 PM, Benson Margulies wrote: You should construct an analysis chain that does what you need. Read the source of the relevant analyzer and pick the to

Re: searching with stemming

2014-06-09 Benson Margulies
You should construct an analysis chain that does what you need. Read the source of the relevant analyzer and pick the tokenizer and filter(s) that you need, and don't include stemming. On Mon, Jun 9, 2014 at 5:57 AM, Jamie wrote: > Greetings > > Our app currently uses language-specific analyser
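Applied to EnglishAnalyzer, Benson's recipe could look like the anonymous Analyzer below. It is a sketch assuming Lucene 4.8: the filters are the ones EnglishAnalyzer's createComponents uses in 4.8, with the SetKeywordMarkerFilter and PorterStemFilter steps left out. EnglishNoStem and create() are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.en.EnglishPossessiveFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public final class EnglishNoStem {

    private EnglishNoStem() {}

    /** EnglishAnalyzer's chain without stemming, as an anonymous Analyzer (hypothetical helper). */
    public static Analyzer create(final Version matchVersion) {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
                Tokenizer source = new StandardTokenizer(matchVersion, reader);
                TokenStream result = new StandardFilter(matchVersion, source);
                result = new EnglishPossessiveFilter(matchVersion, result); // "dog's" -> "dog"
                result = new LowerCaseFilter(matchVersion, result);
                result = new StopFilter(matchVersion, result, EnglishAnalyzer.getDefaultStopSet());
                // EnglishAnalyzer would add SetKeywordMarkerFilter and PorterStemFilter here.
                return new TokenStreamComponents(source, result);
            }
        };
    }

    /** Runs an analyzer over text and returns the emitted terms. */
    public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }
}
```

Because the anonymous class captures only matchVersion, the same pattern can be repeated per language by copying the relevant analyzer's filters, which is the trade-off Jamie objects to: the chains must then be kept in sync with Lucene by hand.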

searching with stemming

2014-06-09 Jamie
Greetings Our app currently uses language-specific analysers (e.g. EnglishAnalyzer, GermanAnalyzer, etc.). We need an option to disable stemming. What's the recommended way to do this? These analyzers do not include an option to disable stemming, only a parameter to specify a list of words for w