Please do file a Jira. I'm sure the discussion will be interesting.
-- Jack Krupansky
-----Original Message-----
From: Jamie
Sent: Monday, June 9, 2014 9:33 AM
To: java-user@lucene.apache.org
Subject: Re: searching with stemming
Jack
Thanks. I figured as much.
I'm modifying each analyzer wi
Hi Tim.
Thanks for your help. A friend gave me some code (some snippets
below) that dumps the supposedly matching spans, which provided some more
insight. Perhaps some of my findings could help someone fix the
bug.
So, I added my 2 documents
public static String []
Darin,
I confirmed the behavior you reported. This is probably the same bug that
was reported in LUCENE-5331. The trigger there seems to be multiple examples of
the same token (which you have plenty of). I tested with just this:
[[darin fulford]~100 sauthor]!~0,0
darin fulford (non-directio
Hi,
there is a way to make this work (which is the "official way" to do it): Your
application software is already on Lucene 3.6, so why not simply use the
IndexUpgrader class, which is shipped with Lucene 3.6? This class will upgrade
the existing indexes (back to version 1.0) of your users to t
Jack
Thanks. I figured as much.
I'm modifying each analyzer with constructors that take a Stem argument:
public enum Stem { AGGRESSIVE, LIGHT, NONE };
This is obviously not ideal: 20 or more Lucene classes must be
updated, and I now need to maintain each analyzer myself.
Regards
Jamie
On 2014/0
I find the weak Javadoc even more troubling:
http://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/en/EnglishAnalyzer.html
The real bottom line is that you are expected to "roll your own" in this
area.
Feel free to file a Jira proposing your improvement.
-
On Mon, Jun 9, 2014 at 10:17 PM, Adrien Grand wrote:
> Hi,
>
> It is not possible to read 2.x indices from Lucene 4, even with a
> custom codec. For instance, Lucene 4 needs to hook into
> SegmentInfos.read to detect old 3.x indices and force the use of the
> Lucene3x codec since these indices don
Hi,
It is not possible to read 2.x indices from Lucene 4, even with a
custom codec. For instance, Lucene 4 needs to hook into
SegmentInfos.read to detect old 3.x indices and force the use of the
Lucene3x codec since these indices don't expose what codec has been
used to write them.
On Mon, Jun 9
Benson. Thanks. I was just hoping to avoid a whole bunch of boilerplate.
On 2014/06/09, 1:07 PM, Benson Margulies wrote:
Analyzer classes are optional; an analyzer is just a factory for a set of
token stream components. You can usually do just fine with an anonymous
class. Or in your case, the o
Analyzer classes are optional; an analyzer is just a factory for a set of
token stream components. You can usually do just fine with an anonymous
class. Or in your case, the only thing different for each language will be
the stop words, so you can have one analyzer class with a language
parameter.
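
The "one analyzer class with a language parameter" idea can be sketched in plain Java. This is only a model and does not use Lucene: `LanguageAnalyzerSketch`, its `Language` enum, and the tiny stop-word lists are hypothetical names and data made up for illustration. It assumes, as the message above does, that the stop words are the only per-language difference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not a Lucene class: one analyzer type configured per language.
final class LanguageAnalyzerSketch {
    enum Language { ENGLISH, GERMAN }

    // Tiny illustrative stop-word lists; real lists are much longer.
    private static final Map<Language, Set<String>> STOP_WORDS = new EnumMap<>(Language.class);
    static {
        STOP_WORDS.put(Language.ENGLISH, new HashSet<>(Arrays.asList("the", "and", "of")));
        STOP_WORDS.put(Language.GERMAN, new HashSet<>(Arrays.asList("der", "die", "das", "und")));
    }

    private final Set<String> stopWords;

    // The language is the only thing that varies between instances.
    LanguageAnalyzerSketch(Language language) {
        this.stopWords = STOP_WORDS.get(language);
    }

    // Lower-case, split on non-alphanumerics, drop stop words; no stemming step.
    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }
}
```

In a real application the same shape would wrap the Lucene tokenizer and filters instead of the string handling shown here.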
I am not using Solr. I am using the default analyzers...
On 2014/06/09, 12:59 PM, Benson Margulies wrote:
Are you using Solr? If so you are on the wrong mailing list. If not, why do
you need a non-anonymous analyzer at all?
On Jun 9, 2014 6:55 AM, "Jamie" wrote:
To me, it seems strange that
Are you using Solr? If so you are on the wrong mailing list. If not, why do
you need a non-anonymous analyzer at all?
On Jun 9, 2014 6:55 AM, "Jamie" wrote:
> To me, it seems strange that these default analyzers don't provide
> constructors that enable one to override stemming, etc.
>
> On 201
To me, it seems strange that these default analyzers don't provide
constructors that enable one to override stemming, etc.
On 2014/06/09, 12:39 PM, Trejkaz wrote:
On Mon, Jun 9, 2014 at 7:57 PM, Jamie wrote:
Greetings
Our app currently uses language specific analysers (e.g. EnglishAnalyzer,
Hi all.
The inability to read people's existing indexes is essentially the
only thing stopping us upgrading to v4, so we're stuck indefinitely on
v3.6 until we find a way around this issue.
As I understand it, Lucene 4 added the notion of codecs which can
precisely choose how to read and write th
On Mon, Jun 9, 2014 at 7:57 PM, Jamie wrote:
> Greetings
>
> Our app currently uses language specific analysers (e.g. EnglishAnalyzer,
> GermanAnalyzer, etc.). We need an option to disable stemming. What's the
> recommended way to do this? These analyzers do not include an option to
> disable stem
Benson
Yes, I can of course do this, but as far as I can see I would have to
override each analyzer. This is a pain.
Regards
Jamie
On 2014/06/09, 12:29 PM, Benson Margulies wrote:
You should construct an analysis chain that does what you need. Read the
source of the relevant analyzer and pick the to
You should construct an analysis chain that does what you need. Read the
source of the relevant analyzer and pick the tokenizer and filter(s) that
you need, and don't include stemming.
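
The shape of that advice can be sketched without any Lucene dependency. What follows is a plain-Java model, not Lucene code: `ChainSketch`, its filter methods, the stop-word list, and the toy "strip a trailing s" stemmer are all hypothetical stand-ins. It shows only that a chain is a tokenizer followed by an ordered list of filters, and that disabling stemming means leaving one filter out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Plain-Java model of an analysis chain; none of these types are Lucene classes.
final class ChainSketch {
    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of"));

    // Tokenizer stand-in: split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    static List<String> lowerCase(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    static List<String> removeStopWords(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            if (!STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    }

    // Toy stemmer: strips one trailing "s" from longer tokens. Not a real algorithm.
    static List<String> toyStem(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            out.add(t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t);
        }
        return out;
    }

    // Build the filter list; the stemming step is simply omitted when not wanted.
    static List<UnaryOperator<List<String>>> chain(boolean stemming) {
        List<UnaryOperator<List<String>>> filters = new ArrayList<>();
        filters.add(ChainSketch::lowerCase);
        filters.add(ChainSketch::removeStopWords);
        if (stemming) filters.add(ChainSketch::toyStem);
        return filters;
    }

    // Run the tokenizer, then each filter in order.
    static List<String> analyze(String text, List<UnaryOperator<List<String>>> filters) {
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }
}
```

With this model, `ChainSketch.analyze("The running dogs", ChainSketch.chain(false))` keeps `dogs` unchanged, while `chain(true)` reduces it to `dog`.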
On Mon, Jun 9, 2014 at 5:57 AM, Jamie wrote:
> Greetings
>
> Our app currently uses language specific analyser
Greetings
Our app currently uses language specific analysers (e.g.
EnglishAnalyzer, GermanAnalyzer, etc.). We need an option to disable
stemming. What's the recommended way to do this? These analyzers do not
include an option to disable stemming, only a parameter to specify a
list of words for w