One of many options is to copy the StandardAnalyzer but change it so
that + and # are considered letters.
Just add + and # to the LETTER definition in the JavaCC file if you are
using a release, or the JFlex file if you are working off Trunk (your
prob using a release but the new JFlex analyzer is mega faster).
Tom Conlon wrote:
The reason seems to be that I found I needed to implement an analyser that lowercases terms as well as *not* ignoring trailing characters such as #, +.
(i.e. I needed to match C# and C++)
public final class LowercaseWhitespaceAnalyzer extends Analyzer
{
public TokenStream tokenStream(String fieldName, Reader reader) {
return new LowercaseWhitespaceTokenizer(reader);
}
}
Problem now exists that "system," etc is not matched against "system".
Can anyone point to an example of a combination of analyser/tokeniser (or other
method) that gets around this please?
Thanks,
Tom
-----Original Message-----
From: Tom Conlon [mailto:[EMAIL PROTECTED]
Sent: 01 November 2007 09:18
To: java-user@lucene.apache.org
Subject: RE: Hits.score mystery
Thanks Daniel,
I'm using Searcher.explain() & luke to try to understand the reasons for the
score.
-----Original Message-----
From: Daniel Naber [mailto:[EMAIL PROTECTED]
Sent: 01 November 2007 08:19
To: java-user@lucene.apache.org
Subject: Re: Hits.score mystery
On Wednesday 31 October 2007 19:14, Tom Conlon wrote:
119.txt 17.865013 97% (13 occurences) 45.txt 8.600986 47%
(18 occurences)
45.txt might be a document with more therms so that its score is lower although
it contains more matches.
Regards
Daniel
--
http://www.danielnaber.de
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]