Hi Matthew / Paul,

On Thu, Jul 30, 2009 at 4:32 PM, Paul Cowan<co...@aconex.com> wrote:
> Matthew Hall wrote:
>>
>> Place a delimiter between the email addresses that doesn't get removed in
>> your analyzer.  (preferably something you know will never be searched on)
>
> Or add them separately (rather than:
>  doc.add(new Field("email", "f...@bar.com b...@foo.com c...@bar.foo" ...);
> use
>  doc.add(new Field("email", "f...@bar.com");
>  doc.add(new Field("email", "b...@foo.com");
>  doc.add(new Field("email", "c...@bar.foo");
> ), using an Analyzer that overrides getPositionIncrementGap(). This inserts
> a 'gap' between each set of Tokens for the same Field, which stops phrase
> queries from 'crossing the boundaries' between subsequent values.

I like the sound of that! I think I understand it.
getPositionIncrementGap() returns 0 by default which keeps the "email"
field tokens sequential. Overriding with 1, will add an effective
blank token between the email addresses (overriding with 2 would leave
2). Similar to Matthew's delimiter token, but a bit neater.

So the token (with positions in brackets) would look something like this.

"foo(0) bar(1) com(2) bar(4) foo(5) com(6) com(8) bar(9) foo(10)"

Up until now I've only been using the WhiteSpaceAnalyzer, as I've been
keeping quite a tight control over the fields going into the index
(not making best use of Lucene).

What Analyzer would you recommend I use for this. I'll also be
indexing IPs, and other things, but that's pretty much the same story.
It seems I have to use the same Analyzer for the all the fields in the
index?

I've been looking at StandardAnalyzer, but I do not want to remove
stop words. I want to keep letters and numbers mainly, and also
override getPositionIncrementGap? Is there anything that does these
things already, or close to it? Overriding getPositionIncrementGap
shouldn't be difficult though.

Cheers,
Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to