Hi Matthew / Paul, On Thu, Jul 30, 2009 at 4:32 PM, Paul Cowan<co...@aconex.com> wrote: > Matthew Hall wrote: >> >> Place a delimiter between the email addresses that doesn't get removed in >> your analyzer. (preferably something you know will never be searched on) > > Or add them separately (rather than: > doc.add(new Field("email", "f...@bar.com b...@foo.com c...@bar.foo" ...); > use > doc.add(new Field("email", "f...@bar.com"); > doc.add(new Field("email", "b...@foo.com"); > doc.add(new Field("email", "c...@bar.foo"); > ), using an Analyzer that overrides getPositionIncrementGap(). This inserts > a 'gap' between each set of Tokens for the same Field, which stops phrase > queries from 'crossing the boundaries' between subsequent values.
I like the sound of that! I think I understand it. getPositionIncrementGap() returns 0 by default which keeps the "email" field tokens sequential. Overriding with 1, will add an effective blank token between the email addresses (overriding with 2 would leave 2). Similar to Matthew's delimiter token, but a bit neater. So the token (with positions in brackets) would look something like this. "foo(0) bar(1) com(2) bar(4) foo(5) com(6) com(8) bar(9) foo(10)" Up until now I've only been using the WhiteSpaceAnalyzer, as I've been keeping quite a tight control over the fields going into the index (not making best use of Lucene). What Analyzer would you recommend I use for this. I'll also be indexing IPs, and other things, but that's pretty much the same story. It seems I have to use the same Analyzer for the all the fields in the index? I've been looking at StandardAnalyzer, but I do not want to remove stop words. I want to keep letters and numbers mainly, and also override getPositionIncrementGap? Is there anything that does these things already, or close to it? Overriding getPositionIncrementGap shouldn't be difficult though. Cheers, Phil --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org