Re: indexing multiple email addresses in one field

2009-07-31 Thread Phil Whelan
Thanks Matt. Thanks Paul. I'm up early (PST) and ready for a major rewrite of my indexer. I think these changes are going to make a huge difference. Cheers, Phil On Fri, Jul 31, 2009 at 5:52 AM, Matthew Hall wrote: > And to address the stop word issue, you can override the stop word list that > i

Re: indexing multiple email addresses in one field

2009-07-31 Thread Matthew Hall
And to address the stop word issue, you can override the stop word list that it uses. Most analyzers that use stop words (Standard included) have an option to accept an arbitrary list of stop words, which overrides the defaults. You could also just roll your own (which is what you are goin
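
As a rough illustration of why overriding the stop word list matters for email addresses, here is a plain-Java sketch. It is not Lucene code: the class name, the `analyze` helper, the tiny default stop list and the address `it@example.com` are all made up for the example. It only models the idea that a token on the stop list is dropped at analysis time, while passing your own (here empty) list keeps it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of stop word filtering; hypothetical names throughout.
final class StopWordSketch {

    // Illustrative list only. This is NOT the default list of any Lucene analyzer.
    static final Set<String> DEFAULT_STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "it", "the"));

    // Split on non-letter characters, lowercase, and drop tokens found in stopWords.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            String token = part.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String address = "it@example.com"; // hypothetical address
        // With the illustrative defaults, the local part "it" is dropped.
        System.out.println(analyze(address, DEFAULT_STOP_WORDS));
        // With an overriding (empty) list, every token is kept.
        System.out.println(analyze(address, new HashSet<String>()));
    }
}
```

In Lucene itself the equivalent step would be handing the analyzer your own stop word list, as described above; check the constructor signatures for the version you are running.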

Re: indexing multiple email addresses in one field

2009-07-30 Thread Paul Cowan
Phil Whelan wrote: It seems I have to use the same Analyzer for all the fields in the index? Nope. Look at PerFieldAnalyzerWrapper, which is effectively a Map of field names -> analyzers. This might help if different fields will have very different values and semantics. Cheers, Paul -
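
To make the "Map of field names -> analyzers" description concrete, here is a plain-Java sketch of that idea. It is a model only, not the Lucene class: `PerFieldSketch`, its methods, the field names and the addresses are hypothetical, and an "analyzer" is reduced to a function from text to tokens. Fields without an entry in the map fall back to a default analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of a per-field analyzer lookup; not Lucene's PerFieldAnalyzerWrapper.
final class PerFieldSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register an analyzer for one field name.
    void addAnalyzer(String fieldName, Function<String, List<String>> analyzer) {
        perField.put(fieldName, analyzer);
    }

    // Use the field's own analyzer if one is registered, else the default.
    List<String> analyze(String fieldName, String text) {
        return perField.getOrDefault(fieldName, defaultAnalyzer).apply(text);
    }

    // Helper: split on a regex, lowercase, skip empty pieces.
    static List<String> splitOn(String regex, String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split(regex)) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Default: split on non-letters. "email" field: split on whitespace only.
        PerFieldSketch analyzers = new PerFieldSketch(t -> splitOn("[^\\p{L}]+", t));
        analyzers.addAnalyzer("email", t -> splitOn("\\s+", t));

        System.out.println(analyzers.analyze("email", "alice@example.com bob@example.org"));
        System.out.println(analyzers.analyze("body", "Hello, world"));
    }
}
```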

Re: indexing multiple email addresses in one field

2009-07-30 Thread Phil Whelan
Hi Matthew / Paul, On Thu, Jul 30, 2009 at 4:32 PM, Paul Cowan wrote: > Matthew Hall wrote: >> >> Place a delimiter between the email addresses that doesn't get removed in >> your analyzer.  (preferably something you know will never be searched on) > > Or add them separately (rather than: >  doc.a

Re: indexing multiple email addresses in one field

2009-07-30 Thread Paul Cowan
Matthew Hall wrote: Place a delimiter between the email addresses that doesn't get removed in your analyzer. (preferably something you know will never be searched on) Or add them separately (rather than: doc.add(new Field("email", "f...@bar.com b...@foo.com c...@bar.foo" ...); use doc.add

Re: indexing multiple email addresses in one field

2009-07-30 Thread Matthew Hall
Place a delimiter between the email addresses that doesn't get removed by your analyzer (preferably something you know will never be searched on). That way, each email address matches independently of the others. So something like f...@bar.com DELIM123 b...@foo.com DELIM123 c...@ba
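
A small plain-Java sketch of the delimiter idea follows. It is not Lucene code: the class, the helpers and the two addresses are hypothetical, and phrase matching is modeled as "tokens at consecutive positions". The point it tries to show is that without a delimiter a phrase can match across the boundary between two addresses, while a delimiter token that survives analysis prevents that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java model of delimiter-separated addresses; hypothetical names throughout.
final class DelimiterSketch {
    static final String DELIM = "DELIM123"; // delimiter token, as in the message above

    // Split on anything that is not a letter or digit, so the delimiter stays one token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Join addresses into one field value with the delimiter between them.
    static String joinWithDelimiter(List<String> addresses) {
        return String.join(" " + DELIM + " ", addresses);
    }

    // True if the phrase tokens appear at consecutive positions in the field.
    static boolean matchesPhrase(String fieldValue, String phrase) {
        List<String> wanted = tokenize(phrase);
        return !wanted.isEmpty()
                && Collections.indexOfSubList(tokenize(fieldValue), wanted) >= 0;
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("alice@example.com", "bob@example.org");
        String plain = String.join(" ", addresses);
        String delimited = joinWithDelimiter(addresses);

        // "com bob" straddles two addresses.
        System.out.println(matchesPhrase(plain, "com bob"));     // crosses the boundary
        System.out.println(matchesPhrase(delimited, "com bob")); // blocked by the delimiter
        System.out.println(matchesPhrase(delimited, "bob example org"));
    }
}
```

This sketch only covers exact phrases; queries that tolerate gaps between terms are not modeled here.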

Re: indexing multiple email addresses in one field

2009-07-30 Thread Phil Whelan
On Thu, Jul 30, 2009 at 11:22 AM, Matthew Hall wrote: > > 1. Sure, just have an analyzer that splits on all non-letter characters. > 2. Phrase queries keep the order intact. (And yes, the positional > information for the terms is kept, which is what allows span queries to work.) > > So searching

Re: indexing multiple email addresses in one field

2009-07-30 Thread Matthew Hall
1. Sure, just have an analyzer that splits on all non-letter characters. 2. Phrase queries keep the order intact. (And yes, the positional information for the terms is kept, which is what allows span queries to work.) So searching on the following "foo bar com" will match f...@bar.com but not
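
The two points above can be sketched in plain Java. This is a model, not Lucene code: `PhraseSketch`, its methods and the address `alice@example.com` are hypothetical. It splits text on non-letter characters, keeps token order, and treats a phrase as matching only when its tokens occur at consecutive positions in the same order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java model of non-letter tokenizing plus ordered phrase matching.
final class PhraseSketch {

    // Split on every run of non-letter characters and lowercase, keeping order.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // True if the phrase tokens occur at consecutive positions, in order.
    static boolean matchesPhrase(String fieldValue, String phrase) {
        List<String> field = tokenize(fieldValue);
        List<String> wanted = tokenize(phrase);
        if (wanted.isEmpty()) {
            return false;
        }
        for (int start = 0; start + wanted.size() <= field.size(); start++) {
            if (field.subList(start, start + wanted.size()).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String field = "alice@example.com"; // hypothetical address
        System.out.println(tokenize(field));
        System.out.println(matchesPhrase(field, "alice example com")); // same order
        System.out.println(matchesPhrase(field, "example alice com")); // different order
    }
}
```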