Thanks Matt. Thanks Paul. I'm up early (PST) and ready for a major
rewrite of my indexer. I think these changes are going to make a huge
difference.
Cheers,
Phil
On Fri, Jul 31, 2009 at 5:52 AM, Matthew Hall wrote:
And to address the stop word issue, you can override the stop word list
that it uses.
Most analyzers that use stop words (Standard included) have an option to
pass in an arbitrary list of stop words, which will override the defaults.
You could also just roll your own (which is what you are goin
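To illustrate the stop-word override described above, here is a minimal plain-Java sketch. It does not use Lucene classes; StopWordSketch, DEFAULT_STOP_WORDS and analyze are hypothetical names, and the default list is invented for the example. The only point is that a caller-supplied set replaces the defaults entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer with an overridable stop-word list.
final class StopWordSketch {
    // Made-up default list, used only when the caller supplies nothing.
    static final Set<String> DEFAULT_STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "to", "of"));

    // Lower-case, split on non-letters, then drop anything in stopWords.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Convenience overload that falls back to the defaults.
    static List<String> analyze(String text) {
        return analyze(text, DEFAULT_STOP_WORDS);
    }

    public static void main(String[] args) {
        System.out.println(analyze("To be or not to be"));
        // An empty set overrides the defaults, so nothing is dropped.
        System.out.println(analyze("To be or not to be", new HashSet<String>()));
    }
}
```

Passing an empty set is the simplest way to keep every term searchable, at the cost of a larger index.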
Phil Whelan wrote:
It seems I have to use the same Analyzer for all the fields in the
index?
Nope. Look at PerFieldAnalyzerWrapper, which is effectively a Map of
field names -> analyzers. This might help if different fields will have
very different values and semantics.
Cheers,
Paul
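The "Map of field names -> analyzers" idea can be sketched in plain Java as below. This mimics the concept only and is not Lucene's PerFieldAnalyzerWrapper; PerFieldSketch, splitOnNonLetters and keyword are hypothetical names, and an "analyzer" is reduced to a function from text to a token list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration: a default analyzer plus per-field overrides.
final class PerFieldSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a different analyzer for one field name.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Look up the field's analyzer, falling back to the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Example analyzer: lower-case and split on non-letter characters.
    static List<String> splitOnNonLetters(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Example analyzer: keep the whole value as a single token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        PerFieldSketch wrapper = new PerFieldSketch(PerFieldSketch::splitOnNonLetters);
        wrapper.addAnalyzer("email", PerFieldSketch::keyword);
        System.out.println(wrapper.analyze("body", "Hello, world"));
        System.out.println(wrapper.analyze("email", "foo@bar.com"));
    }
}
```

The same wrapper would need to be used at both index time and query time so that each field is tokenized consistently.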
Hi Matthew / Paul,
On Thu, Jul 30, 2009 at 4:32 PM, Paul Cowan wrote:
Matthew Hall wrote:
Place a delimiter between the email addresses that doesn't get removed
in your analyzer. (preferably something you know will never be searched
on)
Or add them separately (rather than:
doc.add(new Field("email", "f...@bar.com b...@foo.com c...@bar.foo" ...);
use
doc.add
Place a delimiter between the email addresses that doesn't get removed
in your analyzer. (preferably something you know will never be searched on)
That way you can ensure that each email address matches independently of the others.
So something like
f...@bar.com DELIM123 b...@foo.com DELIM123 c...@ba
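A rough plain-Java sketch of why the delimiter helps, assuming an analyzer that splits on non-alphanumeric characters so that DELIM123 survives as a token. Nothing here is Lucene API; DelimiterSketch and its methods are hypothetical, and the addresses are made-up stand-ins. A phrase match is modelled as a contiguous run of tokens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of a delimiter blocking cross-address phrase matches.
final class DelimiterSketch {
    static final String DELIM = "DELIM123";

    // Lower-case and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True when the phrase tokens appear consecutively, in order, in the field.
    static boolean phraseMatches(String fieldValue, String phrase) {
        List<String> terms = tokenize(phrase);
        if (terms.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokenize(fieldValue), terms) >= 0;
    }

    public static void main(String[] args) {
        String plain = String.join(" ", "alice@bar.com", "foo@example.org");
        String delimited = String.join(" " + DELIM + " ", "alice@bar.com", "foo@example.org");

        // "com foo" straddles two addresses; only the plain field matches it.
        System.out.println(phraseMatches(plain, "com foo"));      // true
        System.out.println(phraseMatches(delimited, "com foo"));  // false
        // A phrase inside a single address still matches.
        System.out.println(phraseMatches(delimited, "foo example org")); // true
    }
}
```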
On Thu, Jul 30, 2009 at 11:22 AM, Matthew Hall wrote:
1. Sure, just have an analyzer that splits on all non-letter characters.
2. Phrase queries keep the order intact. (And yes, the positional
information for the terms is kept, which is what allows span queries to
work.)
So searching on the following "foo bar com" will match f...@bar.com but
not
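The two points above (splitting on non-letter characters, and term positions making phrase order matter) can be sketched in plain Java as follows. This is an illustration of the idea, not Lucene code; PhraseSketch and its methods are hypothetical, and foo@bar.com is a stand-in address chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of positional matching for phrase queries.
final class PhraseSketch {
    // Lower-case and split on every non-letter character.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Record, for each term, the positions at which it occurs.
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> index = new HashMap<>();
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            index.computeIfAbsent(tokens.get(pos), k -> new ArrayList<>()).add(pos);
        }
        return index;
    }

    // A phrase matches when term i occurs at position start + i for some start.
    static boolean phraseMatches(String fieldValue, String phrase) {
        Map<String, List<Integer>> index = positions(fieldValue);
        List<String> terms = tokenize(phrase);
        if (terms.isEmpty()) {
            return false;
        }
        List<Integer> starts = index.get(terms.get(0));
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean all = true;
            for (int i = 1; i < terms.size(); i++) {
                List<Integer> p = index.get(terms.get(i));
                if (p == null || !p.contains(start + i)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo@bar.com"));                     // [foo, bar, com]
        System.out.println(phraseMatches("foo@bar.com", "foo bar com")); // true
        System.out.println(phraseMatches("foo@bar.com", "bar foo com")); // false
    }
}
```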