Re: Tokenizer for Optional Keywords

Chris Hostetter Wed, 03 Oct 2012 15:37:10 -0700

: Doc A has keywords "Car Dealer", "Car Repair"
: Doc B has keywords "Car Washing", "Car Clean"
: 
: I have a "Optional Keywords" list that contains keywords like "Dealer".
: 
: If my query is "Car Repair" should only match Doc A.
: If my query is "Car", should match "Car Dealer", because "Dealer" is an
: optional keyword, but if the query is only "Dealer", no documents should be
: matched.


You've provided a few examples of things you want to see happen -- but with 
odd use cases like this you really have to think hard about what *else* you 
want to see happen, or not happen, in various situations.

For instance: in your example above, it sounds like you would expect "Car 
Clean" to match DocB, correct?  what about just "Clean"? ... if i'm 
understanding you correctly you *don't* want the word "Clean" on its own 
to match either of those docs.

but what about "car clean" (lowercase) or "Car  Clean  " (extra 
whitespace) what should those match?

I suspect that a way to restate your goals is...
  1) i want to use some basic analysis (eg: standard tokenizer
     or whitespace tokenizer + lowercase filter + stemming + ...)
  2) there are a set of words i want to completely ignore if they 
     appear in a query
  3) except for #1 & #2 i want documents to match only if they 
     have a field value which contains all of the words in the 
     query and no other words.

In which case my suggestion would be:

 a) setup the tokenizer & token filters that you want
 b) add a StopFilterFactory to your analyzer chain containing all 
    of your words to ignore.
 c) add a final TokenFilter that concats all of the tokens in the stream 
    together using a single whitespace delimiter

"c" is the only thing that Solr doesn't give you out of the box (i thought 
we had something to do that, but i can't find it now) so you'd have to 
write it as a custom plugin.
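
To make the effect of steps (a) through (c) concrete, here is a rough 
sketch in plain Python. It is not Solr or Lucene code, only a simulation 
of what the analysis chain would produce. It assumes a whitespace 
tokenizer plus lowercasing for (a), and that the same chain runs on both 
the indexed keywords and the query. The names IGNORED_WORDS, analyze, 
matching_docs and DOCS are made up for illustration.

```python
# Plain-Python simulation of the proposed analysis chain (not Solr code).
# IGNORED_WORDS, analyze, matching_docs and DOCS are hypothetical names.

IGNORED_WORDS = {"dealer"}  # the "optional keywords" list, lowercased


def analyze(text, ignored=IGNORED_WORDS):
    """Return the single concatenated token for text, or None if nothing is left."""
    tokens = text.split()                              # (a) whitespace tokenizer
    tokens = [t.lower() for t in tokens]               # (a) lowercase filter
    tokens = [t for t in tokens if t not in ignored]   # (b) stop word filter
    if not tokens:
        return None                                    # every word was ignored
    return " ".join(tokens)                            # (c) concat with one space


def matching_docs(query, docs):
    """Return ids of docs with a keyword whose analyzed form equals the query's."""
    q = analyze(query)
    if q is None:
        return []                                      # e.g. the query "Dealer"
    return [doc_id for doc_id, keywords in docs.items()
            if any(analyze(k) == q for k in keywords)]


DOCS = {
    "A": ["Car Dealer", "Car Repair"],
    "B": ["Car Washing", "Car Clean"],
}

print(matching_docs("Car Repair", DOCS))     # ['A']
print(matching_docs("Car", DOCS))            # ['A']  ("Car Dealer" becomes "car")
print(matching_docs("Dealer", DOCS))         # []
print(matching_docs("car  clean  ", DOCS))   # ['B']
print(matching_docs("Clean", DOCS))          # []
```

Because each keyword collapses to one token, a match is an exact 
comparison of the whole analyzed value, which is what gives the "all of 
the words in the query and no other words" behavior from goal #3.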



-Hoss
