Thanks, Lance. After exploring for a while, I used Lucene's ShingleFilter
followed by the SynonymFilter from the Lucene in Action book. Then, using the
type attribute, I removed all the shingles that did not belong to any category.
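For anyone finding this thread later, here is a rough plain-Java sketch of the
shingle -> synonym -> type-filter idea. It does not use Lucene's classes or its
TypeAttribute API. It assumes whitespace tokenization, lowercase matching, and
a small hypothetical phrase-to-category map (ShingleCategoryDemo, CATEGORIES,
and the "city"/"sport" categories are made up for illustration). The point it
shows is that a multi-word phrase like "new york" can only match the dictionary
once it exists as a single shingle token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: imitates ShingleFilter + synonym lookup + type filtering.
public class ShingleCategoryDemo {

    // A token with its text and a "type", mirroring the idea of Lucene's type attribute.
    record Token(String text, String type) {}

    // Hypothetical phrase -> category map standing in for the synonym dictionary.
    static final Map<String, String> CATEGORIES = Map.of(
            "new york", "city",
            "san francisco", "city",
            "baseball", "sport");

    // Emit all word n-grams of length 1..maxSize (unigrams kept, as a shingle filter can do).
    static List<String> shingles(String text, int maxSize) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder sb = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= words.length; len++) {
                if (len > 1) sb.append(' ');
                sb.append(words[start + len - 1]);
                out.add(sb.toString());
            }
        }
        return out;
    }

    // Tag each shingle with its category (or "none"), then keep only categorized ones.
    static List<Token> categorize(String text, int maxSize) {
        List<Token> kept = new ArrayList<>();
        for (String s : shingles(text, maxSize)) {
            String type = CATEGORIES.getOrDefault(s, "none");
            if (!type.equals("none")) {
                kept.add(new Token(s, type));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        for (Token t : categorize("I watched baseball in New York", 2)) {
            System.out.println(t.text() + " -> " + t.type());
        }
    }
}
```

Running main prints "baseball -> sport" and then "new york -> city". The
single words "new" and "york" are dropped because only the two-word shingle
matches an entry. Setting the maximum shingle size to the length of the longest
phrase in the dictionary keeps the number of generated shingles bounded.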
On Wed, Aug 18, 2010 at 10:28 PM, Lance Norskog wrote:
> Yes, you need an analyzer that leaves successive words together as one
> long term. This might be easier to do with the new CharFilter tool,
> which processes text before it goes to the tokenizer.
> What you are doing here is similar to part-of-speech analysis, where
> text analysis software parses a sen
>> I think the Lucene WhitespaceAnalyzer I am using inside Solr's SynonymFilter
>> is the one that prevents multi-word synonyms like "New York" from getting
>> mapped to a generic synonym name such as CONCEPTYcity. It appears to me that
>> an analyzer which recognizes that whitespace is inside a synonym like