Hi,

I have a dictionary of multi-word phrases, and I'd like to analyze documents so that any phrase that appears in the dictionary is treated as a single token. For example, if the dictionary contains "brown fox", then the sentence

    The quick brown fox jumps over the lazy dog.

would be tokenized as (with stopwords stripped):

    quick | brown fox | jumps | lazy | dog

What is the best way to achieve this?

Thanks,
XIyang
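
P.S. In case the example is ambiguous, here is a rough plain-Java sketch of the behavior I have in mind. It is not Lucene code and does not use any Lucene API; the class name, the tiny stopword list, and the maxPhraseWords parameter are all made up for illustration. It assumes greedy longest-match against the dictionary, done before stopwords are removed, which is only one possible reading of the requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: plain Java, no Lucene classes involved.
class PhraseTokenSketch {

    // Hypothetical stopword list, just large enough for the example sentence.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "over"));

    // Returns the tokens of `text`, merging any run of words that matches an
    // entry in `phrases` (lowercase, words separated by single spaces).
    // `maxPhraseWords` is the length in words of the longest dictionary phrase.
    static List<String> tokenize(String text, Set<String> phrases, int maxPhraseWords) {
        // Lowercase, then split on anything that is not a letter or digit.
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            if (words[i].isEmpty()) { // split() can yield a leading empty string
                i++;
                continue;
            }
            // Try the longest candidate phrase first, then shorter ones.
            int matchedLength = 0;
            for (int n = Math.min(maxPhraseWords, words.length - i); n >= 2; n--) {
                String candidate = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                if (phrases.contains(candidate)) {
                    tokens.add(candidate);
                    matchedLength = n;
                    break;
                }
            }
            if (matchedLength > 0) {
                i += matchedLength; // skip the words consumed by the phrase
            } else {
                if (!STOPWORDS.contains(words[i])) {
                    tokens.add(words[i]); // ordinary single-word token
                }
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> phrases = new HashSet<>(Arrays.asList("brown fox"));
        List<String> tokens =
                tokenize("The quick brown fox jumps over the lazy dog.", phrases, 2);
        System.out.println(String.join(" | ", tokens));
        // prints: quick | brown fox | jumps | lazy | dog
    }
}
```

Matching phrases before dropping stopwords means a dictionary entry that itself contains a stopword would still be kept whole. What I'm really after is the analyzer-level equivalent of this.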