Tokenize a dictionary of phrases

Xiyang Chen Sun, 21 Aug 2011 12:52:04 -0700

Hi,

I have a dictionary of multi-word phrases, and I'd like to analyze documents
so that any phrase from the dictionary that appears in a document is treated
as a single token.
For example, if the dictionary contains "brown fox", then the sentence

The quick brown fox jumps over the lazy dog.

will be tokenized as (with stopwords stripped):
quick | brown fox | jumps | lazy | dog
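To make the desired behavior concrete, here is a minimal plain-Java sketch of greedy longest-match phrase merging. It is not Lucene code and uses no Lucene API. In Lucene the same logic would presumably have to live in a custom TokenFilter, which is not shown here. The names PhraseMerger, mergePhrases and analyze are hypothetical. The sketch also assumes that dictionary phrases are stored lowercased with single spaces, that tokens are runs of ASCII letters, and that the stopword list contains "the" and "over" (that is what the example output implies). Phrases are merged before stopwords are removed, so a phrase containing a stopword can still match. It requires Java 9+ for Set.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PhraseMerger {

    // Greedy longest match: at each position, try the longest run of tokens
    // (up to the length of the longest dictionary phrase) and emit it as one
    // token if it is in the dictionary; otherwise emit the single token.
    static List<String> mergePhrases(List<String> tokens, Set<String> phrases) {
        int maxLen = 1;
        for (String p : phrases) {
            maxLen = Math.max(maxLen, p.split(" ").length);
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String best = tokens.get(i);
            int consumed = 1;
            for (int n = Math.min(maxLen, tokens.size() - i); n >= 2; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n));
                if (phrases.contains(candidate)) {
                    best = candidate;
                    consumed = n;
                    break;
                }
            }
            out.add(best);
            i += consumed;
        }
        return out;
    }

    // Hypothetical pipeline: lowercase, split on non-letters, merge
    // dictionary phrases, then drop single-word stopwords.
    static List<String> analyze(String text, Set<String> phrases, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        List<String> result = new ArrayList<>();
        for (String t : mergePhrases(tokens, phrases)) {
            if (!stopwords.contains(t)) result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> phrases = Set.of("brown fox");
        Set<String> stopwords = Set.of("the", "over"); // assumed stopword list
        List<String> tokens = analyze("The quick brown fox jumps over the lazy dog.", phrases, stopwords);
        System.out.println(String.join(" | ", tokens));
        // prints: quick | brown fox | jumps | lazy | dog
    }
}
```

Greedy longest match means that if the dictionary holds both "a b" and "a b c", the input "a b c" becomes one token "a b c" rather than "a b" followed by "c". Overlapping phrases are resolved left to right.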

What is the best way to achieve this?

Thanks,
Xiyang