Hi Xiyang,

You should use KeywordAnalyzer, which emits the entire string (a multi-word phrase in your case) as a single token instead of splitting it into its constituent words.
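To make the behaviour concrete, here is a rough sketch in plain Java with no Lucene dependency. It is not the Lucene API: the class `PhraseTokens`, the helpers `keywordTokens` and `phraseTokens`, and the stopword set {"the", "over"} are all hypothetical and chosen only to reproduce your example. `keywordTokens` mimics keyword-style analysis, where the whole input becomes one token, so it fits a field that holds a single dictionary phrase. `phraseTokens` shows one possible way (greedy longest match against the dictionary) to get the tokenization you describe for a full sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PhraseTokens {

    // Keyword-style analysis: the whole input is returned as exactly one token.
    static List<String> keywordTokens(String text) {
        return List.of(text);
    }

    // Hypothetical dictionary-aware tokenizer: greedy longest match of
    // dictionary phrases; single words that are stopwords are dropped.
    static List<String> phraseTokens(String text, Set<String> phrases,
                                     Set<String> stopwords, int maxPhraseWords) {
        // Lowercase, then split on anything that is not a letter or digit.
        String[] words = Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);

        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            int consumed = 0;
            // Try the longest candidate phrase first, down to two words.
            for (int n = Math.min(maxPhraseWords, words.length - i); n >= 2; n--) {
                String candidate = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                if (phrases.contains(candidate)) {
                    tokens.add(candidate);
                    consumed = n;
                    break;
                }
            }
            if (consumed == 0) {
                // No phrase starts here: emit the single word unless it is a stopword.
                if (!stopwords.contains(words[i])) {
                    tokens.add(words[i]);
                }
                consumed = 1;
            }
            i += consumed;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> phrases = Set.of("brown fox");      // assumed dictionary
        Set<String> stopwords = Set.of("the", "over");  // assumed stopword set
        String sentence = "The quick brown fox jumps over the lazy dog.";

        // One token containing the whole phrase.
        System.out.println(keywordTokens("brown fox"));

        // prints: quick | brown fox | jumps | lazy | dog
        System.out.println(String.join(" | ", phraseTokens(sentence, phrases, stopwords, 2)));
    }
}
```

In a real setup the same matching would live inside the analysis chain rather than in a standalone helper; the sketch is only meant to show the intended token boundaries.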
Thanks,
Govind

On Mon, Aug 22, 2011 at 1:23 AM, Xiyang Chen <settingh...@gmail.com> wrote:
> Hi,
>
> I have a dictionary of multi-word phrases and I'd like to analyze documents
> such that anything that appears in the dictionary will be treated as one
> single token.
> For example, if the dictionary contains "brown fox", then the sentence
> The quick brown fox jumps over the lazy dog.
>
> Will be tokenized as (with stopwords stripped):
> quick | brown fox | jumps | lazy | dog
>
> What is the best way to achieve this?
>
> Thanks,
> Xiyang