Re: Tokenize a dictionary of phrases

govind bhardwaj Sun, 21 Aug 2011 13:24:16 -0700

Hi Xiyang,

You should use KeywordAnalyzer, since it treats the entire string (a
multi-word phrase in your case) as a single token, without splitting it
into its constituent words.
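
To illustrate the difference, here is a minimal plain-Java sketch. It does
not use the Lucene API; the class and method names (KeywordStyleDemo,
keywordTokens, whitespaceTokens) are hypothetical and only mimic the two
behaviours: emitting the whole input as one token versus splitting it on
whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is NOT Lucene's KeywordAnalyzer.
class KeywordStyleDemo {

    // Keyword-style tokenization: the whole input becomes one token.
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    // Whitespace-style tokenization: each word becomes its own token.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(keywordTokens("brown fox"));    // prints [brown fox]
        System.out.println(whitespaceTokens("brown fox")); // prints [brown, fox]
    }
}
```

Note that the keyword-style behaviour applies to whatever string is passed
in: a full sentence given to keywordTokens also comes back as one token.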


Thanks,
Govind

On Mon, Aug 22, 2011 at 1:23 AM, Xiyang Chen <settingh...@gmail.com> wrote:

> Hi,
>
> I have a dictionary of multi-word phrases and I'd like to analyze documents
> such that anything that appears in the dictionary will be treated as one
> single token.
> For example, if the dictionary contains "brown fox", then the sentence
> The quick brown fox jumps over the lazy dog.
>
> will be tokenized as (with stopwords stripped):
> quick | brown fox | jumps | lazy | dog
>
> What is the best way to achieve this?
>
> Thanks,
> Xiyang


-- 
No trees were harmed in the creation of this message, but several thousand
electrons were mildly inconvenienced.
