Re: Strange change to query parser behaviour in recent versions

2011-08-21, Trejkaz
On Sat, Aug 20, 2011 at 7:00 PM, Robert Muir wrote:
> On Sat, Aug 20, 2011 at 3:34 AM, Trejkaz wrote:
>
>>
>> As an aside, Google's behaviour seems to follow the "old" way. For
>> instance, [[ 限定 ]] returns 640,000,000 hits and [[ 限 定 ]] returns
>> 772,000,000. (Interestingly, [[ "限定" ]] return

Re: Tokenize a dictionary of phrases

2011-08-21, govind bhardwaj
Hi Xiyang,

You should use KeywordAnalyzer(), as it treats the entire string (a multi-word phrase, in your case) as a single token without splitting it into its constituent words.

Thanks,
Govind

On Mon, Aug 22, 2011 at 1:23 AM, Xiyang Chen wrote:
> Hi,
>
> I have a dictionary of multi-word phrases and I'd like to a
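
To make the distinction concrete, here is a plain-Java sketch with no Lucene dependency. It contrasts whitespace-style splitting with keyword-style handling, where the whole input becomes one token. The class and method names are hypothetical and only mimic the effect described above; they are not Lucene's KeywordAnalyzer API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: mimics the *effect* of a whitespace
// analyzer versus a keyword analyzer; it does not use Lucene's classes.
class KeywordVsWhitespace {

    // Splits on runs of whitespace, like a simple whitespace tokenizer.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Emits the entire input as one token, which is what a keyword-style
    // analyzer does with a field value.
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        String phrase = "brown fox";
        System.out.println(whitespaceTokens(phrase)); // [brown, fox]
        System.out.println(keywordTokens(phrase));    // [brown fox]
    }
}
```

One caveat worth keeping in mind: keyword-style handling keeps a value whole only when the field value *is* the phrase (for example, when indexing the dictionary entries themselves). Applied to a full sentence, it would turn the entire sentence into a single token.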

Tokenize a dictionary of phrases

2011-08-21, Xiyang Chen
Hi,

I have a dictionary of multi-word phrases, and I'd like to analyze documents such that anything that appears in the dictionary is treated as a single token. For example, if the dictionary contains "brown fox", then the sentence

The quick brown fox jumps over the lazy dog.

will be tok
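
One way to get this behaviour is greedy longest-match merging over an already-tokenized stream: at each position, try the longest dictionary phrase first and fall back to the single word. The sketch below is plain Java and assumes the input has already been split into normalized (e.g. lowercased) words; the class name `PhraseMerger` is hypothetical. In Lucene, logic like this would presumably live in a custom TokenFilter, but that wrapping is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: greedy longest-match merging of dictionary phrases.
// Assumes input tokens are already split and normalized (e.g. lowercased).
class PhraseMerger {
    private final Set<String> phrases = new HashSet<>();
    private int maxPhraseLength = 1; // longest dictionary phrase, in words

    PhraseMerger(List<String> dictionary) {
        for (String phrase : dictionary) {
            String[] words = phrase.trim().split("\\s+");
            phrases.add(String.join(" ", words)); // normalize inner spacing
            maxPhraseLength = Math.max(maxPhraseLength, words.length);
        }
    }

    // At each position, try the longest candidate first; if no dictionary
    // phrase matches, emit the single word unchanged.
    List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 1;
            int longest = Math.min(maxPhraseLength, tokens.size() - i);
            for (int n = longest; n >= 2; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n));
                if (phrases.contains(candidate)) {
                    matched = n;
                    break;
                }
            }
            out.add(String.join(" ", tokens.subList(i, i + matched)));
            i += matched;
        }
        return out;
    }

    public static void main(String[] args) {
        PhraseMerger merger = new PhraseMerger(Arrays.asList("brown fox", "lazy dog"));
        List<String> tokens = Arrays.asList(
            "the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog");
        System.out.println(merger.merge(tokens));
        // [the, quick, brown fox, jumps, over, the, lazy dog]
    }
}
```

Longest-match-first means that with both "new york" and "new york city" in the dictionary, the text "new york city" becomes one token rather than "new york" plus "city". A trie would avoid rebuilding candidate strings, but a hash set keeps the sketch short.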

Re: Multiple fields derived from same source text?

2011-08-21, Graham Sugden
Closed! TeeSinkTokenFilter and CachingTokenFilter seem to provide the functionality and code examples I was looking for.

Thanks,
Graham

---------- Forwarded message ----------
From: Graham Sugden
Date: Thu, Aug 18, 2011 at 5:23 PM
Subject: Multiple fields derived from same source text?
To: java-use
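
The idea behind both filters is to run the shared analysis once, keep the resulting tokens, and let several fields consume them. The plain-Java sketch below illustrates that idea only. It buffers a token source once and replays it into one token list per field, where each field keeps the tokens its predicate accepts (loosely like a sink filter). `CachedTokens`, the field names, and the predicates are all hypothetical; this is not Lucene's TeeSinkTokenFilter or CachingTokenFilter API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of "analyze once, derive several fields".
// It does NOT use or reproduce Lucene's TeeSinkTokenFilter/CachingTokenFilter API.
class CachedTokens {
    private final List<String> cache = new ArrayList<>();

    // Drains the (possibly expensive) source exactly once and buffers every token.
    CachedTokens(Iterator<String> source) {
        while (source.hasNext()) {
            cache.add(source.next());
        }
    }

    // Replays the buffered tokens into one list per field; each field keeps
    // only the tokens its predicate accepts.
    Map<String, List<String>> toFields(Map<String, Predicate<String>> sinks) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<String>> sink : sinks.entrySet()) {
            List<String> accepted = new ArrayList<>();
            for (String token : cache) {
                if (sink.getValue().test(token)) {
                    accepted.add(token);
                }
            }
            fields.put(sink.getKey(), accepted);
        }
        return fields;
    }

    public static void main(String[] args) {
        Iterator<String> source =
            Arrays.asList("Apache", "Lucene", "analyzes", "text", "once").iterator();
        CachedTokens cached = new CachedTokens(source);

        Map<String, Predicate<String>> sinks = new LinkedHashMap<>();
        sinks.put("body", token -> true);                                    // every token
        sinks.put("names", token -> Character.isUpperCase(token.charAt(0))); // capitalized only
        System.out.println(cached.toFields(sinks));
        // {body=[Apache, Lucene, analyzes, text, once], names=[Apache, Lucene]}
    }
}
```

Buffering costs memory proportional to the token count, which is the trade-off for not re-running the analysis chain for every derived field.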