On Sat, Aug 20, 2011 at 7:00 PM, Robert Muir <rcm...@gmail.com> wrote:
> On Sat, Aug 20, 2011 at 3:34 AM, Trejkaz <trej...@trypticon.org> wrote:
>
>> As an aside, Google's behaviour seems to follow the "old" way. For
>> instance, [[ 限定 ]] returns 640,000,000 hits and [[ 限 定 ]] returns
>> 772,000,000. (Interestingly, [[ "限定" ]] returns 643,000,000 hits.
>> Slightly more than you might expect.)
>
> No it doesn't. query on 北京医科大学
>
> You are confusing tokenization with query-generation itself: if you
> want 限定 to be treated as a compound then use a tokenizer that does
> this.
Nope. I'm not confusing the two; I just haven't seen Google's source code, so I can't say at which level it was doing it. For my example, it seemed pretty opaque.

That's a good example, though.

TX