Hi James, On 10/23/2008 at 8:30 AM, James liu wrote: > public class AnalyzerTest { > @Test > public void test() throws ParseException { > QueryParser parser = new MultiFieldQueryParser(new String[]{"title", > "body"}, new StandardAnalyzer()); > Query query1 = parser.parse("中文"); > Query query2 = parser.parse("中 文"); > System.out.println(query1); > System.out.println(query2); > } > } > > output : > > title:"中 文" body:"中 文" > (title:中 body:中) (title:文 body:文) > > why they not same?
MultiFieldQueryParser extends QueryParser. QueryParser first breaks up its input on whitespace, and then sends each chunk to the analyzer. query1 contains no whitespace, so the entire string is sent to StandardAnalyzer, which produces one token per CJK character, for a total of two tokens. When QueryParser receives more than one token back from the analyzer, it produces a PhraseQuery (unless tokens share the same position, but this is not applicable to your situation). MultiFieldQueryParser specializes this action by producing one PhraseQuery per Field. query2 contains a space, so the following is performed for each whitespace-separated chunk (in query2, this means one character per chunk): The chunk is sent to the analyzer, which produces a single token for each CJK character - one token is produced because each chunk contains a single character. When QueryParser receives a single token back from the analyzer, it produces a TermQuery. MultiFieldQueryParser specializes this action by producing one TermQuery per Field, and then groups these per-Field TermQuery's together in a BooleanQuery, denoted in the output as a parenthesized group. Steve --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]