The query string is first parsed by QueryParser, and what it believes to be single terms are then passed on to your analyzer. QueryParser only considers space, tab, \n and \r to be white space (see QueryParser.jj).
QueryParser itself is not aware that '-' should be treated as white space, so in your second example it treats john--a as a single term. Your analyzer then converts this into two tokens, and that is then treated as a phrase query.

Try:

    title:(john- -a) body:(john- -a)

I would expect the result to be:

    (title:john title:a) (body:john body:a)

since QueryParser will break on the extra spaces now and your analyzer will strip the remaining '-' afterwards.

I guess the best solution is to convert all characters that you consider to be white space to real spaces before passing the query string to QueryParser.

Luc

-----Original Message-----
From: Dan Armbrust [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 23 August 2005 17:32
To: java-user@lucene.apache.org
Subject: WhiteSpace Tokenizer question

I wrote a slightly modified version of the WhiteSpaceTokenizer that allows me to treat other characters as whitespace. My thought was that this would be an easy way to make it tokenize on characters such as "-".

My tokenizer looks like this:

    public class CustomWhiteSpaceTokenizer extends CharTokenizer
    {
        protected boolean isTokenChar(char c)
        {
            if (Character.isWhitespace(c) || whiteSpaceChars_.contains(new Character(c)))
            {
                return false;
            }
            else
            {
                return true;
            }
        }

        <snip other stuff>
    }

When I use my Analyzer which uses this tokenizer in the QueryParser with the character "-" defined as whitespace, the following query gets parsed like this:

    "title:(john a) body:(john a) " -> (title:john title:a) (body:john body:a)

which is what I expect. But then the following query:

    "title:(john--a) body:(john--a) " -> title:"john a" body:"john a"

isn't what I want. I can't seem to figure out why it is behaving differently on these characters (space vs. hyphen) when I am specifying them both as a non-token.

This is with the svn trunk as of yesterday.

Any help appreciated,

Thanks,

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
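
For reference, the suggestion in the reply above (turn every character you consider white space into a real space before the string reaches QueryParser) could be sketched roughly as follows. This is only an illustration under assumptions: the class QueryWhitespaceNormalizer and its normalize method are hypothetical names, not part of Lucene, and the set of extra whitespace characters stands in for the whiteSpaceChars_ field of the custom tokenizer.

```java
import java.util.Set;

// Hypothetical helper, not part of Lucene: pre-processes a query string so
// that QueryParser sees real spaces wherever the custom tokenizer would
// have split a token.
public class QueryWhitespaceNormalizer {

    // Replaces standard whitespace and every character in extraWhitespace
    // with a single ' ' character; all other characters are copied as-is.
    public static String normalize(String query, Set<Character> extraWhitespace) {
        StringBuilder sb = new StringBuilder(query.length());
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            boolean isSeparator = Character.isWhitespace(c) || extraWhitespace.contains(c);
            sb.append(isSeparator ? ' ' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: '-' is the only extra character treated as whitespace.
        Set<Character> extra = Set.of('-');
        String raw = "title:(john--a) body:(john--a)";
        // The normalized string is what would then be handed to QueryParser.
        System.out.println(normalize(raw, extra));
    }
}
```

Note that a blanket replacement like this also removes any '-' that was meant as the QueryParser prohibit operator and rewrites characters inside quoted phrases, so it probably only fits cases where the query syntax does not rely on those characters.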