Thanks, I think I got it.

-----Original Message-----
From: John Byrne [mailto:john.by...@propylon.com]
Sent: Friday, July 17, 2009 2:43 PM
To: java-user@lucene.apache.org
Subject: Re: Tokenizer question: how can I force ? and ! to be separate tokens?
Yes, you could even use the WhitespaceTokenizer and then look for the symbols in a token filter. You would get [you?] as a single token; your job in the token filter is then to store the [?] and return the [you]. The next time the token filter is called for the next token, you return the [?] that you stored previously.

If you're already using something grammar-based (such as StandardTokenizer), then you could add "?" to the grammar as a separate token. If you can figure out how to do this from the grammar file, it's probably the simplest way.

-John

Matthew Hall wrote:
> I'd think extending WhitespaceTokenizer would be a good place to start.
>
> Then create a new Analyzer that exactly mirrors your current Analyzer,
> except that it uses your new tokenizer instead of
> WhitespaceTokenizer. (Well... that assumes you are using an Analyzer
> that already uses WhitespaceTokenizer, but you likely are.)
>
> OBender wrote:
>> Hi All,
>>
>> I need to make the ? and ! characters separate tokens, e.g. to split
>> [how are you?] into 4 tokens: [how], [are], [you] and [?]. What would
>> be the best way to do this?
>>
>> Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
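John's suggestion in the thread (split on whitespace, then have a token filter hold back the [?] and hand it out on the following call) can be sketched in plain Java. This is a minimal, framework-free model of the logic only, not Lucene's TokenFilter API: `TokenSource`, `WhitespaceSource`, `PunctuationSplitFilter` and `tokenize` are hypothetical names chosen for illustration, and a real filter would need to follow the token-stream API of whichever Lucene version you use.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Framework-free sketch of the "store the punctuation, return it next time" filter.
 * NOT Lucene's API: TokenSource and its next() method are hypothetical stand-ins.
 */
public class PunctuationSplitDemo {

    /** Hypothetical minimal token stream: next() returns null when exhausted. */
    interface TokenSource {
        String next();
    }

    /** Mimics a whitespace tokenizer: splits on runs of whitespace only. */
    static final class WhitespaceSource implements TokenSource {
        private final String[] parts;
        private int pos = 0;

        WhitespaceSource(String text) {
            String trimmed = text.trim();
            this.parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        @Override
        public String next() {
            return pos < parts.length ? parts[pos++] : null;
        }
    }

    /** Splits trailing '?' and '!' off each token and returns them on later calls. */
    static final class PunctuationSplitFilter implements TokenSource {
        private final TokenSource input;
        // Punctuation stored for subsequent calls; FIFO so "?!" keeps its order.
        private final Deque<String> pending = new ArrayDeque<>();

        PunctuationSplitFilter(TokenSource input) {
            this.input = input;
        }

        @Override
        public String next() {
            // First hand back anything stored on a previous call.
            if (!pending.isEmpty()) {
                return pending.poll();
            }
            String token = input.next();
            if (token == null) {
                return null;
            }
            // Peel '?' and '!' off the end, e.g. "you?!" -> "you", then "?", then "!".
            int end = token.length();
            while (end > 0 && isSplitChar(token.charAt(end - 1))) {
                end--;
            }
            for (int i = end; i < token.length(); i++) {
                pending.add(String.valueOf(token.charAt(i)));
            }
            // A token made only of punctuation (a standalone "?") leaves no word to return.
            if (end == 0) {
                return pending.poll();
            }
            return token.substring(0, end);
        }

        private static boolean isSplitChar(char c) {
            return c == '?' || c == '!';
        }
    }

    /** Drains the filter chain into a list for convenience. */
    static List<String> tokenize(String text) {
        TokenSource stream = new PunctuationSplitFilter(new WhitespaceSource(text));
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("how are you?")); // prints [how, are, you, ?]
    }
}
```

Only trailing ? and ! are split off, which covers the [you?] case from the original question; a mark in the middle of a token (a?b) is left alone. A queue is used instead of a single stored field so that a run such as ?! comes out as two tokens in their original order.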