Thanks, I think I got it.

-----Original Message-----
From: John Byrne [mailto:john.by...@propylon.com] 
Sent: Friday, July 17, 2009 2:43 PM
To: java-user@lucene.apache.org
Subject: Re: Tokenizer question: how can I force ? and ! to be separate
tokens?

Yes, you could even use the WhitespaceTokenizer and then look for the 
symbols in a token filter. You would get [you?] as a single token; your 
job in the token filter is then to store the [?] and return the [you]. 
The next time the token filter is called for the next token, you return 
the [?] that you stored previously.
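
The store-and-return idea above can be sketched in plain Java. This is only a rough illustration and deliberately does not use Lucene's TokenFilter API: the class name PunctuationSplittingFilter and the tokenize helper are hypothetical, the upstream "tokenizer" is just a whitespace split, and only a single trailing ? or ! is split off each token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a token filter; this is NOT Lucene's TokenFilter API.
class PunctuationSplittingFilter implements Iterator<String> {
    private final Iterator<String> input; // upstream tokens, e.g. from a whitespace split
    private String pending;               // symbol stored for the next call, or null

    PunctuationSplittingFilter(Iterator<String> input) {
        this.input = input;
    }

    @Override
    public boolean hasNext() {
        return pending != null || input.hasNext();
    }

    @Override
    public String next() {
        // If a symbol was stored on the previous call, return it now.
        if (pending != null) {
            String symbol = pending;
            pending = null;
            return symbol;
        }
        if (!input.hasNext()) {
            throw new NoSuchElementException();
        }
        String token = input.next();
        int last = token.length() - 1;
        // Split only when a word precedes the trailing '?' or '!'.
        if (last > 0 && (token.charAt(last) == '?' || token.charAt(last) == '!')) {
            pending = token.substring(last);   // store the [?]
            return token.substring(0, last);   // return the [you]
        }
        return token;
    }

    // Assumed convenience helper: whitespace-split the text, then run the filter.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        Iterator<String> words = Arrays.asList(trimmed.split("\\s+")).iterator();
        PunctuationSplittingFilter filter = new PunctuationSplittingFilter(words);
        while (filter.hasNext()) {
            out.add(filter.next());
        }
        return out;
    }
}
```

Calling PunctuationSplittingFilter.tokenize("how are you?") yields the four tokens [how], [are], [you] and [?]. A real Lucene filter would also need to carry over offsets and position increments for the stored symbol, which this sketch ignores.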

If you're already using something that's grammar-based (such as 
StandardTokenizer) then you could add the "?" to the grammar as a 
separate token. If you can figure out how to do this from looking at the 
grammar file, then it's probably the simplest way.

-John

Matthew Hall wrote:
> I'd think extending WhitespaceTokenizer would be a good place to start.
>
> Then create a new Analyzer that exactly mirrors your current Analyzer, 
> except that it uses your new tokenizer instead of WhitespaceTokenizer. 
> (I'm assuming, of course, that your current Analyzer already uses 
> WhitespaceTokenizer, but it likely does.)
>
> OBender wrote:
>> Hi All,
>>
>> I need to make the ? and ! characters separate tokens, e.g. to split
>> [how are you?] into 4 tokens: [how], [are], [you] and [?]. What would
>> be the best way to do this?
>>
>> Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


