Yes, you could even use the WhitespaceTokenizer and then look for the
symbols in a token filter. You would get [you?] as a single token; your
job in the token filter is then to store the [?] and return the [you].
The next time the token filter is called for the next token, you return
the [?] that you stored previously.
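To make the store-and-return idea concrete, here is a rough sketch in plain Java. It deliberately does not use the Lucene TokenFilter API (whose method signatures differ between versions); PunctuationSplittingFilter and its tokenize helper are hypothetical names, and the upstream "whitespace tokenizer" is just String.split. It only handles a single trailing ? or !.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Plain-Java stand-in for a token filter; hypothetical, not a Lucene class. */
class PunctuationSplittingFilter implements Iterator<String> {
    private final Iterator<String> input; // upstream tokens, already split on whitespace
    private String pending;               // symbol stored for the next call, or null

    PunctuationSplittingFilter(Iterator<String> input) {
        this.input = input;
    }

    @Override
    public boolean hasNext() {
        return pending != null || input.hasNext();
    }

    @Override
    public String next() {
        // If the previous call stored a symbol, return it before reading more input.
        if (pending != null) {
            String symbol = pending;
            pending = null;
            return symbol;
        }
        if (!input.hasNext()) {
            throw new NoSuchElementException();
        }
        String token = input.next();
        int last = token.length() - 1;
        // Split only when there is text in front of the trailing symbol.
        if (last > 0 && isSplitSymbol(token.charAt(last))) {
            pending = token.substring(last);  // store the [?]
            return token.substring(0, last);  // return the [you]
        }
        return token;
    }

    private static boolean isSplitSymbol(char c) {
        return c == '?' || c == '!';
    }

    /** Convenience helper: whitespace-split the text, then run the filter over it. */
    static List<String> tokenize(String text) {
        Iterator<String> whitespaceTokens =
                Arrays.asList(text.trim().split("\\s+")).iterator();
        PunctuationSplittingFilter filter = new PunctuationSplittingFilter(whitespaceTokens);
        List<String> out = new ArrayList<>();
        while (filter.hasNext()) {
            out.add(filter.next());
        }
        return out;
    }
}
```

Calling PunctuationSplittingFilter.tokenize("how are you?") yields the four tokens how, are, you and ?. In a real TokenFilter you would also want to carry over offsets and position increments for the stored symbol, which this sketch ignores.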
If you're already using something that's grammar-based (such as
StandardTokenizer) then you could add the "?" to the grammar as a
separate token. If you can figure out how to do this from looking at the
grammar file, then it's probably the simplest way.
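As a rough illustration of what "a separate token in the grammar" means, here is a tiny regex-based stand-in in plain Java. This is not the StandardTokenizer grammar (that one is a JFlex file with many more rules); SymbolAwareGrammar is a hypothetical name, and the two-rule pattern is only an assumption chosen to show the idea of giving the symbols their own rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy two-rule "grammar"; hypothetical, not the Lucene StandardTokenizer rules. */
class SymbolAwareGrammar {
    // Rule 1: a lone '?' or '!' is a token by itself.
    // Rule 2: any run of characters that are neither whitespace nor '?' / '!'.
    private static final Pattern TOKEN = Pattern.compile("[?!]|[^\\s?!]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```

Because the symbols have their own rule, SymbolAwareGrammar.tokenize("how are you?") gives how, are, you and ? without any buffering in a filter, and repeated symbols such as "wait!?" come out as separate ! and ? tokens.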
-John
Matthew Hall wrote:
I'd think extending WhitespaceTokenizer would be a good place to start.
Then create a new Analyzer that exactly mirrors your current Analyzer,
except that it uses your new tokenizer instead of WhitespaceTokenizer.
(This assumes your current Analyzer already uses WhitespaceTokenizer,
which it likely does.)
OBender wrote:
Hi All,
I need to make the ? and ! characters separate tokens, e.g. to split
[how are you?] into 4 tokens: [how], [are], [you] and [?]. What would be
the best way to do this?
Thanks