How to tokenize with comma in standard tokenizer

Bhavin Pandya Mon, 17 Sep 2007 07:51:46 -0700

Hi,

The standard tokenizer works well for me, but I have run into one problem with 
my usage.


I want to tokenize "TheRing6,Proposal6,GuyandGirl6" as three separate tokens, 
but the standard analyzer treats it as a single word because the token 
contains a digit.

Expected three tokens:
1. thering6
2. proposal6
3. guyandgirl6
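
To make the expected result concrete, here is a minimal plain-Java sketch of 
the output I am after. It is not Lucene code and does not touch the 
StandardTokenizer grammar; the class name CommaSplitSketch and the method 
tokenize are made up for illustration. It simply splits the input on commas 
and lowercases each piece:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: shows the desired tokens only.
public class CommaSplitSketch {

    // Split on commas, trim whitespace, lowercase, and drop empty pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split(",")) {
            String token = part.trim().toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [thering6, proposal6, guyandgirl6]
        System.out.println(tokenize("TheRing6,Proposal6,GuyandGirl6"));
    }
}
```

Doing this outside the analyzer would lose the other things the standard 
tokenizer handles, which is why I would rather change the tokenizer itself.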

I want to change this behaviour of the standard tokenizer, but I don't know 
where to make the change.
Do I need to comment out some rule in the StandardTokenizer.jj file? I find 
this file confusing.

Any pointers would be appreciated.

- Bhavin
