Nilesh,

StandardAnalyzer has a number of generally useful special cases built in,
including detection of email addresses and numbers.
I suspect you have hit one of these special cases, and that there is some
justification for it.
I can't tell you what the reason is, but I expect it would be hard to change,
because other users probably rely on the current behaviour.
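
To make that guess concrete, here is a small Java sketch of the kind of rule
that could produce what you see. It does not use Lucene at all: the class name
DigitRuleSketch and the three patterns are my own assumptions, chosen so that
two runs of letters and digits joined by a single punctuation character stay
together only when one of the runs contains a digit. It reproduces the two
outputs in your message, but I have not checked it against the real tokenizer
grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model only: this is NOT Lucene's grammar, just an assumed rule
// that happens to reproduce the two outputs from the question.
class DigitRuleSketch {
    // A run of ASCII letters and digits (underscore is deliberately excluded).
    private static final String ALNUM = "[A-Za-z0-9]+";
    // A run of letters and digits that contains at least one digit.
    private static final String HAS_DIGIT = "[A-Za-z]*[0-9][A-Za-z0-9]*";
    // A single joining punctuation character.
    private static final String PUNCT = "[_\\-/.,]";

    // Try the two "digit on one side" forms first, then a plain run.
    private static final Pattern TOKEN = Pattern.compile(
            HAS_DIGIT + PUNCT + ALNUM
            + "|" + ALNUM + PUNCT + HAS_DIGIT
            + "|" + ALNUM);

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("01a_b-_-c-d")); // [01a_b, c, d]
        System.out.println(tokenize("a_b-_-c_d"));   // [a, b, c, d]
    }
}
```

In this model "01a" contains a digit, so "01a_b" is kept as one token, while
"a_b" has no digit on either side and is split at the underscore.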

paul


On 27 March 2012 at 20:03, Nilesh Vijaywargiay wrote:

> I have a string 01a_b-_-c-d which is tokenized as
> 01a_b
> c
> d
> 
> and the string a_b-_-c_d which is tokenized as
> a
> b
> c
> d
> 
> Why is there a difference when there is a digit at the beginning? I am
> using the standard unstemmed tokenizer.

